Finding Promise in Gene Research

Karsten Borgwardt. (Photo: MPG)

Karsten Borgwardt, 32, discovered his passion for science at a tender age. It all started with a trip to the Ontario Science Centre in Toronto, Canada, as a young boy, he recalls. He was always interested in math questions, and could already program a bit as a teenager. “But not huge software packages…” he adds humbly. Borgwardt doesn’t like to draw attention to the fact that his career to this point has been anything but “normal” for an up-and-coming scientist. But he is certainly proud of the countless prizes he has received for his work in the last couple of years. Most recently, the Krupp Foundation presented him with the prestigious Alfried Krupp Award for Young Professors, one of the most generously endowed research grants for young scientists in Europe. According to the foundation, Borgwardt was a unanimous choice, beating out 52 other candidates as the 2013 recipient.

A professor of “Data Mining in Bioinformatics” at the University of Tübingen for the past two years, Borgwardt studied computer science at the Ludwig Maximilian University in Munich, minoring in biology from his first semester. During his studies, the scholar of the Stiftung Maximilianeum (a foundation for gifted students) participated in an exchange program at the University of Oxford, England, completing a master’s course in biology. “All of the other applicants were biologists, but I still made the top 15 and got a spot on the program,” Borgwardt says. He describes his time at Oxford as one of the most pivotal periods in his career so far. His bioinformatics project on gene finding during those four months exposed him for the first time to the concept of machine learning. “The synergies between the two disciplines, biology and informatics, became much clearer to me later on, while I was working on my master’s thesis,” he says. That was back in 2004. At that time, Borgwardt was in the middle of a research fellowship at the NICTA Statistical Machine Learning Group in Canberra, Australia, which he had applied for after seeing job postings from a scientist whose publications he had been reading.

Modern technology significantly reduces computing time

Combining data analysis with biological research is not a new approach – researchers worldwide were already pursuing it in Borgwardt’s student days. One of the goals back then, he recalls, was to analyze the genome sequence for gene locations. “New high-throughput methods have really accelerated research in this field in the last few years,” he says. Now, instead of searching for a connection between a specific disease and the expression of one gene among thousands, scientists often try to pinpoint single locations in the genome – out of many millions of possible locations – whose mutation could be the cause of specific diseases. Obviously, the volume of data is much larger today than it was in the past. But as Borgwardt points out, today’s data mining technology “significantly reduces computing time.”

Joint research by computer scientists and geneticists

Borgwardt also leads a research group at the Max Planck Institute for Intelligent Systems and the Max Planck Institute for Developmental Biology in Tübingen, where he analyzes graphs of amino acids (the building blocks of proteins). His fascination with protein databases began during his research fellowship in Canberra. And although researchers today have access to comprehensive 3-D databases to research proteins, Borgwardt admits that “we still cannot say for each protein exactly what processes it goes through or triggers.” Working under the assumption that proteins with similar structures have similar functions, he scans databases to detect such proteins, represents their structures as graphs, and compares large numbers of these graphs with one another. In 2007, his efforts won him the Heinz Schwärtzel dissertation prize, from the three universities in Munich, for the best PhD thesis in computer science.
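To give a rough idea of what comparing proteins as graphs can look like, here is a minimal Python sketch. It is an illustration only, not the method used by Borgwardt's group: it assumes each protein is reduced to a small graph whose nodes carry one-letter amino acid labels, and it uses a simple Weisfeiler-Lehman style relabeling as a stand-in similarity measure. The function names and the toy graphs are hypothetical.

```python
import math
from collections import Counter


def wl_features(adjacency, labels, iterations=2):
    """Count node labels seen during Weisfeiler-Lehman style relabeling."""
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted labels of all neighbours.
            around = ",".join(sorted(current[n] for n in neighbours))
            updated[node] = f"{current[node]}({around})"
        current = updated
        features.update(current.values())
    return features


def graph_similarity(graph_a, graph_b, iterations=2):
    """Cosine-normalized overlap of label counts, between 0.0 and 1.0."""
    fa = wl_features(*graph_a, iterations=iterations)
    fb = wl_features(*graph_b, iterations=iterations)
    dot = sum(fa[key] * fb[key] for key in fa.keys() & fb.keys())
    norm = math.sqrt(sum(v * v for v in fa.values()) * sum(v * v for v in fb.values()))
    return dot / norm if norm else 0.0


# Toy graphs (assumed data): (adjacency list, amino acid label per node).
chain_agl = ({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "G", 2: "L"})
chain_agv = ({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "G", 2: "V"})
triangle_k = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "K", 1: "K", 2: "K"})

same = graph_similarity(chain_agl, chain_agl)
close = graph_similarity(chain_agl, chain_agv)
far = graph_similarity(chain_agl, triangle_k)
print(same, close, far)
```

In this toy setting the identical pair scores 1.0, the two chains that differ in one amino acid score about one third, and the unrelated triangle scores 0.0. Real protein graphs are far larger and would need richer node and edge information.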

Gene expression data is a further element in Borgwardt’s research. This data reflects the state of activity of every gene in the human body. Researchers measure this activity to identify genes that are either more active or less active in sick people – such as in cancer patients – than in healthy people. Borgwardt is trying to detect characteristic patterns in the genetic makeup of sick people as well, by searching for replaced genetic bases.

Data mining in 100,000 data records

But all of this research depends on having comprehensive data records. According to Borgwardt, you need to analyze at least 100,000 loci in the human genome when searching for replaced bases – and not just in one or two people, but in data pools of around 10,000 individuals, both ill and healthy. This means that researchers quite often search the data records of at least 100,000 people worldwide. The larger the number of individual genetic profiles included in the assessment, the more accurately one can identify certain predispositions for diseases.
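The kind of scan described here, checking locus by locus whether a replaced base is more common in ill than in healthy individuals, can be sketched in a few lines of Python. This is a textbook-style illustration under stated assumptions, not one of Borgwardt's algorithms: each individual is reduced to a list of 0/1 flags (1 = base replaced at that locus), the test is a plain chi-square test on a 2x2 table, and a Bonferroni correction accounts for the number of loci tested. All names and the toy data are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # No variation at this locus, so nothing to test.
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_1df(statistic):
    """Upper-tail probability of a chi-square value with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def scan_loci(genotypes, is_case, alpha=0.05):
    """Return (locus, p-value) pairs that pass a Bonferroni-corrected threshold."""
    n_loci = len(genotypes[0])
    threshold = alpha / n_loci  # Stricter cutoff the more loci are tested.
    hits = []
    for locus in range(n_loci):
        a = sum(1 for g, ill in zip(genotypes, is_case) if ill and g[locus])
        b = sum(1 for g, ill in zip(genotypes, is_case) if ill and not g[locus])
        c = sum(1 for g, ill in zip(genotypes, is_case) if not ill and g[locus])
        d = sum(1 for g, ill in zip(genotypes, is_case) if not ill and not g[locus])
        p = p_value_1df(chi_square_2x2(a, b, c, d))
        if p < threshold:
            hits.append((locus, p))
    return hits


# Toy data (assumed): 20 ill and 20 healthy individuals, 3 loci.
is_case = [True] * 20 + [False] * 20
genotypes = []
for i in range(40):
    ill, idx = i < 20, i % 20
    locus0 = 1 if (ill and idx < 18) or (not ill and idx < 2) else 0  # disease-linked
    locus1 = i % 2                                                    # unrelated
    locus2 = 0                                                        # never replaced
    genotypes.append([locus0, locus1, locus2])

hits = scan_loci(genotypes, is_case)
print([locus for locus, _ in hits])
```

Only locus 0 is reported here. The Bonferroni threshold also hints at why large data pools matter: with 100,000 loci the cutoff becomes so strict that weak effects only become detectable with many individuals.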

New algorithms have potential for widespread application

Borgwardt’s work is ultimately aimed at enabling personalized medicine and therapies. He wants to be able to answer questions like: Why does a certain medicine work for one group of people but not for another? But he also wants his efforts to advance the field of computer science. “My research group is developing new algorithms for data analysis. They are the first concrete results of our work,” he explains. Applying the algorithms to questions in biology or medicine is the second step. International consortia, for example, are now benefiting from Borgwardt’s five years of research, using his algorithms to explore therapies for chronic lung disease or migraine headaches. In principle, though, the algorithms could help clarify issues in almost any area of science or field of application.

Although reluctant to hazard a guess as to when large-scale personalized medicine will become reality, Borgwardt points out that “to a certain degree it already exists, for example when doctors perform gene tests for hereditary diseases.” Yet many open questions remain about how diseases come about in the first place. Borgwardt and his research network, comprising 14 labs in eight countries, are working hard to find answers. Funded by the European Union to the tune of €3.75 million, the network unites IT with biology, and data science with life sciences: Around half of the participating scientists come from the IT world, while the other half come from disciplines such as genetics and medicine.

Data records increasingly complex

One of Borgwardt’s goals over the next five years will be to analyze the ever-growing volume of data records and filter out those elements that provide new scientific insights. “The fact that we have more data at our disposal doesn’t necessarily mean that we automatically gain more knowledge from it,” he says. So he will continue developing new data mining algorithms for even bigger databases. “We’re going to be putting a lot of energy into this,” he adds.

To recharge his batteries every now and again, Borgwardt spends time traveling with his wife, as both enjoy visiting foreign countries and regions. But he also makes frequent business trips, to meetings, seminars, or collaboration partners. “As a PhD student I was really surprised at first by how much you travel in this job,” he says. South America, Australia, the USA, Asia, and of course Europe: He’s been to them all. He once even traveled to all five continents in one year alone to present his work. But he’s still got “a long list of places” he hasn’t been to yet. Such as South Korea, where colleagues have invited him to hold a seminar – date pending. His number one private travel destination? The Catalonian capital, Barcelona.