ProteomicsDB powered by SAP HANA is helping researchers around the world understand the human body better and paving the way for the development of new treatments for cancer, dementia, and diabetes.
Dr. Hans-Christian Ehrlich is a bioinformatician at the SAP Innovation Center in Potsdam, Germany. He specializes in the study of molecular processes and in developing software to analyze large volumes of biological data.
“I’m fascinated by the interplay between science and business,” says Ehrlich, whose responsibilities at the SAP Innovation Center include heading up the “ProteomicsDB” project.
A platform powered by SAP HANA, ProteomicsDB was first presented in May 2013 by its creators, SAP and the Technical University of Munich (TUM), one of Europe’s leading research universities. Since then, it has received a great deal of attention from the scientific community. Its aim is to support efforts to decode the human proteome, in other words, the human body’s entire set of proteins. Decoding the proteome is vital to helping researchers gain a better understanding of disease and to enabling them to develop new treatments and improved diagnostic methods.
“The proteome contains a wealth of information about the human body,” explains Ehrlich. For example, the protein content of a cancerous cell is often different from that of a healthy cell. Researchers are now trying to identify proteins that indicate cancer, dementia, diabetes and other diseases, and to establish correlations between these protein patterns and the diseases themselves.
“ProteomicsDB is a significant milestone on the journey to improving our understanding of diseases and deriving potential treatments. The software enables scientists and other stakeholders to store, link, and analyze experimental data in real time, empowering them to investigate biological systems much more closely than was ever possible before,” says Professor Bernhard Küster, head of the Chair of Proteomics and Bioanalytics at TUM.
Article in Nature Magazine
The basic technique used to make the proteins in human cells visible is known as “mass spectrometry.” This is where SAP’s contribution becomes vital: a single mass-spectrometry experiment pumps out about 2.4 gigabytes of data. Even when fully analyzed, the data from each experiment still leaves a total footprint of several terabytes. Thus, SAP’s part in the cooperation project with TUM involves leveraging the SAP HANA in-memory database to make this data accessible to anyone who needs it – fast.
“While TUM is focusing on the scientific element – proteomics – we’re actively driving the technical side. Our colleagues in Munich tell us which analysis strategies they want ProteomicsDB to offer and we look at how to implement them on SAP HANA,” says Ehrlich.
In May 2014, TUM and SAP published their joint findings in Nature magazine, one of the world’s most important and influential scientific journals. Basically, if you get an article published in Nature, then you know you’re on the radar. And indeed, ProteomicsDB has since been referenced in numerous scientific publications and is now being used on a daily basis. Proteomics researchers all over the world value the platform as a place to get the “big picture” of the human proteome.
Vision: The “Multi-omics Platform”
“The human body contains about 20,000 genes, each of which gives rise to one or more proteins,” explains Ehrlich. “So far, scientists have mapped more than 18,000 proteins, or 93 percent of the entire proteome. But the human proteome is incredibly complex: Proteins can exist in many different forms and are often modified within the cell itself, which means that the human proteome is, in fact, potentially made up of millions of proteins.”
Consequently, there is a great deal still to do in this area. For example, TUM wants to work with the SAP Innovation Center to map and analyze mass spectra for every single protein individually.
Yet the vision shared by Ehrlich and his colleagues at the SAP Innovation Center, TUM, and SAP already goes much further.
“We can envisage developing a platform that simultaneously processes genome, transcriptome, proteome, and metabolome data – and that integrates other clinical data as well,” he says. This multi-omic approach is being investigated by TUM and by scientists involved in the “Forschungscampus Modal” research initiative at the Freie Universität Berlin. “There’s enormous potential in a multi-dimensional approach such as this,” says Ehrlich.
Indeed, inquiries are already coming in from companies in the industrial sector that are interested in operating a database like this one with their own data behind the corporate firewall.
“This project could rapidly turn into a product,” says Ehrlich, adding that projects like ProteomicsDB are helping SAP learn how to process and visualize highly specialized data with SAP HANA. “Ultimately,” he concludes, “we hope that our efforts will help medical research gain a better understanding of disease and give partners the technology they need to achieve advances in personalized medicine.”
SAP built the free online ProteomicsDB platform in collaboration with the Technische Universität München (TUM). Interested parties can access ProteomicsDB to find key information about the human proteome – gathered from analyses of cells, bodily fluids, and cancer cell lines – in real time. Scientists can run search queries on the database and display the results in different ways, or use a direct interface that supports three programming languages to run more flexible queries and download the results. SAP provides both the infrastructure – consisting of 160 CPUs and two terabytes of RAM – and the SAP HANA platform.
The SAP Innovation Center Potsdam and various SAP departments are involved in the project.
This piece is part of an ongoing “SAP Healthcare: A Check Up” series, which looks at how SAP is improving people’s health around the world.