A Developer’s Thoughts on Big Data

Pavlo Baron works for Codecentric AG, a specialist in software development and innovative technologies. Ukraine-born Pavlo heads up the company’s architecture competence center, where he is chiefly concerned with technology management in companies and with the various approaches to using Big Data. He is also an author and is currently working on a German publication entitled Big Data für IT-Entscheider: Riesige Datenmengen und moderne Technologien gewinnbringend nutzen (“Big Data for IT Decision-Makers: How to Profit from Large Data Quantities and Modern Technologies”). Pavlo studied systematic development in Ukraine.

SAP.info: How did you become interested in the topic of Big Data?

Pavlo Baron: A few years ago, I started looking at statistics and analyses in detail. I then ran into the topic of machine learning and went on to find out about the world of distributed systems. Based on this combination of different topics, I decided – independently of the emerging “Big Data” hype – to investigate what is involved in processing large volumes of data.

You refer to a particular type of developer, the “Big Data developer”. Could you expand on that distinction?  

Not everything that is labeled “Big Data” is in fact Big Data. In my view, the term Big Data has been hyped up by marketing because it’s a new way to make money. My main message is that, as a Big Data developer, you need to delve deep into the scientific aspects of the job, such as analyzing, processing, and storing large volumes of data. That’s the only way to use tools effectively. Developers need to know about analytical processes, statistics, and machine learning. And they need to know how to use specific data to program algorithms. The main focus is the analytical side, but you also need the scientific background and an in-depth technical knowledge of the tools you work with in order to gain control of vast volumes of data. There’s no one tool that offers this per se.

What skills must developers and Big Data developers possess?

Developers must understand the machines they work with. For example, they must know how a hard disk works. Many of them rely far too much on the tool. Understanding the hardware is particularly important when working with Big Data because you need to be able to get the best performance out of the machines that are available. This is basic knowledge that a developer simply must possess.

What is your career advice for young developers?

First of all: learn! “Rookie” developers should examine the scientific and mathematical aspects that are fundamental to analysis very closely. When they work with a tool, they should look “behind the scenes”. And they should be open to using tools from other manufacturers, not just from manufacturer XY.

What problems do companies commonly experience with Big Data?

The central problem is that they don’t know how to collect data volumes that are large enough to produce useful analyses. They are used to organizing data in a structured way, which is simply not possible in the chaotic world of Big Data. Employees must go back to having a fundamental understanding of the machine in order to gain the maximum benefit from Big Data. There’s no point using a tool if you don’t understand it. Of course, companies can call in experts to set up their tools, but what’s the use of that if even the slightest error leaves them stumped?

What can CIOs do to make effective use of Big Data?

Companies must invest in their own employees. One way is to send them to workshops or training programs. Businesses prefer to put their money in the big-name companies. That way, they have a contract with a manufacturer and can apportion blame when things go wrong, but it is not a profitable solution in the long term. If companies want to process large volumes of data today, they must invest in their own workforce. One solution is to follow the “DevOps movement” (from “Dev” for software development and “Ops” for IT operations). In this approach, employees come together in interdisciplinary teams that are responsible both for developing an idea and for executing it. The distinction between development and operations disappears. Everyone is in the same boat. They develop a solution together and, ultimately, they operate and maintain it together. Conventional enterprises do not work in this way. It’s the startups that are driving this idea. Examples are Facebook and Twitter. In Germany, the DevOps movement is already taking hold in Berlin.

In which areas is Big Data actually helpful?

Now that people are becoming more and more reliant on computers, I think that we will see an increase in the use of “recommendation systems”. These enable machines to perform analyses in advance and propose what we should do. This makes our lives easier. The fewer decisions an individual is called on to make, the better founded the decisions he or she does make will be. One example of this kind of system is Apple’s Siri, a personal assistant that answers the user’s questions and carries out his or her orders.

How do you envisage the future of Big Data?

I have a very high regard for the concept of “Big Data”, but not for the hype that has been attached to it. Internationally, there is a burgeoning range of occupational training opportunities for aspiring IT experts. One of these occupations is the “data scientist”. That’s someone with a strong mathematical background and very good programming skills, who therefore has the ability to derive the maximum amount of information from vast quantities of data. This type of information extraction can help a company determine its strategies end to end.

Conventional businesses need to set up research departments and be proactive in driving information extraction. They should keep an open mind and not just buy products from supplier X. Developers should be given the chance to start from scratch, in the same way that startups do, and to develop their own processes and tools. That, incidentally, is also a way for companies to attract high-potential employees.

What developments do you think we’ll see in the coming years?

Companies have now recognized that information extraction is vital in the fight to remain competitive. They have to make decisions fast so that they can adjust to changes in the market situation. Time is a decisive factor. Thanks to Big Data, information extraction aimed at improving a company’s market position will reach new dimensions in the coming years. Whether they operate in the retail, manufacturing, or financial sector, businesses will analyze data from various sources proactively and turn the insight they gain into a competitive advantage. Overall, people will allow themselves to be helped by machines to a much greater extent.