What ways of accessing information on the Internet are still underdeveloped, and therefore cannot be used effectively?
Kraft: Even though search-engine technology has been developed constantly over the years, the process of finding information on the Internet is still in its infancy. The first search engines did not make use of the structure of HTML documents and relied mainly on indexing the pure text content. In addition, the hyperlink structure of the Web was completely ignored, which led to absurd search results such as “10,326,839 documents matching your query”. Things have changed over time: search engines and ranking algorithms have been refined, and special HTML-related features – such as text in titles, meta tags, and other markup – are stored separately in the index. Popular search engines such as Google show how the hyperlink structure of the Web can be used successfully to drastically improve the quality of search results.
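The idea behind link-based ranking that the answer alludes to – Google's publicly described PageRank – can be sketched in a few lines. The graph, page names, and damping factor below are illustrative only, not taken from the interview:

```python
# Minimal PageRank-style power iteration over a tiny link graph.
# The damping factor 0.85 and the three-page graph are illustrative assumptions.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
# "c" is linked to by both "a" and "b", so it ends up ranked highest.
```

The key point, in contrast to pure text indexing, is that a page's score depends on who links to it rather than only on the words it contains.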
What are the challenges at the moment?
Kraft: The main problem in the future will be that the volume of information on the Internet keeps growing continuously and quickly. The exponential growth of this information is already one of the main challenges in the design and construction of search engines. In addition, the indexed data is unstructured, which makes it difficult to locate information directly. And finally, I think there will be huge problems with spam – by which I mean users who try to trick search engines’ ranking algorithms in order to push their products as far up the list of search results as possible. Together, all these factors create the need to improve the quality of search results. For these tasks, today’s conventional methods have reached their limit; we need to devise and develop new methods from the areas of “machine learning” and “artificial intelligence”.
What part does the “Grand Central Station” search engine you developed play in this context?
Kraft: The results of the project were a first step in the right direction. One of the most important results is the extensive support of various document formats: information in practically all conventional document formats is converted to a universal document format – XML – and this is then specially indexed. In this context, Grand Central Station should be seen more as a tool that can be used to create the basic conditions for obtaining and extracting information.
What characteristics does Grand Central Station have compared to other search engines?
Kraft: The challenge is to extract structured information from unstructured documents such as HTML – which, in most cases, itself still contains errors – and then to use this structured information for specific searches. For example, it is possible to search for the author or other attributes of a document, which were stored separately when the document was converted into XML.
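Grand Central Station's actual converters are not described in the interview; as an illustration only, the kind of attribute extraction mentioned (pulling an author or title out of sloppy HTML so it can be indexed as a separate field) might look like this toy sketch, where the field names and regular expressions are my own assumptions:

```python
# Illustrative sketch: turning a (possibly sloppy, unquoted) HTML document
# into a small structured record that could be indexed field by field.
import re

def extract_fields(html):
    fields = {}
    title = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if title:
        fields["title"] = title.group(1).strip()
    author = re.search(
        r'<meta\s+name=["\']?author["\']?\s+content=["\']?([^"\'>]+)',
        html, re.IGNORECASE)
    if author:
        fields["author"] = author.group(1).strip()
    return fields

# Note the unquoted attributes: the extractor tolerates imperfect HTML.
doc = '<html><head><title>Search 101</title><meta name=author content=Kraft></head>'
print(extract_fields(doc))  # {'title': 'Search 101', 'author': 'Kraft'}
```

Once documents are reduced to such records, a query like "author = Kraft" becomes a direct field lookup instead of a keyword match.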
“jCentral” can be used to search for program code in Java or XML on the Internet. How does it work?
Kraft: jCentral and xCentral were the first applications to result from Grand Central Station. As far as structured searches go, these applications are far superior to conventional search engines. For example, jCentral “knows” that a search relates to a Java program, which allows it to look for specific program constructs in that language – class names, the classes that implement certain interfaces, or other attributes that are typical of Java source code. The xCentral search engine does the same for XML. This was a step away from the standard approach of search engines, which usually only search for keywords.
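jCentral's internals are not detailed here, but the kind of fact extraction the answer describes – recording which interfaces a Java class implements so that queries like "classes implementing Comparable" become possible – can be sketched crudely with a regular expression. The pattern and example class are hypothetical:

```python
# Toy sketch of structured extraction from Java source: record each class's
# name and the interfaces it implements. A real indexer would use a parser,
# not a regex; this is illustration only.
import re

CLASS_RE = re.compile(
    r"class\s+(\w+)(?:\s+extends\s+\w+)?(?:\s+implements\s+([\w,\s]+))?")

def java_facts(source):
    facts = []
    for m in CLASS_RE.finditer(source):
        name = m.group(1)
        interfaces = [i.strip() for i in (m.group(2) or "").split(",") if i.strip()]
        facts.append({"class": name, "implements": interfaces})
    return facts

code = "public class Invoice extends Document implements Comparable, Serializable {}"
print(java_facts(code))
# [{'class': 'Invoice', 'implements': ['Comparable', 'Serializable']}]
```

With such facts in the index, "find classes implementing Serializable" is a structured query rather than a keyword search that would also match comments and documentation.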
How can software developers benefit from a search engine like this?
Kraft: jCentral can be integrated into development environments, for example IBM’s WebSphere Studio Application Developer. There, the search engine can intelligently support software developers during program and software development. If a developer wants to incorporate a sort routine, for example, but cannot make a decision because of a lack of information about the performance or storage requirements of the available routines, jCentral can provide this information directly in the development environment. Measures such as this can considerably increase the productivity of a software developer.
At the moment, Web services are on everyone’s lips – SAP is one company providing such products. When do Web services make sense, in your view, and how do you think they will develop?
Kraft: Web services represent the natural evolution of the Internet. At the moment, the Internet is mainly used by people, and that is what its structure is geared towards. However, this will change in the future – and has already changed to some extent. The information on the Web will be processed directly by software agents or other intelligent machines. All kinds of companies and organizations will make use of this to work more efficiently and across enterprise boundaries. In time, this could result in the “Semantic Web,” which would ultimately affect the way we all work and communicate.

The “Semantic Web” is the next stage in the evolution of the Web. Initially, the Internet was designed mainly for people reading static websites. Later, websites were generated dynamically by what are called CGI scripts. But throughout these developments and the trend towards dynamically generated websites, the individual has remained at the forefront, and machines still have difficulty processing websites automatically. The “Semantic Web” would make this easier by attaching additional data, or “semantics,” to websites, allowing machines to work with them automatically. They would then be able to extract information and make decisions – for example, automatically arranging a doctor’s appointment, or planning a trip including the reservation of flights, hire car, and hotel.
One of the main things you are working on is a program that links up computers with the aim of processing tasks on thousands of PCs separately and then merging the individual solutions at the end. Where could a program like this be used?
Kraft: The idea is already a few years old. The work formed the basis of what is known as “grid computing.” Researchers are very active in this area at the moment, and the technologies are of particular interest for sophisticated scientific calculations. However, there is no reason why this technology could not also be used in commercial applications. There is considerable potential in this area for many interesting developments over the coming years.
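The split/merge pattern described in the question – cut a task into independent pieces, process them separately, and merge the partial results – can be sketched minimally, with local worker threads standing in for the thousands of networked PCs. The task (a sum of squares) and worker count are arbitrary illustrations:

```python
# Toy illustration of the grid-computing idea: split a big job into
# independent chunks, process each chunk separately (threads stand in
# for remote machines), then merge the partial results.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each "node" computes its independent piece of the work.
    return sum(x * x for x in chunk)

def grid_sum_of_squares(data, workers=4):
    # Deal the data out round-robin so every worker gets a chunk.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum, chunks)
    return sum(partials)  # the merge step

print(grid_sum_of_squares(range(1000)))  # 332833500
```

The essential property is that the chunks share no state, so they can run anywhere and in any order; only the final merge needs all the results.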
How do you envisage the information procurement and search operations of the future?
Kraft: I envisage intelligent search engines that work in a decentralized way and learn from their users how to improve their search results automatically. In this respect, the “Semantic Web” can help deliver more specific search results. This will probably be the area occupying many researchers, myself included, over the next few years.
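One hypothetical form of "learning from users" – not something the interview specifies – is click feedback: results that searchers actually choose get a small boost the next time the same query is run. The class, scores, and boost factor below are invented for illustration:

```python
# Hypothetical sketch of click-feedback learning: clicked results gain a
# small ranking boost for future runs of the same query.
from collections import defaultdict

class LearningIndex:
    def __init__(self, base_scores):
        self.base = base_scores                          # query -> {doc: relevance}
        self.clicks = defaultdict(lambda: defaultdict(int))

    def record_click(self, query, doc):
        self.clicks[query][doc] += 1

    def search(self, query, boost=0.1):
        scores = self.base.get(query, {})
        return sorted(
            scores,
            key=lambda d: scores[d] + boost * self.clicks[query][d],
            reverse=True)

idx = LearningIndex({"java sort": {"docA": 0.50, "docB": 0.49}})
# docA initially wins on base relevance, but repeated clicks on docB
# shift the ranking in its favour.
for _ in range(3):
    idx.record_click("java sort", "docB")
print(idx.search("java sort"))  # ['docB', 'docA']
```

Real learning-to-rank systems are far more sophisticated, but the principle is the same: user behaviour becomes a training signal for the ranking function.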
What is your personal motto?
Kraft: Be creative and try to develop information technology with new ideas.