Semantic Web: It’s All in the Meaning

Prof. Wolfgang Wahlster

What is semantic technology?

Wahlster: In essence, semantics is the science of content and meaning. In IT terms, that means enabling computers to understand the meaning of human language. At the moment, software systems and the Internet are still syntax based. In other words, they focus on the linkage of characters into character strings. When linking from one document to another, computers take no account of meaning. Semantic technologies, on the other hand, allow us to assign content to these links. However, in doing so, it is essential to factor in the ambiguous nature of human language.

How can semantic technologies understand human language?

Wahlster: Semantic technology interconnects terms, like a kind of giant encyclopedia. This is based on the premise that all types of relationships can be defined between terms. For example, terms can be synonyms or antonyms, one term can be generic and another subordinate to it, and something can exemplify or represent a term. This system of terms and relations creates a semantic network that the computer can use to draw certain conclusions and fill in blanks, so to speak.
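To make the idea of a semantic network concrete, here is a minimal Python sketch. It assumes a toy network stored as (subject, relation, object) triples; the terms, the relation names `is_a` and `synonym_of`, and both function names are illustrative assumptions, not part of any system mentioned in the interview. It shows how a program can "fill in blanks" by following links that were never stated directly for the term in question.

```python
from collections import defaultdict

# Hypothetical toy network: (subject, relation, object) triples.
TRIPLES = [
    ("sofa", "is_a", "seating furniture"),
    ("seating furniture", "is_a", "furniture"),
    ("couch", "synonym_of", "sofa"),
]


def build_index(triples):
    """Index triples by (subject, relation); synonymy is stored in both directions."""
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[(subject, relation)].add(obj)
        if relation == "synonym_of":
            index[(obj, relation)].add(subject)
    return index


def generic_terms(term, index):
    """Collect every generic term reachable from `term` via is_a links,
    also passing through synonyms of the terms visited."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Synonyms share their generic terms, so visit them too.
        stack.extend(index[(current, "synonym_of")])
        for parent in index[(current, "is_a")]:
            found.add(parent)
            stack.append(parent)
    return found


index = build_index(TRIPLES)
print(generic_terms("couch", index))
```

Nothing in the triples says that a couch is furniture; the program concludes it by combining the synonym link with the two generic-term links.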

What sort of opportunities does this capability offer?

Wahlster: This capability is vital for the interoperability of software systems that work with different terminology sets. By means of a content description that computers can understand, it enables computers to establish whether two terms refer to the same thing. Usually, if a number of departments within a company develop their own specific terminology, computers are unable to cope because the various terms differ in their syntax. Semantic descriptions enable software to pick out terms with the same meaning.

Computers can then also identify when a description is a special case of another. For example, if a call center takes a customer complaint about a software error, a semantic software system would be able to automatically identify that the call is a special variant of an earlier, more general query. This cuts the workload involved in processing the complaint.
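The two capabilities described here, recognizing that different terms mean the same thing and recognizing that one description is a special case of another, can be sketched roughly as follows. The department names, terms, concept identifiers and the `BROADER` hierarchy are all invented for illustration; a real system would draw them from an ontology rather than from hard-coded dictionaries.

```python
# Hypothetical mapping from (department, local term) to a shared concept.
TERM_TO_CONCEPT = {
    ("sales", "client"): "Customer",
    ("support", "caller"): "Customer",
    ("support", "ticket"): "Complaint",
}

# Hypothetical concept hierarchy: each concept points to its more general concept.
BROADER = {
    "InstallerCrashComplaint": "SoftwareErrorComplaint",
    "SoftwareErrorComplaint": "Complaint",
}


def same_meaning(term_a, term_b):
    """True if both department-specific terms map to the same shared concept."""
    concept_a = TERM_TO_CONCEPT.get(term_a)
    concept_b = TERM_TO_CONCEPT.get(term_b)
    return concept_a is not None and concept_a == concept_b


def is_special_case(specific, general):
    """True if `general` is reached by walking upward from `specific`."""
    seen = set()
    current = BROADER.get(specific)
    while current is not None and current not in seen:
        if current == general:
            return True
        seen.add(current)  # guards against accidental cycles in the hierarchy
        current = BROADER.get(current)
    return False


print(same_meaning(("sales", "client"), ("support", "caller")))
print(is_special_case("InstallerCrashComplaint", "Complaint"))
```

The terms "client" and "caller" share no characters that a syntactic comparison could exploit; they match only because both are described by the same concept.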

How do you create a semantic description?

Wahlster: The semantic annotation of data or documents is performed using ontologies that describe terms and their relations with each other within a formally defined system. At present, we have a range of standardized ontologies such as the Standard Upper Ontology of the IEEE. Universal ontologies such as these are used to describe basic terms such as “process” or “event.” There are also ontologies designed for specific sectors and domains, such as the General User Model Ontology (GUMO) developed by DFKI. GUMO allows users such as telecoms suppliers and mail-order companies to describe customers in CRM systems so that they can compile and compare data beyond the confines of individual companies.

Where is semantic technology currently being used?

Wahlster: One example would be the semantic desktop that structures private files so that users can search them by content as well as via the more conventional keyword search. Even if a document doesn’t contain a specific arrangement of characters, because an alternative – but synonymous – term has been used, a semantic desktop search will still deliver the right result because the two terms are interconnected via ontologies.
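The difference between a keyword search and a search that follows synonym links can be illustrated with a short Python sketch. The synonym sets, file names and file contents below are invented, and the functions are a simplified stand-in, not the actual semantic desktop software.

```python
import re

# Hypothetical synonym sets, standing in for links defined in an ontology.
SYNONYM_SETS = [
    {"car", "automobile"},
    {"invoice", "bill"},
]


def expand(term):
    """Return the term together with all of its known synonyms."""
    term = term.lower()
    for synonyms in SYNONYM_SETS:
        if term in synonyms:
            return set(synonyms)
    return {term}


def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query, documents):
    """Match only documents that contain the exact query word."""
    return [name for name, text in documents.items()
            if query.lower() in tokenize(text)]


def semantic_search(query, documents):
    """Match documents that contain the query word or any synonym of it."""
    wanted = expand(query)
    return [name for name, text in documents.items()
            if wanted & tokenize(text)]


documents = {
    "a.txt": "Automobile insurance renewal",
    "b.txt": "Holiday photos",
}
print(keyword_search("car", documents))   # no document contains the word "car"
print(semantic_search("car", documents))  # a.txt matches via "automobile"
```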

The automatic e-mail responses used by companies to reply to specific queries represent another area where semantic technology is being used. For example, if there is an error in a new software release, the manufacturer will quickly receive a large volume of queries relating to this error. These e-mails will have been composed in a variety of ways by a number of people, but will essentially all mean the same. Semantic technologies enable the manufacturer to process this unstructured data automatically. So long as the computer has identified the subject of the query via ontologies, it can automatically send a link to the relevant patch.

What is the potential of this technology in terms of Web services?

Wahlster: The “SmartWeb” project, run by DFKI with financial support from the German Federal Ministry of Education and Research, is leading the way in semantic Web services. We have developed the basic principles of the semantic Web to such an extent in this project that SmartWeb offers users an “answer engine” instead of a search engine. For example, if a user asks about the sales generated by Siemens in Germany in 2006, semantic searching will deliver the exact figure. In contrast, a keyword search via Google will deliver several hundred thousand hits.

Moreover, if the sales figure is required in U.S. dollars and not in euros, as it would normally appear in the company’s annual report, semantic technology automatically recognizes that a currency converter has to be used. This simple Web service then takes the result of the semantic search engine as an input and provides the new figure. Users no longer need to work with a range of separate services and their various input and output data. If services are interlinked with each other on a semantic basis, the computer can use them as building blocks to construct complex, higher-value services.
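One way to picture this kind of service composition is to describe every service by the type of data it accepts and the type it returns, and let the program search for a chain that bridges the gap. The following Python sketch rests entirely on assumptions: the type labels, the service name, the exchange rate and the input value are made up, and SmartWeb's actual mechanism is not described in this interview.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str   # semantic description of what the service accepts
    output_type: str  # semantic description of what it returns
    run: Callable[[float], float]


HYPOTHETICAL_EUR_TO_USD = 1.25  # made-up rate, for illustration only

SERVICES = [
    Service("eur_to_usd", "Amount[EUR]", "Amount[USD]",
            lambda amount: amount * HYPOTHETICAL_EUR_TO_USD),
]


def find_chain(have, want, services):
    """Breadth-first search for services that turn type `have` into type `want`."""
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        current, chain = queue.popleft()
        if current == want:
            return chain
        for service in services:
            if service.input_type == current and service.output_type not in seen:
                seen.add(service.output_type)
                queue.append((service.output_type, chain + [service]))
    return None


def answer(value, have, want, services=SERVICES):
    """Apply whatever chain of services converts the value to the wanted type."""
    chain = find_chain(have, want, services)
    if chain is None:
        raise LookupError(f"no service chain from {have} to {want}")
    for service in chain:
        value = service.run(value)
    return value


# Placeholder figure in euros, requested in U.S. dollars.
print(answer(100.0, "Amount[EUR]", "Amount[USD]"))
```

The user never names the converter. The program selects it because the type of the available result does not match the type of the requested answer.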

Although the semantic Web cuts user workload, surely the process of creating and inputting all the necessary metadata entails a lot of work. In view of the sheer volume of content on the Web, what is the best way of tackling this challenge?

Wahlster: That is a key problem. For as long as so few Internet pages are annotated with ontology description languages such as the Web Ontology Language (OWL), the semantic Web will be little more than an experiment in this context. However, it is unlikely that everyone with content on the Web will add semantic descriptions to that data.

That’s why we need a next-generation Internet that combines the user participation typical of Web 2.0 with the semantic Web. That is precisely the aim of the THESEUS program, which is being run by the German Federal Ministry of Economics and Technology with the cooperation of DFKI and SAP. If we are going to describe websites semantically we need users on board. What’s more, we need their support on a scale like that of the ubiquitous blogs and social networking sites that populate the Web. End-users are the key to completing this enormous task, which would be way beyond the capabilities of any single company or state body.

Research work is focused on making the tools we need to achieve this as simple as possible. When assigning keywords on photo or video portals, a range of alternatives can be offered straight away for ambiguous terms. Just one more click to select the required meaning and the user jumps into an ontology where this meaning is linked to other terms. Getting users involved makes semantic annotation cost effective, thereby helping us cope with this huge challenge. However, the changeover from a syntax-based Web to a semantic Web will not happen overnight – it will be a gradual evolution. The HTML documents that exist today will be enhanced with semantic data as part of a step-by-step process.
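The one-click disambiguation step for keywords might look roughly like this in Python. The keyword, its candidate senses and the concept identifiers are invented for the example, and the two functions are hypothetical helpers rather than tools from the THESEUS program.

```python
# Hypothetical sense inventory: each keyword maps to its candidate meanings.
SENSES = {
    "jaguar": [
        {"concept": "animal/jaguar", "label": "jaguar (big cat)",
         "broader": "animal/big_cat"},
        {"concept": "vehicle/jaguar", "label": "Jaguar (car brand)",
         "broader": "vehicle/car_brand"},
    ],
    "beach": [
        {"concept": "place/beach", "label": "beach (shoreline)",
         "broader": "place/landform"},
    ],
}


def suggest(keyword):
    """Return the labels to show the user for a typed keyword."""
    return [sense["label"] for sense in SENSES.get(keyword.lower(), [])]


def annotate(keyword, choice=0):
    """Turn a keyword plus the user's selection into a semantic annotation."""
    senses = SENSES.get(keyword.lower())
    if not senses:
        # Unknown keyword: keep it as a plain, unlinked tag.
        return {"keyword": keyword, "concept": None, "broader": None}
    sense = senses[choice]
    return {"keyword": keyword, "concept": sense["concept"],
            "broader": sense["broader"]}


print(suggest("jaguar"))             # two alternatives are offered
print(annotate("jaguar", choice=1))  # the user picks the second one
```

The user's single selection is what attaches the tag to a concept, and through the `broader` link to the rest of the ontology.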

Why should users get involved?

Wahlster: Even today, there are many people who contribute to Web 2.0 without any material incentive. What motivates them is the feeling that they are part of a community and a common effort. People are proud of taking part in something that ultimately benefits them and others. This is what we need to focus on for the semantic Web.

What type of timescale is involved in establishing semantic technologies in industry?

Wahlster: I believe it will take just three years for semantic technologies to take their place among the most important IT technologies. Semantic technologies will become the preferred tool for structuring corporate know-how and constructing services in the field of enterprise software, an area that already focuses on process standardization and modeling.

What are the limits of semantic technologies?

Wahlster: Human language also incorporates the phenomenon of connotation, which relates to the way we interpret a term based on our own personal experiences. Take, for example, the term “sofa.” In principle, we all know that this is an upholstered piece of furniture that several people can sit on. However, perhaps some people think of the sofa their grandmother used to sit on while others associate the term with a completely different piece of furniture. This type of semantics is highly personal and extremely subtle. Naturally, we don’t want to input connotations such as these into a computer. A computer does not need to look at all the nuances that a person might consider – it only needs to go as far as is necessary to ensure that man and machine can understand each other efficiently.