A Beautiful, Networked World?

Andreas Blumauer

The Semantic Web – the vision of the architect of the World Wide Web, Tim Berners-Lee – is still a long way from reality. What are the reasons for this slow development?

The vision of Tim Berners-Lee is a major developmental step that can’t be realized in a single day. For it to become reality, we first have to answer questions of interoperability at the terminological, organizational, and technical levels. Resolving those issues is the precondition for the Semantic Web. The challenge is not simply a matter of technology, even if it appears so at first glance. That’s why various disciplines must recognize and work through the problem: document experts, librarians, linguists, and computer scientists.

Many people understand the Semantic Web to mean “thinking” computers that can interpret comparatively “vague” entries and then use the context to deliver the information being sought. How much of this perception is correct?

The perception applies to one specific application of semantic technologies, but the Semantic Web should not be reduced to a search problem. It’s much more a methodology or technique that helps link information and knowledge objects meaningfully and in a context-dependent way. It covers not only documents, but also terms, processes, and even people.

Semantic search engines are supposed to mirror the essence of human thinking. How should we picture that in practice, and how would it function?

Human beings store knowledge and experience as images and as networks of terms (generic terms and subordinate terms) or associated topics that can then be woven into complex concepts. Human beings also remember rules like “dogs eat meat but not fruit.” This rule applies to all dogs and does not need to be stored separately for each breed of dog. When knowledge networks or ontologies of this kind are created and externalized, a computer “learns” more about the world and the terms that occur in it. It can thus narrow the search and deliver precise results.
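
The dog example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy taxonomy, the rule table, and the function name lookup are all hypothetical and do not come from any real ontology or library. The point it shows is that a rule stored once at a generic term is found for every subordinate term by walking up the hierarchy.

```python
# Hypothetical toy taxonomy: each term points to its generic (broader) term.
BROADER = {
    "poodle": "dog",
    "dachshund": "dog",
    "dog": "mammal",
    "mammal": "animal",
}

# Each rule is stored once, at the most general term it applies to.
RULES = {
    "dog": {"eats meat": True, "eats fruit": False},
    "animal": {"is alive": True},
}


def lookup(term, prop):
    """Walk up the taxonomy until some term states the property."""
    current = term
    while current is not None:
        if prop in RULES.get(current, {}):
            return RULES[current][prop]
        current = BROADER.get(current)  # None once the top is reached
    return None  # nothing is known about this property


print(lookup("poodle", "eats meat"))      # rule inherited from "dog"
print(lookup("dachshund", "eats fruit"))  # same rule, different breed
```

Nothing is stored for the individual breeds; both answers come from the single rule attached to "dog".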

Where does the Semantic Web reach its limits? What happens to the information that’s not being searched for?

Of course, the Semantic Web is only as intelligent as the knowledge models behind it, such as ontologies. Statistical and linguistic procedures of text analysis can also support expanded search queries. Query expansion with the aid of semantic networks, or “inferencing” at the logical level (that is, drawing conclusions automatically), enables users to find information that they did not even explicitly search for. Put briefly, a great deal is possible, depending upon the knowledge model being used. But I rule out the possibility that machines will ever become creative or output information intuitively. And besides: if you enter a search term in Google and place a tilde (~) in front of it (~RSS, for example), you’ll see that not even Google completely forgoes semantic technologies. You get all the Web sites related to RSS, even those that do not use the term.
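
Query expansion of this kind can be illustrated with a minimal Python sketch. Everything in it is assumed for the sake of the example: the mini-thesaurus RELATED, the three sample documents, and the function names expand and search. It uses plain substring matching and says nothing about how Google or any real engine implements related-term search.

```python
# Hypothetical mini-thesaurus: a term and the terms related to it.
RELATED = {
    "rss": {"news feed", "syndication", "atom"},
}

# Hypothetical document collection.
DOCS = {
    "doc1": "how to subscribe to a news feed in your browser",
    "doc2": "rss readers compared",
    "doc3": "baking bread at home",
}


def expand(term):
    """Return the search term together with its related terms."""
    term = term.lower()
    return {term} | RELATED.get(term, set())


def search(term):
    """Return the ids of all documents that mention any expanded term."""
    terms = expand(term)
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if any(t in text for t in terms)
    )


print(search("RSS"))  # ['doc1', 'doc2']
```

The first document never uses the term "RSS" but is still found, because the thesaurus links the term to "news feed". The quality of the result depends entirely on the quality of that knowledge model.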

Language is alive and imprecise; it is not always unambiguous. How does the Semantic Web deal with this difficulty?

The imprecision of language can be cushioned somewhat by setting up and using semantic models, such as taxonomies, thesauruses, semantic networks, and ontologies, along with statistical procedures. That’s the added value of semantic technologies. A search for “ohntology” delivers the correct results despite the misspelling. The system can also ask the user for the context in which the term “ontology” is of interest; semantic models enable this feature. The ambiguity of terms can be caught in the same manner, which ultimately saves time and nerves.
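
A rough sketch of both ideas, tolerance for misspellings and a follow-up question about context, might look as follows in Python. The vocabulary and its senses are invented for illustration, and the function names resolve and needs_clarification are hypothetical. The fuzzy matching uses difflib.get_close_matches from the Python standard library, which is a simple string-similarity technique and only a stand-in for the statistical procedures mentioned above.

```python
import difflib

# Hypothetical controlled vocabulary: each term with its known senses.
SENSES = {
    "ontology": [
        "philosophy: the study of being",
        "computer science: a formal knowledge model",
    ],
    "taxonomy": ["a hierarchical classification scheme"],
    "thesaurus": ["a controlled vocabulary with term relations"],
}


def resolve(query, cutoff=0.8):
    """Map a possibly misspelled query to the closest known term."""
    matches = difflib.get_close_matches(
        query.lower(), list(SENSES), n=1, cutoff=cutoff
    )
    if not matches:
        return None  # no sufficiently similar term in the vocabulary
    term = matches[0]
    return term, SENSES[term]


def needs_clarification(query):
    """True if the resolved term has several senses to choose from."""
    resolved = resolve(query)
    return resolved is not None and len(resolved[1]) > 1


term, senses = resolve("ohntology")
print(term)                              # ontology
print(needs_clarification("ohntology"))  # True
```

When needs_clarification returns True, an application could present the list of senses and let the user pick the context of interest.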

At a time when myriad pieces of information can be found on the Web, how credible and reliable is that information? Isn’t there a need for some kind of control instance that classifies the quality of the information?

A control instance would be nice, but it would also be a tremendous danger. The Semantic Web will have just as few central instances as today’s Internet. It’s far more a network made up of countless smaller networks. Of course, you can use metainformation to describe the credibility and reliability of information. That’s the reason for the “trust layer” in the Semantic Web architecture of the W3C.

From the history of the Internet, we already know the kind of abuse that metainformation invites: many sites once stuffed their pages with any number of metatags. For the Semantic Web, this problem has no solution so far; it’s one of the questions that research is working on right now. Accordingly, the Semantic Web will not have a central control instance to classify information according to its quality. Instead, we’re talking much more about topic clusters or insular domains that can be described meaningfully with ontologies and that can be examined on an ongoing basis by analyzing the frequency of terms, both for the meaning of individual terms and for the appearance of concepts. This approach shows one path for the Semantic Web. Documents that show conspicuous properties in their structure and terminology fall through the cracks. That’s already the de facto situation with Google today.

What applications already exist and where are they used?

We’ll certainly have to wait a few more years for the Semantic World Wide Web (WWW) that Tim Berners-Lee primarily speaks about. Experts expect a time horizon of five to six years. Depending upon what you understand by Semantic WWW, some harbingers are already clearly visible. Just think of the rapid development that RSS news feeds are now experiencing and the new applications and business models that result. Of course, what we’re still primarily missing are ontologies from which value-added services can be generated. But viewed overall, the first Semantic Web firms that earn money will appear in two to three years at the latest.

On the Internet, however, we can already see several examples that use semantic technology and complex ontologies and support knowledge-intensive business processes successfully. The best opportunities in this area are primarily available to early adopters who can generate the appropriate competitive advantages for a company. Accordingly, we can only hope that small and midsize companies soon discover the advantages of semantic technologies for themselves. Such companies, in particular, need to network the holders of knowledge – community building, improved communication, and personalization of information. Large companies primarily need optimization of process chains beyond the borders of the corporate group.

When can we expect a broad-based market rollout?

The demand for semantic technologies is clearly increasing. Potential customers are much more aware of the topic, but they should not make the mistake of reducing Semantic Web technology to a simple matter of “search and find.” Otherwise, some potential falls by the wayside: terminology management and e-learning, to name two examples. The available standards are an essential factor in market acceptance, standards like OWL to describe ontologies and query languages like SPARQL. Just think about the boost database systems experienced when SQL became available. Something similar is happening now with ontology databases. In the future, they will form the foundation of the entire data infrastructure of the Semantic Web.
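
To give a feel for what querying an ontology database means, here is a small Python sketch of triple pattern matching, the basic idea underneath SPARQL queries. It is not SPARQL and not a real triple store: the sample triples, the "?" convention for variables, and the function name match are assumptions made for this illustration only.

```python
# Hypothetical knowledge base as (subject, predicate, object) triples.
TRIPLES = [
    ("Poodle", "subClassOf", "Dog"),
    ("Dachshund", "subClassOf", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
    ("Dog", "eats", "Meat"),
]


def match(pattern, triples=TRIPLES):
    """Return one variable binding per triple that fits the pattern.

    Pattern elements that start with "?" are variables; all other
    elements must equal the corresponding part of the triple.
    """
    results = []
    for triple in triples:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.get(want, have) != have:
                    break
                binding[want] = have
            elif want != have:
                break
        else:
            results.append(binding)
    return results


# Roughly comparable to the SPARQL query:
#   SELECT ?breed WHERE { ?breed rdfs:subClassOf :Dog }
print(match(("?breed", "subClassOf", "Dog")))
# [{'?breed': 'Poodle'}, {'?breed': 'Dachshund'}]
```

A standardized query language plays the same role for such triple data that SQL plays for relational tables: applications can ask questions without knowing how the data is stored.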

Who ultimately maintains the metadata and who pays for its use? Is anyone thinking about these issues already – about a business model?

To some extent, the maintenance of metadata can be automated. How far depends upon the quality of information that the application context requires. A significant amount of editorial effort will be needed here. There’s no point in fighting about a business model to justify these costs. In fact, many companies today are victims of a flood of information, and many manufacturers promise helpful solutions off the shelf. The problem will quickly become much worse. Only intelligent networking of information is the answer to this misery. Ontologies and metadata play an important role here. Let’s not repeat the mistake of believing the gurus of artificial intelligence who once declared that machines would soon understand human beings at the drop of a hat. The maintenance of metadata is ultimately justified because it saves time when searching for information and, overall, makes it possible to process context-sensitive information of better quality.

Knowledge is power. How can companies use the Semantic Web to their advantage?

Companies can use the Semantic Web as a networking technology. As is often the case, there’s a lot more than a software product behind such a technical term. The use of semantic technology demands some maturity from organizations. Not every company is in a position to set up and use an ontology because doing so requires complex communication processes. The technologies of the Semantic Web can improve distribution processes, management tasks, research and development, and training activities – in principle all the activities that require the availability of high-quality information.