SAP Data Services 4.0 brings meaning to unstructured text

Much has been written about SAP In-Memory Appliance software (SAP HANA). By now, you probably have the gist of it: store data in such a way that you can process lots of it really fast. Data, however, comes in all shapes and sizes. And while SAP HANA works extremely well for structured data, unstructured data – e.g. emails, Tweets, blog posts, instant messages, videos, and pictures – can’t be processed in the same way.

Businesses recognize that unstructured data is a virtual gold mine of customer, brand, and market insights and even customer intention, but many companies don’t have the means to manually pick out the relevant business information from all the background noise. Facebook alone generates in excess of 20 terabytes of data every day, according to Nick Halstead, CEO of DataSift; just imagine what that number will be in a few years.

With SAP BusinessObjects Data Services, SAP provides companies with the ability to automate the extraction of unstructured data from various enterprise and online sources and the analysis of the information for business insights. On September 16, the 4.0 release of the SAP BusinessObjects EIM solution became generally available and introduced enhanced text analytic capabilities to the Data Services application. On the following pages, we’ll show you how Data Services works and how businesses can benefit.

Text analysis beyond the emoticon

There are many reasons for companies to analyze unstructured data: to better understand the customer, to monitor what is being said about the company and competitors, to make sure employees aren’t disclosing sensitive enterprise information (insider trading, for example), or to detect fraud. With SAP Data Services 4.0, companies are able to get these insights.

The first step is to pull in the unstructured data from various online and enterprise sources, such as websites, social media sites, survey responses, contact center notes, regulatory filings, and corporate documents. This is a potential challenge for enterprises, which often have petabytes of data in databases and disparate software systems spread across the globe. SAP Data Services automates this process with a data integration tool for the SAP NetWeaver Business Warehouse component. Once the data has been gathered, the next step is to “clean” it, for example by removing duplicate text or standardizing company names.
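The cleansing step can be pictured with a small Python sketch. It is not SAP Data Services code and does not use any SAP API; the alias table, the company names, and both function names are hypothetical and serve only to illustrate duplicate removal and name standardization.

```python
import re

# Hypothetical alias table: spelling variants mapped to one canonical name.
COMPANY_ALIASES = {
    "acme corp.": "Acme Corporation",
    "acme corporation": "Acme Corporation",
    "acme": "Acme Corporation",
}


def _normalize(text):
    """Collapse runs of whitespace, trim, and lower-case for comparison."""
    return re.sub(r"\s+", " ", text).strip().lower()


def standardize_company(name):
    """Return the canonical company name, or the trimmed input if unknown."""
    return COMPANY_ALIASES.get(_normalize(name), name.strip())


def remove_duplicates(texts):
    """Keep the first occurrence of each text, ignoring case and spacing."""
    seen = set()
    unique = []
    for text in texts:
        key = _normalize(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Comparing a normalized key while keeping the original text means the first copy of each post survives unchanged, which is usually what later analysis steps need.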

SAP Data Services then analyzes the extracted data and identifies information such as keywords, creation date, and author of the post; names of persons, organizations, geographic areas, events, and products; email addresses, telephone numbers, and account numbers; sentiment, attitudes, emotions, topics, and themes; and structured data from data tables.
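To make the idea of entity extraction concrete, here is a minimal Python sketch that pulls email addresses and telephone numbers out of free text with regular expressions. It is an illustration only, not how SAP Data Services works internally; the patterns are deliberately simplified assumptions (the phone pattern covers only numbers written like 555-010-1234), and the function name is hypothetical.

```python
import re

# Simplified email pattern; real-world address rules are more permissive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumed phone format: three digits, three digits, four digits,
# separated by a hyphen, dot, or space.
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def extract_entities(text):
    """Return the email addresses and phone numbers found in the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }
```

Entities such as person names, organizations, or sentiment cannot be captured by patterns this simple; they need linguistic analysis of the kind the article goes on to describe.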

Linguistic structures within the text make it possible for the software to “understand” the text and identify topics, themes, and patterns. Sentiment analysis is an important part of this process. In the past, however, this technology was based mostly on keywords and simply classified data as positive, negative, or neutral. The enhanced sentiment analysis capabilities in Data Services 4.0 give much more nuanced results because they take emotions, opinions, and customer intent – such as making a purchase or canceling a subscription – into account. Still, text analytics is not foolproof. The use of slang, sarcasm, or exaggeration, not to mention misspellings or misinformation, can produce false results.
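The older, keyword-based approach mentioned above can be sketched in a few lines of Python, which also shows why sarcasm trips it up. The word lists and the function name are hypothetical, and this is a toy baseline, not the sentiment analysis shipped in Data Services 4.0.

```python
import re

# Hypothetical keyword lists for a naive sentiment baseline.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "broken"}


def keyword_sentiment(text):
    """Classify text by counting positive and negative keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because the classifier only counts words, it labels the sarcastic sentence "Oh great, it crashed again" as positive: the keyword "great" is present and nothing in the word lists captures the complaint.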

An attractive feature of SAP Data Services is the ability to customize the text data processing dictionaries and rules. For example, if you want to analyze data on mergers and acquisitions from the past year, you can set a rule to extract all phrases with a structure like “company name, all forms of the verbs buy, sell, or acquire, company name.” In addition, if customers often use an unofficial term to refer to your product, you can add it to the application’s dictionary.
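A rule of the form “company name, verb, company name” can be approximated with a regular expression built from two small dictionaries. The Python sketch below is an assumption-laden illustration: it does not use the rule or dictionary syntax of SAP Data Services, and the company names, verb list, and function name are all hypothetical.

```python
import re

# Hypothetical custom dictionary of company names to recognize.
COMPANIES = ["Acme Corporation", "Globex", "Initech"]

# Forms of the verbs buy, sell, and acquire.
VERB_FORMS = [
    "buy", "buys", "bought", "buying",
    "sell", "sells", "sold", "selling",
    "acquire", "acquires", "acquired", "acquiring",
]

# Longer alternatives come first so the longest name or verb form wins.
_company = "|".join(re.escape(c) for c in sorted(COMPANIES, key=len, reverse=True))
_verb = "|".join(sorted(VERB_FORMS, key=len, reverse=True))

# The rule: company name, one verb form, company name.
DEAL_RULE = re.compile(rf"\b({_company})\s+({_verb})\s+({_company})\b")


def find_deals(text):
    """Return (first company, verb, second company) tuples found in the text."""
    return [match.groups() for match in DEAL_RULE.finditer(text)]
```

Adding an unofficial product or company name is then a matter of appending it to the dictionary list, which mirrors the customization idea described above.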

Turn messy data into business insights

The analysis of unstructured data alone, however, does not provide businesses with the complete picture. The real business insights occur where unstructured data and structured data analyses intersect. Therefore, SAP Data Services also connects back to transactional information. For example, text that includes an email address can be matched to a customer profile and the relevant transaction history. Knowing that a customer complained about a product in an online forum is important, but it’s also useful to know whether that customer is a repeat or first-time customer. Anonymous posts, of course, cannot be connected to a specific customer. Nevertheless, general insights on customer sentiment surrounding a marketing campaign can be linked to the campaign results and sales figures for that quarter.
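Linking a post to transactional data can be illustrated with a short Python sketch that looks up an extracted email address in a customer table. The table, its fields, and the function name are hypothetical stand-ins for a real customer database; nothing here reflects an actual SAP schema or API.

```python
import re

# Simplified email pattern (an assumption, not a full address validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical customer profiles keyed by lower-cased email address.
CUSTOMERS = {
    "jane.doe@example.com": {"name": "Jane Doe", "orders": 3},
    "sam@example.org": {"name": "Sam Lee", "orders": 1},
}


def link_post_to_customer(post):
    """Match the first known email in a post to a customer profile.

    Returns None for anonymous posts or unknown addresses.
    """
    for email in EMAIL_RE.findall(post):
        profile = CUSTOMERS.get(email.lower())
        if profile is not None:
            kind = "repeat" if profile["orders"] > 1 else "first-time"
            return {
                "email": email.lower(),
                "name": profile["name"],
                "customer_type": kind,
            }
    return None
```

A post without a recognizable address returns None, which corresponds to the anonymous case: it can still feed aggregate sentiment figures but cannot be tied to an individual transaction history.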

These processes – data extraction, cleansing, and analysis – all run on the same platform as SAP BusinessObjects BI. This means companies avoid having to manage two separate systems and benefit from a lower total cost of ownership (TCO). Further, it enables them to easily augment their existing analysis of structured data with the analysis of unstructured data.

SAP Data Services 4.0 currently supports six languages: English, French, German, Spanish, Japanese, and Simplified Chinese. Sentiment analysis is supported for the first four of those languages. By the end of 2011, the application will support an additional 25 languages, including Russian, Korean, Traditional Chinese, Turkish, and Romanian.