Features for Improved Data Quality

Photo: Shutterstock

Companies expect Big Data to give them deeper insights into their customers’ needs than ever before, and to deliver those insights in a matter of seconds. However, the underlying in-memory technology cannot play to its full strengths if the data itself is of poor quality. SAP Data Services and SAP Information Steward help companies assess and improve the quality of their data. SAP recently introduced Version 4.2 of the two solutions.

SAP Data Services and SAP Information Steward are independent products that customers can use individually. In integrated installations, however, they share a common back end and make use of one another, as Niels Weigel, Senior Solution Manager for Enterprise Information Management at SAP, explains. Weigel sees SAP Data Services as more of a “technical” tool for data integration, data quality improvement, and text data processing for structured and unstructured data. In contrast, SAP Information Steward is a tool that gives even non-IT experts insights into the content quality of existing master and transaction data.

Automatic transformations in SAP HANA

SAP Data Services serves as the data hub at a company, Niels Weigel says. The new Version 4.2 has further optimized interaction with the SAP HANA in-memory database. While the previous version already made it possible to improve data quality directly in ETL (extract, transform, load) operations, Version 4.2 now executes many transformations directly in the SAP HANA database. The system handles this push-down automatically; users do not have to optimize it themselves. For example, if SAP Data Services detects that certain functions (such as merging two tables) can be executed directly in the fast database, the software applies this optimization on its own, and all the user notices is that everything runs much faster, says Weigel. In addition, data quality functions are gradually being integrated directly into the SAP HANA platform. A typical example, according to Niels Weigel, is checking whether the street names in data records are still current or whether streets have been renamed in the meantime. “The address validation takes place directly in the database,” says Weigel. It also runs at SAP HANA speed, which Weigel says delivers a “dramatic leap in performance.”
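The push-down idea can be illustrated with a minimal sketch. It does not use SAP Data Services or SAP HANA; as an assumption for illustration, Python's built-in sqlite3 module stands in for the database, and the table names, column names, and both function names are hypothetical. The first function merges two tables in the client after fetching them, while the second lets the database perform the same merge as a JOIN.

```python
import sqlite3

# In-memory SQLite database as a stand-in for any SQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO orders VALUES (1, 100), (1, 50), (2, 70);
""")

def merge_in_client(conn):
    """Fetch both tables and merge them in application code."""
    customers = dict(conn.execute("SELECT id, name FROM customers"))
    orders = conn.execute("SELECT customer_id, amount FROM orders").fetchall()
    return sorted((customers[cid], amount) for cid, amount in orders)

def merge_pushed_down(conn):
    """Let the database execute the merge as a JOIN (the push-down variant)."""
    rows = conn.execute("""
        SELECT c.name, o.amount
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
    """).fetchall()
    return sorted(rows)

client_result = merge_in_client(conn)
pushed_result = merge_pushed_down(conn)
```

Both variants return the same rows. The difference is where the work happens: in the second variant only the merged result leaves the database, which is the effect the article attributes to automatic push-down.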

- SAP Information Steward identifies the costs of poor data quality
- Evaluate social networks with the Big Data interface
- Business users and IT experts work hand in hand

To enable fast, simple replication of tables into the SAP HANA database, Version 4.2 of SAP Data Services integrates Workbench 2.0. Users control replication through a graphical user interface that supports basic transformations in addition to the simple mapping of table columns.

SAP Information Steward lets business users check whether the data meets their requirements. “One data profiling feature, for example, might discover that 42 different forms of address are used in the address data,” explains Niels Weigel. Based on such anomalies, as well as on external requirements and company-wide standards and guidelines for data quality, validation rules are defined and applied to the existing data. This lets companies measure how many data records comply with or violate their requirements – that is, just how good their data quality really is.
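As a rough illustration of profiling and rule-based measurement, the following sketch counts the distinct forms of address in a small data set and then measures how many records comply with a validation rule. It is not SAP Information Steward code; the sample records, the field name, the allowed values, and all function names are assumptions made for this example.

```python
from collections import Counter

# Hypothetical sample records; the field name is illustrative only.
records = [
    {"salutation": "Mr."},
    {"salutation": "Mr"},
    {"salutation": "Mrs."},
    {"salutation": "mr."},
]

def profile_values(rows, field):
    """Profiling step: count how often each distinct value occurs."""
    return Counter(row[field] for row in rows)

# Assumed company standard for forms of address.
ALLOWED_SALUTATIONS = {"Mr.", "Mrs.", "Ms."}

def salutation_is_valid(row):
    """Validation rule derived from the profiling result."""
    return row["salutation"] in ALLOWED_SALUTATIONS

def compliance(rows, rule):
    """Return (number of compliant records, number of violations)."""
    passed = sum(1 for row in rows if rule(row))
    return passed, len(rows) - passed

distinct_forms = profile_values(records, "salutation")
passed, failed = compliance(records, salutation_is_valid)
```

Here profiling finds four distinct spellings, and the rule shows that two of the four records comply with the assumed standard.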

Identifying the financial impact of poor data

SAP Information Steward can also document the impact that poor data can have at a company. It does so by identifying the origin of the data and where it is used, for example, by examining the staging tables in a data warehouse that a company uses for its reports. SAP Information Steward not only shows whether these tables contain wrong or incorrectly formatted information; by directly integrating the technical metadata (data lineage and impact analysis), it also identifies the source of the incorrect data and lists the instances where it is used. If the source data already contains incorrect or unreliable information, says Niels Weigel, it is likely to cause problems down the line as well. In addition, “SAP Information Steward 4.2 even makes it possible to identify the financial impact of poor data quality,” says Weigel.

Customers can now directly link the individual follow-on costs – such as required resources or process costs – for identified data quality problems with individual validation rules. This involves questions such as: Is time-consuming manual maintenance needed to correct an identified error? The effort required can be calculated easily. “The user simply enters the process costs when they configure a validation rule,” explains Weigel. This makes it possible to calculate how much it will cost the company to improve the data quality manually. The different effects of the various problems can be compared. Does it cost more in the end to leave wrong postal codes the way they are for a marketing campaign, or to correct the data with an address cleansing solution directly during the import? “These what-if analyses let companies calculate the ROI of such activities,” says Niels Weigel.
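The what-if comparison described above comes down to simple arithmetic, sketched here under stated assumptions. The number of failing records, the cost figures, and the option names are invented for illustration and do not come from SAP or from any real project; the code does not use the SAP Information Steward interface.

```python
from dataclasses import dataclass

@dataclass
class RuleCost:
    """Cost attached to one validation rule (all values are assumptions)."""
    rule_name: str
    failed_records: int
    cost_per_failure: int  # whole currency units per failing record

    def total_cost(self) -> int:
        return self.failed_records * self.cost_per_failure

FAILED_POSTAL_CODES = 1200  # hypothetical number of rule violations

# Option 1: leave the data as it is and accept undeliverable mailings.
leave_as_is = RuleCost("postal_code_valid", FAILED_POSTAL_CODES, 1)
# Option 2: correct every failing record by hand.
manual_fix = RuleCost("postal_code_valid", FAILED_POSTAL_CODES, 3)
# Option 3: assumed flat cost of cleansing addresses during the import.
CLEANSE_ON_IMPORT_COST = 500

options = {
    "leave_as_is": leave_as_is.total_cost(),
    "manual_correction": manual_fix.total_cost(),
    "cleanse_on_import": CLEANSE_ON_IMPORT_COST,
}
cheapest = min(options, key=options.get)
```

With these assumed figures, cleansing during the import is the cheapest option; different inputs would change the outcome, which is the point of a what-if analysis.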

Weigel highlights the new Data Quality Advisor as an improvement to the user experience. If you feed it data from a table whose columns are only named A through E, for example, it proposes new column names based on the cell format and content. The Data Quality Advisor can identify postal codes and phone numbers, for example; in the latter case, it even suggests uniform formatting to the user. In Niels Weigel’s words, the Data Quality Advisor is a clear example of how SAP Data Services and SAP Information Steward interact: “The Advisor identifies problems and solution strategies for business users at the level of SAP Information Steward, while SAP Data Services processes the information in the background.”
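How content-based column naming might work can be sketched with simple pattern matching. This is not how the Data Quality Advisor is implemented, which the article does not describe; the regular expressions, the 80 percent threshold, the five-digit postal code format, and the function names are all assumptions for illustration.

```python
import re

# Assumed content patterns; real detection would need to be far more robust.
PATTERNS = {
    "postal_code": re.compile(r"\d{5}"),
    "phone_number": re.compile(r"\+?[\d\s()/-]{7,}"),
}

def propose_column_name(values, threshold=0.8):
    """Propose a name if enough values fully match one known pattern."""
    for name, pattern in PATTERNS.items():
        matches = sum(1 for v in values if pattern.fullmatch(v.strip()))
        if values and matches / len(values) >= threshold:
            return name
    return None  # no confident proposal

def normalize_phone(value):
    """Suggest a uniform format: optional leading '+' followed by digits only."""
    digits = re.sub(r"\D", "", value)
    return ("+" if value.strip().startswith("+") else "") + digits

# Hypothetical table whose columns carry no meaningful names.
table = {
    "A": ["69190", "10115", "80331"],
    "B": ["+49 6227 747474", "(030) 1234-567", "0621/123456"],
    "C": ["Walldorf", "Berlin", "Munich"],
}
proposals = {column: propose_column_name(values) for column, values in table.items()}
uniform_phones = [normalize_phone(v) for v in table["B"]]
```

In this sketch, column A is proposed as a postal code column and column B as a phone number column, while column C gets no proposal because its values match none of the assumed patterns.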

Analyze data from Facebook and Twitter

For companies that want to analyze data from a variety of external sources such as social networks, SAP has enhanced Version 4.2 of SAP Data Services with an open interface that lets them plug in their own connectors, for example to NoSQL databases and microblogging platforms. “The Big Data Adapter SDK lets customers and partners add their own specific connectors for individual services, such as Facebook and Twitter, as needed,” says Niels Weigel.

SAP Data Services brings IT experts and business users closer together

As Niels Weigel sees it, SAP Data Services and SAP Information Steward “bring IT experts and business users closer together.” When it comes to cleaning up product descriptions, for example, users from the business department can define rules, such as which standard format the manufacturer name should have in a list of printers from different vendors. An IT colleague can then implement these rules with the generated cleansing package, which contains the business user’s expertise, to structure the product descriptions automatically. As Weigel says, “tools like this make collaboration between teams much easier.”
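The division of labor described here can be sketched as a rule table maintained by the business side and a small function applied by IT. This is only an illustration of the idea, not the format of an actual cleansing package; the vendor name "Acme", its spelling variants, and the function name are hypothetical.

```python
# Business-defined rule: known spelling variants mapped to the standard
# manufacturer name (hypothetical vendor, for illustration only).
STANDARD_NAMES = {
    "acme corporation": "Acme",
    "acme corp.": "Acme",
    "acme": "Acme",
}

def structure_description(description):
    """Split a product description into standardized manufacturer and model."""
    text = description.strip()
    lowered = text.lower()
    # Try longer variants first so "acme corp." wins over the shorter "acme".
    for variant in sorted(STANDARD_NAMES, key=len, reverse=True):
        if lowered.startswith(variant):
            model = text[len(variant):].strip()
            return {"manufacturer": STANDARD_NAMES[variant], "model": model}
    # Unknown vendor: keep the text and leave the manufacturer empty.
    return {"manufacturer": None, "model": text}

cleaned = [
    structure_description("ACME Corp. Printer 100"),
    structure_description("acme Printer 200"),
    structure_description("Other Vendor Printer 300"),
]
```

The first two descriptions end up with the same standardized manufacturer name, while the third is left for a business user to review, since no rule covers it yet.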