Companies often face the challenge of integrating external datasets into their data pool quickly and consistently. The layer architecture developed by Inmon creates a unified dataset that all departments can access. It makes the complexity of enterprise-wide data integration manageable, so that fewer wrong decisions are made and information loss is avoided. The result is a single version of the truth: a harmonized corporate memory that is universally valid.
According to Inmon, the classic enterprise data warehouse (EDW) comprises three tiers:
- the extraction layer
- the data warehouse layer
- the presentation layer, which makes the data visible
One of the main pillars of the EDW approach is the clear-cut separation of the three layers and the processes that are based on them. In the past, however, technological limitations meant that this separation could not always be maintained consistently.
In the first step, all the external data is transferred to the persistent staging area (PSA) of the company’s BI solution. No data may be lost at this stage, because the required datasets are often no longer available from the source later on. In the data warehouse layer, the newly integrated data is then harmonized with the data that is already there.
This is no easy task, because companies use quite different semantic systems to model their business processes. For example, identical customers are assigned different customer numbers or identical customer numbers are used for different customers. The result? Inconsistent data.
SAP already specifies logical data structures and nomenclatures. In addition, it has sped up the integration and harmonization process with the write-optimized DataStore objects (DSOs) now available in SAP NetWeaver BI. However, that is not always enough.
Avoiding inconsistent data
William H. (Bill) Inmon comes from San Diego, California, and is widely recognized as the “father of data warehousing.” He has over 35 years’ experience in database management and data warehouse design and developed the concept of the Corporate Information Factory (CIF). In 1999, he set up the Web site billinmon.com, dedicated to data warehousing and CIF and aimed at experts and decision makers. Inmon has written 45 books and founded several companies.
The concatenating key extension offers an elegant solution. This procedure attaches additional information to the objects that gather the data, enabling ambiguities to be resolved. The objects (or business content) provided by the SAP solution are, however, only partially enabled for this and must be modified accordingly.
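The idea behind a key extension can be sketched in a few lines of Python. This is an illustration only, not SAP code: the class `CustomerRecord`, the source system IDs, and the `/` separator are assumptions made for the example. It shows the case where two source systems use the same customer number for different customers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    source_system: str    # hypothetical source system ID, e.g. "ERP_EU"
    customer_number: str  # number as delivered by the source system
    name: str


def extended_key(record: CustomerRecord) -> str:
    """Concatenate the source system ID with the local customer number."""
    return f"{record.source_system}/{record.customer_number}"


def load(records):
    """Store records under their extended key so colliding numbers stay apart."""
    warehouse = {}
    for record in records:
        warehouse[extended_key(record)] = record
    return warehouse


# Two different customers that share the number "1000" in their source systems.
records = [
    CustomerRecord("ERP_EU", "1000", "Meyer GmbH"),
    CustomerRecord("ERP_US", "1000", "Miller Inc."),
]
warehouse = load(records)
```

Without the extension, the second record would overwrite the first. The opposite case, in which one customer carries different numbers in different systems, is not solved by concatenation alone and would need an additional mapping.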
Nevertheless, semantic inconsistencies and structure-related conflicts cannot always be resolved with a key extension. That is why a consistent three-layer EDW concept stipulates a set of rules for structuring a harmonized data warehouse. If possible, write-optimized DataStore objects should be used to write directly to the data warehouse layer, because they improve performance. Alternatively, standard DataStore objects that have a delta function, that is, change logs, can be used.
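What a delta function with a change log does can be shown with a simplified model. The sketch below is plain Python and makes no claim to mirror the internal tables of a DataStore object: the function names and the `"new"`, `"before"`, and `"after"` markers are assumptions chosen for the example.

```python
def compute_delta(active, incoming):
    """Return change-log entries that describe how `incoming` alters `active`."""
    change_log = []
    for key, new_value in incoming.items():
        if key not in active:
            change_log.append(("new", key, new_value))
        elif active[key] != new_value:
            # Record the old and the new state so the change can be traced.
            change_log.append(("before", key, active[key]))
            change_log.append(("after", key, new_value))
    return change_log


def apply_load(active, incoming):
    """Log the delta first, then update the active data in place."""
    change_log = compute_delta(active, incoming)
    active.update(incoming)
    return change_log


# Hypothetical order values keyed by an extended customer key.
active = {"ERP_EU/1000": 500}
log = apply_load(active, {"ERP_EU/1000": 650, "ERP_US/1000": 120})
```

Only changed or new records produce log entries, so downstream targets can be supplied with the delta instead of a full reload.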
Furthermore, every time external data is integrated, the data flows directly from the persistent staging area of the extraction layer into the data warehouse layer. Only here is the data saved permanently, enabling the corporate memory to be formed. Data is removed from the extraction layer again only in accordance with data aging rules.
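A data aging rule for the extraction layer might look like the following Python sketch. The retention period, the request structure, and the extra condition that a request must already have been transferred to the data warehouse layer are assumptions for illustration; the article does not specify the actual rules.

```python
from datetime import date, timedelta


def purge_staging(requests, transferred_ids, today, retention_days):
    """Keep every staging request unless it is both expired and transferred."""
    cutoff = today - timedelta(days=retention_days)
    kept = []
    for request in requests:
        expired = request["loaded_on"] < cutoff
        transferred = request["id"] in transferred_ids
        if not (expired and transferred):
            kept.append(request)
    return kept


# Hypothetical load requests in the persistent staging area.
staging = [
    {"id": "REQ1", "loaded_on": date(2024, 1, 5)},
    {"id": "REQ2", "loaded_on": date(2024, 1, 6)},
    {"id": "REQ3", "loaded_on": date(2024, 3, 1)},
]
remaining = purge_staging(
    staging,
    transferred_ids={"REQ1", "REQ3"},
    today=date(2024, 3, 10),
    retention_days=30,
)
```

In this example, REQ1 is removed because it is old and safely stored in the data warehouse layer. REQ2 is just as old but stays, because deleting it before the transfer would lose data.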
If further source systems are integrated with the enterprise data warehouse, they just need to reproduce the transfer from the extraction to the data warehouse layer. Here, the external data uses structures that are already there, such as DataStore objects, transfer rules, data transfer processes, and InfoCubes.
Clear separation according to processes and responsibilities
The top priority is the consistent separation of the layers using layer-specific naming conventions, processes, and data structures. Although extensions are permitted, the harmonized data in the corporate memory must not be filtered or changed in any other way. This is because the data warehouse layer is the company’s information foundation and feeds the presentation layer.
From this point, the different departments access precisely the information they need to do their jobs. Consequently, all analyses and evaluations take place exclusively in the presentation layer. Reporting tasks have no place in the data warehouse – in other words, in the second layer. Otherwise the data basis would not remain stable.
It is also important that the responsibilities within the individual layers are clearly defined: The source system is responsible for extracting its data. The data warehouse team is directly responsible for the data in the data warehouse layer. The relevant user departments manage the processed information in the presentation layer.
The latest version of SAP NetWeaver BI includes integration mechanisms such as preconfigured extractors for differently structured source systems that are already equipped with business logic. Other content objects are not, however, able to accommodate unforeseen data structures. Inmon’s concept for enterprise data warehouses remedies these weaknesses and enables customers to respond to future challenges quickly, flexibly, and cost-efficiently.