Shrink Data, Save Costs

September 19, 2005 by admin

Companies rely on data warehouses and business intelligence solutions to provide accurate insights into their business activities. These solutions combine data in a logical manner and process it to produce useful information. Over the years, differing requirements across departments and decision-making levels have given rise to a wide collection of technical solutions, which are complex and costly to maintain. As a result, the stored business information is often inconsistent. This delays decisions and prevents business objectives from being met satisfactorily.
Global companies recognize this state of affairs as a competitive disadvantage and are therefore introducing company-wide enterprise data warehouses. To do so, they consolidate their business intelligence solutions, which usually means restructuring their application architecture. The result is a comprehensive data warehouse architecture that comprises databases from across the whole company. It supersedes the previous business intelligence solutions, which merely offered a limited and purely summary view of individual departments or functional areas by means of data marts.

Data in its pure form

The new architecture implements an SAP BW Data Warehouse Layer, which ideally should be central and company-wide. It stores the business data unchanged, in chronological order and in its pure form, and also ensures its consistency. The data mart level can be populated from this data set, which is granular down to the finest level of detail. The data is retained so that further, new analyses remain possible.
The advantages of this technology are manifold. As the data sources are available in a consistent form in their original format, the business information and KPIs gleaned from them can be easily evaluated and standardized. Analyses that are requested directly can be created rapidly, since the new architecture is based on InfoCubes that can be constructed quickly. Administration and training costs fall as the fully integrated solution replaces the existing mix of stand-alone BI solutions throughout the company.
However, this solution represents a technological challenge, and the costs for data storage are considerable. Regardless of company size, industry sector, or the required granularity and history of the information, companies usually have to cope with very high and rapidly expanding volumes of data. In addition, much of the data is rarely used, yet it requires continuous maintenance and occupies storage media that are generally expensive. This is the case, for example, with customer histories or older data relating to the mapping of supplier processes.

Columns, not rows

In order to maintain this data as effectively as possible in SAP NetWeaver Business Intelligence (SAP NetWeaver BI), Sand Technology has teamed up with SAP as a software and development partner. The result of this partnership, Sand Searchable Archive, can be integrated into a “near-line architecture” for SAP NetWeaver BI. Near-line means that the solution stores historical data which does not generally require the level of accessibility of up-to-date online data but still needs to be accessed quickly.
The Sand solution compresses data saved as ODS objects in the SAP BW Data Warehouse Layer highly efficiently, usually by more than 85 percent compared with relational databases. It does so without restricting access to the ODS objects for the SAP BI process flow and without limiting direct analysis. With SAP BW 3.x, administrators can operate the Sand Searchable Archive via the central, intuitive graphical user interface. A corresponding integration into SAP NetWeaver 2004s is planned.
How are the high compression rates achieved? Sand Searchable Archive is a pure software solution and is available for the operating systems Windows, 64-bit Linux, Sun Solaris, HP-UX, and IBM AIX. It is based on a special form of data storage that stores the data in columns as opposed to the usual rows. The data values contained in the columns are replaced by codes or placeholders, known as tokens, that take up little storage space, while each distinct data value is stored only once in a unique value table. This separation into data values and tokens results in a lower data volume and accelerates queries. Further, the solution ensures that only standardized data types are entered into the columns.
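The column-and-token idea can be illustrated with a small sketch. This is a generic dictionary-encoding example, not Sand's actual storage format. The function names `tokenize_column` and `detokenize_column` and the sample records are assumptions made purely for illustration.

```python
from typing import Hashable, List, Sequence, Tuple


def tokenize_column(values: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Split one column into a unique value table and a list of integer tokens."""
    unique_values: List[Hashable] = []  # token -> original value, each value stored once
    token_of = {}                       # original value -> token, used only while encoding
    tokens: List[int] = []
    for value in values:
        if value not in token_of:
            token_of[value] = len(unique_values)
            unique_values.append(value)
        tokens.append(token_of[value])
    return unique_values, tokens


def detokenize_column(unique_values: List[Hashable], tokens: List[int]) -> List[Hashable]:
    """Rebuild the original column from its unique value table and tokens."""
    return [unique_values[token] for token in tokens]


# Hypothetical row-oriented records: (posting date, country, status)
rows = [
    ("2005-09-01", "DE", "OPEN"),
    ("2005-09-01", "DE", "CLOSED"),
    ("2005-09-02", "FR", "OPEN"),
    ("2005-09-02", "DE", "OPEN"),
]

# Pivot the rows into columns, then tokenize each column independently.
columns = [list(column) for column in zip(*rows)]
compacted = [tokenize_column(column) for column in columns]

country_values, country_tokens = compacted[1]
print(country_values)  # ['DE', 'FR']
print(country_tokens)  # [0, 0, 1, 0]
```

Columns with few distinct values, such as country or status codes, benefit most from this layout. The unique value table stays small, and the tokens can be stored in far fewer bits than the original values.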
The results of these storage processes are stored in Sand Compacted Tables (SCTs). These contain compressed tokens and descriptions of the data values. All Sand Compacted Tables are in turn combined within a dedicated metadata maintenance system to enable evaluation for reports and analyses. In connection with the Sand Nearline Integration Controller for SAP BW 3.x, ODS objects or InfoCubes can be stored wholly or partially in the Sand Searchable Archive in a highly compressed form. This dispenses with the need to set up tables and columns, let alone indexes, for downstream queries. Depending on the hardware architecture and the number of allocated CPUs, the Sand Compacted Tables are built on a computer with one or more CPUs or in a computer cluster, and stored either remotely or externally.

A tenth of the size

For instance, depending on the composition of the data, ten terabytes of data in ODS objects usually shrink to an SAP BW Data Warehouse Layer of around 0.5 to 1.5 terabytes when saved in the Sand Searchable Archive. If required, the data stored in the Sand Searchable Archive at the InfoProvider level can be transferred back to the InfoProvider of the SAP Business Information Warehouse (SAP BW). Historical data stored in the SAP BW Data Warehouse Layer can then be made available in the form of ODS objects, which can be used to construct InfoCubes very quickly in SAP BW.
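The size figures quoted above can be checked against the compression rate of more than 85 percent with a few lines of arithmetic. The helper name `reduction_percent` is an assumption for illustration only.

```python
def reduction_percent(original_tb: float, compressed_tb: float) -> float:
    """Return the storage reduction as a percentage of the original size."""
    return (original_tb - compressed_tb) / original_tb * 100


original_tb = 10.0
for compressed_tb in (0.5, 1.5):
    print(f"{original_tb:g} TB -> {compressed_tb:g} TB: "
          f"{reduction_percent(original_tb, compressed_tb):.0f}% smaller")
```

Shrinking ten terabytes to between 0.5 and 1.5 terabytes corresponds to a reduction of roughly 85 to 95 percent, which matches the rate stated earlier.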
The Sand Searchable Archive is accessed via virtual InfoProviders that are set up automatically. Thus, all that is stored in the SAP BW is a structure description for the data stored in the Sand Searchable Archive. The actual data access in the SAP BW OLAP processor is then performed via an interface implemented as part of the Sand solution. These virtual InfoProviders support direct queries that need to be answered quickly and in great detail for historical periods. Reports with values and comparisons drawn from both active ODS objects and those located in the Sand Searchable Archive are created via MultiProviders. The metadata management means that only the data required for the specific analysis is selected in the respective Sand Compacted Tables and made quickly available to the virtual InfoProvider and SAP BW OLAP processor for further processing. The data does not have to be decompressed in the Sand Searchable Archive beforehand.
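The following sketch shows how a query can, in principle, run against tokenized columns without decompressing them first, touching only the columns the analysis needs. It is a generic illustration and not the Sand interface or the SAP BW OLAP processor. The class `CompactedColumn`, the function `select`, and the sample table are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


class CompactedColumn:
    """Hypothetical tokenized column: a unique value table plus one token per row."""

    def __init__(self, values: Sequence[Hashable]) -> None:
        self.unique_values: List[Hashable] = []
        self._token_of: Dict[Hashable, int] = {}
        self.tokens: List[int] = []
        for value in values:
            if value not in self._token_of:
                self._token_of[value] = len(self.unique_values)
                self.unique_values.append(value)
            self.tokens.append(self._token_of[value])

    def matching_rows(self, value: Hashable) -> List[int]:
        """Find rows equal to `value` by comparing tokens, without decoding the column."""
        token = self._token_of.get(value)
        if token is None:  # the value never occurs, so no row can match
            return []
        return [row for row, t in enumerate(self.tokens) if t == token]

    def value_at(self, row: int) -> Hashable:
        """Decode a single cell on demand."""
        return self.unique_values[self.tokens[row]]


def select(table: Dict[str, CompactedColumn], where_column: str,
           equals: Hashable, project: List[str]) -> List[Tuple[Hashable, ...]]:
    """Filter on one column, then decode only the projected columns of matching rows."""
    rows = table[where_column].matching_rows(equals)
    return [tuple(table[name].value_at(row) for name in project) for row in rows]


# Hypothetical archived table, stored column by column.
table = {
    "order_id": CompactedColumn([1001, 1002, 1003, 1004]),
    "country": CompactedColumn(["DE", "FR", "DE", "IT"]),
    "status": CompactedColumn(["OPEN", "OPEN", "CLOSED", "OPEN"]),
}

result = select(table, where_column="country", equals="DE", project=["order_id", "status"])
print(result)  # [(1001, 'OPEN'), (1003, 'CLOSED')]
```

The filter compares small integer tokens rather than the original values, and only the cells that end up in the result are ever decoded.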
Thanks to the Sand solution, companies that have opted to implement a company-wide information repository based on SAP NetWeaver Business Intelligence have an easy-to-integrate component for generating a near-line solution geared toward large data volumes for the SAP BW Data Warehouse Layer. It stores the data highly efficiently and cost effectively without loss in performance for the SAP NetWeaver BI process flow or for direct analysis.

Dr. Peter Zimmerer

Roland Markowski
