Companies have to deal with ever larger volumes of data. This flood of data results from legal requirements that demand the storage of vast amounts of information, and from a growing number of operational processes that need up-to-date, transparent figures. The consequences are high storage costs, considerable effort in system operation, and bottlenecks in routine queries as well as in backup and recovery.
Prices for efficient hard disk storage systems continue to fall, but not fast enough to offset these developments: the volume of information is growing faster than the price/performance ratio of storage technology is improving. What’s more, analysts from Dataquest and Gartner estimate that every euro invested in hard disk storage generates five to ten euros in operating costs. Nor can the reduced system availability caused by large data volumes be remedied simply by adding storage capacity, because that would also require a bigger computer with more CPU and computing power.
Data storage in three stages
The concept of a centralized data storage system in which all information is kept in a relational online database has therefore reached its limits. SAP, however, has created new potential for data warehousing with the nearline storage (NLS) interface in SAP NetWeaver Business Intelligence 7.0 (SAP NetWeaver BI). Nearline storage is a halfway house between the online database and the offline archive. Using Data Archiving Processes (DAPs), the NLS interface allows data in InfoCubes and DataStore Objects (DSOs) that are no longer in frequent use to be moved from the relational database to a nearline storage system; the DAPs then delete that data from the relational database. SAP NetWeaver BI 7.0 continues to provide direct, transparent access to the data in the nearline storage system, so it can still be accessed very quickly when key figures and reports need to be reconstructed or when previous years are audited. The relational database therefore holds only the data currently in use, while the nearline solution stores all information that is not needed at present but may be needed in the future or cannot be deleted for legal reasons.
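The archive-then-delete sequence and the transparent read described above can be illustrated with a small in-memory model. This is a minimal sketch, not SAP code: `TieredStore`, `archive` and `query` are hypothetical names, plain Python lists stand in for the relational database and the nearline system, and a simple year cutoff stands in for whatever selection criteria a real DAP would use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Hypothetical two-tier store: 'online' rows plus 'nearline' rows."""
    online: list[dict] = field(default_factory=list)
    nearline: list[dict] = field(default_factory=list)

    def archive(self, cutoff_year: int) -> int:
        """Move rows older than cutoff_year to nearline, then delete them online."""
        old = [r for r in self.online if r["year"] < cutoff_year]
        # Step 1: copy the infrequently used rows to nearline storage.
        self.nearline.extend(old)
        # Step 2: only after the copy, remove them from the online database.
        self.online = [r for r in self.online if r["year"] >= cutoff_year]
        return len(old)

    def query(self, year_from: int, year_to: int) -> list[dict]:
        """Transparent read: callers need not know which tier holds a row."""
        return [r for r in self.online + self.nearline
                if year_from <= r["year"] <= year_to]
```

For example, archiving with `cutoff_year=2007` moves a 2005 row to the nearline list, while `query(2000, 2010)` still returns both the 2005 and the 2008 rows. Ordering the copy before the delete reflects the point that data is removed from the relational database only after it has reached nearline storage.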
SAND/DNA from SAND Technology is a database-independent nearline storage solution for the Windows, 64-bit Linux, Sun Solaris, HP-UX, HP Tru64 UNIX and IBM AIX operating systems. Combined with the NLS interface, it provides efficient, compact nearline storage because it compresses data heavily. The DAPs are integrated into the process chains of SAP NetWeaver BI, so that once periodic processes such as data transformation or the loading of the various data layers have completed, surplus DataStore Objects and “old” InfoCubes are quickly removed from the relational database.
Highly compressed information saves storage space
During data archiving processes, SAND/DNA compresses the stored information without compromising its accessibility. This reduces the amount of storage space required and also cuts costs. Results gathered in live customer environments show that DataStore Objects from SAP NetWeaver BI 7.0 can be compressed by up to 90 percent when stored in SAND/DNA.
This high degree of compression comes from combining special compression techniques with column-oriented storage, which holds information in columns rather than rows. The data values in each column are replaced by placeholders known as tokens. Because tokens are stored as integers, they generally need less space than the actual data values. In addition, each distinct value of a column is stored only once, in an area known as a domain. This avoids redundancy and saves further space; the savings grow as the cardinality of the column (its number of distinct values) falls and as the space required by each individual value rises.
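The token-and-domain scheme described above is a form of dictionary encoding. The sketch below shows the idea for a single column; it is a simplified illustration, not SAND/DNA’s actual storage format. The function names are hypothetical, and the size estimate assumes fixed-width integer tokens and UTF-8 strings while ignoring any further compression.

```python
def encode_column(values):
    """Dictionary-encode one column: return (domain, tokens)."""
    domain = []   # each distinct value stored exactly once
    index = {}    # value -> token (position in the domain)
    tokens = []   # one integer token per row
    for v in values:
        tok = index.get(v)
        if tok is None:
            tok = len(domain)
            index[v] = tok
            domain.append(v)
        tokens.append(tok)
    return domain, tokens


def decode_column(domain, tokens):
    """Rebuild the original column by looking each token up in the domain."""
    return [domain[t] for t in tokens]


def token_width(n_distinct: int) -> int:
    """Smallest number of bytes that can hold every token 0..n_distinct-1."""
    max_token = max(n_distinct - 1, 0)
    return max(1, (max_token.bit_length() + 7) // 8)


def raw_size(values) -> int:
    """Bytes needed to store every value in full, row by row."""
    return sum(len(v.encode()) for v in values)


def encoded_size(domain, tokens) -> int:
    """Approximate bytes: each domain value once, plus one token per row."""
    return raw_size(domain) + len(tokens) * token_width(len(domain))
```

For the column `["Germany", "France", "Germany", "Germany", "United States", "France"]`, the domain is `["Germany", "France", "United States"]` and the tokens are `[0, 1, 0, 0, 1]`-style integers (here `[0, 1, 0, 0, 2, 1]`); the estimate drops from 46 to 32 bytes. Fewer distinct values or longer values widen that gap, which matches the cardinality argument in the text.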
The domains and token lists are compressed in blocks. A defined navigation structure, the record map, allows the specific blocks needed for a query to be located very quickly and decompressed. Depending on the composition of the data (its cardinality, the storage required by individual data values compared with their tokens, and how well the values compress), two terabytes of data in DataStore Objects, for example, shrink to 150 to 200 gigabytes after storage in SAND/DNA.
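Block-wise compression with a navigation structure can be sketched as follows. This is an illustrative assumption, not the documented record-map format: here the “record map” is simply the list of starting row numbers of each block, `zlib` stands in for SAND’s compression processes, and the block size is a hypothetical parameter.

```python
import bisect
import zlib
from array import array

BLOCK_ROWS = 4096  # hypothetical number of tokens per block


def compress_blocks(tokens, block_rows=BLOCK_ROWS):
    """Split a token list into blocks, compress each, and build a record map."""
    record_map = []  # starting row number of each block, in ascending order
    blocks = []      # zlib-compressed bytes of each block
    for start in range(0, len(tokens), block_rows):
        chunk = array("I", tokens[start:start + block_rows])
        record_map.append(start)
        blocks.append(zlib.compress(chunk.tobytes()))
    return record_map, blocks


def read_row(record_map, blocks, row):
    """Find the block holding `row` and decompress only that block."""
    b = bisect.bisect_right(record_map, row) - 1  # binary search in the map
    if b < 0:
        raise IndexError(row)
    chunk = array("I")
    chunk.frombytes(zlib.decompress(blocks[b]))
    return chunk[row - record_map[b]]  # IndexError if past the last row
```

With 10,000 tokens and `block_rows=1000`, reading row 4,321 decompresses only the fifth block instead of the whole column. As a point of arithmetic, the example in the text (2 TB reduced to 150 to 200 GB, in decimal units) corresponds to a reduction of roughly 90 to 92.5 percent.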
The compressed tokens and data value descriptions are stored in SAND Compacted Tables (SCTs). These are then brought together in a dedicated metadata management system so that they can be evaluated more easily for queries. Because the information is stored in columns rather than rows, no additional tables or columns need to be created, and the data does not have to be indexed, to handle queries.
Queries against SAND/DNA are controlled by the OLAP processor of SAP NetWeaver BI. Once the relevant NLS report parameters have been set, the processor automatically combines the query results from the online and nearline environments. Data from InfoCubes and DataStore Objects stored in SAND/DNA can then be queried with Business Explorer (BEx) or any other front end certified for SAP NetWeaver BI, as well as with the ListCube function. Thanks to the metadata management, only the data required for the specific query is selected in the SAND Compacted Tables, aggregated, and passed to the OLAP processor for further processing. The data does not need to be decompressed beforehand.
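Why selection and aggregation can work without decompressing values first can be shown on the token representation: a filter value is translated into its token once, and from then on only integers are compared. This is a minimal sketch under stated assumptions, not SAND/DNA’s query engine: the function names are hypothetical, and the measure column is kept as a plain list for brevity.

```python
from collections import defaultdict


def sum_where(domain, tokens, measures, value):
    """SUM(measure) WHERE column = value, evaluated on tokens only."""
    try:
        target = domain.index(value)  # single lookup in the domain
    except ValueError:
        return 0                      # value not in domain: no row can match
    return sum(m for t, m in zip(tokens, measures) if t == target)


def sum_group_by(domain, tokens, measures):
    """SUM(measure) GROUP BY column; values are resolved only for result rows."""
    acc = defaultdict(int)
    for t, m in zip(tokens, measures):
        acc[t] += m  # aggregate per integer token
    return {domain[t]: total for t, total in acc.items()}
```

With `domain = ["DE", "FR", "US"]`, `tokens = [0, 1, 0, 0, 2, 1]` and `measures = [10, 20, 30, 40, 50, 5]`, filtering on `"DE"` yields 80, and the grouped result is `{"DE": 80, "FR": 25, "US": 50}`. Only three domain entries are ever turned back into values, regardless of the number of rows.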
The system administrator can combine the established SAP NetWeaver BI data transfer processes (DTPs) with a new Nearline LookUp API (application programming interface) to create new reporting scenarios. If a new multi-period key figure is requested that also contains historical information for comparison, the nearline storage system is assigned to the DTP as the data source for the “historical” information, while the current information for the key figure comes from the relational “online” database.
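The source assignment described above (historical periods from nearline storage, current periods from the online database) can be sketched as a simple routing rule. This is a hypothetical illustration only; it does not use the DTP or Nearline LookUp API, whose interfaces are not described here. The archive boundary, the two source callables and the year-over-year calculation are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable

# Hypothetical source: maps a period (e.g. a fiscal year) to an aggregated value.
Source = Callable[[int], float]


def multi_period_key_figure(periods: Iterable[int], archive_boundary: int,
                            online: Source, nearline: Source) -> Dict[int, float]:
    """Read archived periods (< boundary) from nearline, the rest from online."""
    result = {}
    for p in periods:
        source = nearline if p < archive_boundary else online
        result[p] = source(p)
    return result


def year_over_year(values: Dict[int, float], year: int) -> float:
    """Relative change of `year` compared with the previous year."""
    prev = values[year - 1]
    return (values[year] - prev) / prev
```

If periods before 2008 have been archived, `multi_period_key_figure([2006, 2007, 2008], 2008, online.__getitem__, nearline.__getitem__)` reads 2006 and 2007 from the nearline dictionary and 2008 from the online one; with 100.0 for 2007 and 120.0 for 2008, `year_over_year` returns 0.2, a 20 percent increase.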
Efficient memory for corporate environments
The combination of SAP NetWeaver BI and nearline storage reduces the data volume of DataStore Objects and InfoCubes. This in turn simplifies administration considerably; backups, for example, take less time. Costs for system operation, storage media, main memory and CPUs fall, and service level agreements (SLAs) become easier to meet.
At the same time, the result is an efficient, compact corporate memory in which DataStore Objects make all granular data quickly queryable without placing any load on the relational database. Nor is there any need to access archiving or ERP systems when creating unforeseen reporting scenarios or querying key figures.