High Performance Business Intelligence (High Performance BI) will benefit any company that handles large volumes of data. These include retailers, utilities, and telephone companies, but many more companies will begin to handle more data as they deploy new technologies such as radio-frequency identification tags. Those that benefit most will have corporate environments with high data volumes, where queries routinely access many millions of records and may involve up to a billion. The benefit will also be great for companies with hard response time ceilings, for example where call center operators need guaranteed response times, and for companies with unpredictable types of queries, involving different data sets and different aggregations, that defeat traditional optimization and caching strategies. (Aggregation simplifies tables by hiding detail and summing or averaging related rows.)
New approaches and technologies
Although conventional approaches to accessing BI data are robust and mature, they confront administrators with the headache of balancing good response times with acceptable maintenance effort. In the past, administrators had to work hard to ensure reasonable response times for difficult queries. They had to study user behavior and analyze frequently asked queries (such as for quarterly sales figures broken down by product name and sales outlet), build aggregates and database indexes in advance for frequently accessed data, conduct regular reviews, and so on. This work required specialized skills and consumed expensive resources. Response times to queries that hit the right aggregates and indexes were improved, while response times for other, possibly very similar queries were unchanged, so good judgement was required to optimize performance.
The new High Performance BI owes its strength to the latest research in search technologies in such areas as algorithms for scalable and distributed attribute search (attributes are table cell value ranges like product or price), data decomposition and partitioning, smart compression and in-memory processing. Powering the new capabilities is a newly enhanced and state-of-the-art SAP NetWeaver aggregation engine for processing structured business data. It achieves its unprecedented speed by incorporating several new approaches and technologies.
The engine stores table data columnwise in memory for efficiency. This involves vertical decomposition of data tables, rather than the row-based storage used in conventional relational database systems. In a conventional database, if no prebuilt aggregate is available to answer a query, all the data in the table must be sifted, whereas in the new engine only the relevant data columns are touched. The engine can then sort the columns individually to bring the relevant entries to the top. Efficiency is improved because data flows are smaller, which reduces input-output load and main memory consumption.
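The difference between the two layouts can be illustrated with a small sketch in Python. The table and field names here are invented for illustration; the point is that an aggregation query over a column store scans only the columns it actually needs, whereas a row store must read every full record.

```python
# Illustrative sketch only: contrasting row-based and column-based layouts.

# Row-based layout: each record is stored as a complete tuple of fields.
rows = [
    ("apple", "north", 100),
    ("pear",  "south", 250),
    ("apple", "south", 175),
]

# Column-based layout: vertical decomposition of the same table.
columns = {
    "product": ["apple", "pear", "apple"],
    "region":  ["north", "south", "south"],
    "revenue": [100, 250, 175],
}

# Query: total revenue for "apple".
# Row store: every full row is touched, though only two fields matter.
total_rows = sum(r[2] for r in rows if r[0] == "apple")

# Column store: only the "product" and "revenue" columns are scanned;
# the "region" column is never read.
total_cols = sum(
    rev
    for prod, rev in zip(columns["product"], columns["revenue"])
    if prod == "apple"
)

assert total_rows == total_cols == 275
```

In a real engine the columns would also be sorted or indexed individually, but even this toy version shows why the data flows are smaller: unneeded columns never enter memory.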
The data compression is based on dictionaries. Integers represent the text or other values in table cells. This allows more efficient numerical coding and smart caching strategies. For example, if a column contains a thousand rows and some of the cells contain long texts, it is more efficient to use ten-bit binary numbers to represent the texts for processing and use a dictionary to look up the text values afterward. This reduces the data volumes to be transferred and cached during join and aggregation operations by an average factor of ten. This in turn enables all query processing to be performed in main memory and reduces network traffic in distributed landscapes.
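A minimal sketch of dictionary encoding, with invented example values, might look like this: each distinct text is assigned a small integer code once, and all subsequent processing works on the compact codes rather than the long strings.

```python
# Illustrative sketch only: dictionary compression replaces long text
# values with small integer codes; the dictionary maps codes back to texts.

def dictionary_encode(values):
    """Return (codes, dictionary) such that dictionary[code] == value."""
    dictionary = []     # code -> value
    index = {}          # value -> code
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return codes, dictionary

texts = ["Premium Widget, Large", "Basic Widget", "Premium Widget, Large"]
codes, dictionary = dictionary_encode(texts)

# Joins and aggregations operate on the integer codes; the texts are
# looked up in the dictionary only when results are displayed.
decoded = [dictionary[c] for c in codes]
assert codes == [0, 1, 0]
assert decoded == texts
```

With at most a thousand distinct values per column, each code fits in ten bits, which is where the roughly tenfold reduction in transferred data comes from.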
Horizontal partitioning of large tables for parallel processing on multiple machines in distributed landscapes enables the engine to handle huge volumes of data yet stay within the limits of installed memory architectures. Formerly, data volumes of more than about three gigabytes had to stay on disk because they would not fit into a single address space in memory. Now, the volumes can be split over multiple memories, where they can be processed fast, in parallel. This scalability allows High Performance BI users to tap the enormous potential of advanced and adaptive computing infrastructure such as blade servers and grid computing.
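The scheme can be sketched as follows, using threads to stand in for separate machines (the partition count and data are invented for illustration): the table is split row-wise, each partition is aggregated independently, and the partial results are merged.

```python
# Illustrative sketch only: horizontal partitioning splits a large table
# row-wise; each partition is aggregated in parallel (here by threads
# standing in for separate machines) and the partial sums are merged.
from concurrent.futures import ThreadPoolExecutor

revenue = list(range(1, 1001))  # stand-in for a very large fact column

def split(data, n_parts):
    """Split data into n_parts contiguous horizontal partitions."""
    size = (len(data) + n_parts - 1) // n_parts
    return [data[i:i + size] for i in range(0, len(data), size)]

partitions = split(revenue, 4)

# Each partial sum could run in its own machine's memory; because the
# partitions are independent, no partition needs the whole data set.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum, partitions))

total = sum(partials)
assert total == sum(revenue) == 500500
```

Because each partition only needs to fit in one machine's memory, adding machines raises the total data volume the engine can hold in memory.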
Up to a thousand times faster
The models and algorithms used are optimized and tailored to take full advantage of the modeling semantics and constraints of the SAP BI solution. For example, the data types are often integers, special table types are defined in SAP BI, query plans include join operations that access BI star schemas, and aggregations use highly optimized binary-coded-decimal logic. Decimal numbers are coded digit for digit into binary numbers to avoid the rounding errors that occur with floating-point numbers, but numbers in this representation are slower to add than plain binary integers. A new optimization implements addition more flexibly for these numbers by breaking each addition into parts that can run in hardware registers.
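The idea of chunked decimal addition can be sketched as follows. This is not SAP's implementation, just an illustration of the principle: instead of adding decimal digits one at a time, the digit strings are split into fixed-size chunks that each fit in a machine register, and the chunks are added as ordinary integers with explicit carry handling.

```python
# Illustrative sketch only: adding exact decimal numbers in register-sized
# chunks rather than digit by digit. CHUNK decimal digits are processed
# per addition step; 8 digits fit comfortably in a 64-bit register.

CHUNK = 8

def chunked_decimal_add(a: str, b: str) -> str:
    """Add two non-negative decimal digit strings, CHUNK digits at a time."""
    width = max(len(a), len(b))
    # Pad both operands so they split into whole chunks.
    width = ((width + CHUNK - 1) // CHUNK) * CHUNK
    a, b = a.zfill(width), b.zfill(width)
    parts, carry = [], 0
    # Walk from the least significant chunk to the most significant.
    for i in range(width, 0, -CHUNK):
        s = int(a[i - CHUNK:i]) + int(b[i - CHUNK:i]) + carry
        carry, chunk_sum = divmod(s, 10 ** CHUNK)
        parts.append(str(chunk_sum).zfill(CHUNK))
    if carry:
        parts.append(str(carry))
    return "".join(reversed(parts)).lstrip("0") or "0"

assert chunked_decimal_add("99999999999999", "1") == "100000000000000"
```

Each loop iteration does one register-width addition instead of eight single-digit additions, while the decimal representation, and hence exactness, is preserved.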
The SAP NetWeaver search and aggregation engine leverages these new approaches and technologies to deliver average response times that are ten times faster than traditional approaches. In some cases, queries executed via the new technology are up to a thousand times faster. In addition, average times for loading data are reduced.
High Performance BI allows faster data access with minimal administrative overhead. It is particularly impressive in scenarios involving unpredictable query types, high volumes of data and high request rates. The new technology complements but does not replace the existing technology, so it is easy to deploy. Since the frontend is unchanged, users do not need to be retrained. Companies can leverage the new speed and flexibility from day one to “smarten up” their business processes. This can be critical for competitive companies that aim to win through innovation and responsiveness. For smart companies, High Performance BI will raise employee productivity and hence reduce the total cost of ownership.