US 20080059415 A1
Improved method of and apparatus for aggregating data elements in multidimensional databases (MDDB). In the preferred embodiment, the apparatus is realized in the form of a high-performance stand-alone (i.e. external) aggregation server which can be plugged-into conventional MOLAP systems to achieve significant improvements in system performance. In accordance with the principles of the present invention, the stand-alone aggregation server contains a scalable MDDB and a high-performance aggregation engine that are integrated into the modular architecture of the aggregation server. The stand-alone aggregation server of the present invention can uniformly distribute data elements among a plurality of processors, for balanced loading and processing, and therefore is highly scalable. The stand-alone aggregation server of the present invention can be used to realize (i) an improved MDDB for supporting on-line analytical processing (OLAP) operations, (ii) an improved Internet URL Directory for supporting on-line information searching operations by Web-enabled client machines, as well as (iii) diverse types of MDDB-based systems for supporting real-time control of processes in response to complex states of information reflected in the MDDB.
1. A data aggregation module for interfacing with a relational database management subsystem (RDBMS) having a relational data store including one or more fact tables for storing at least the base data, and a meta-data store for storing a dictionary containing dimension data, and servicing database query requests provided by one or more client machines, said data aggregation module comprising:
a multi-dimensional database (MDB) having a multi-dimensional data storage structure including a plurality of data storage cells, wherein each said data storage cell is indexed with multiple dimensions and stores either a base data value or an aggregated multi-dimensional data value;
a base data loader for loading said directory data and said base data to said MDD aggregation module;
a first data communication interface operably connected between said base loader and said RDBMS, for transmitting directory data and base data from said RDBMS to said aggregation module;
an aggregation engine for receiving base data from said base data loader, and calculating aggregated multi-dimensional data from the base data according to a multi-dimensional data aggregation process;
a MDDB handler for storing aggregated multi-dimensional data in said MDDB and retrieving aggregated multi-dimensional data from said MDDB;
a second data communication interface operably connected to said data aggregation engine and said one or more client machines, for receiving database query requests from said client machines;
a database query handler operably connected to said second data communication interface, for handling said database query requests;
a controller for managing the operation of said base data loader, said aggregation engine, said MDDB handler, and said database query handler;
wherein after base data loading operations, the base data is initially aggregated along at least one of the dimensions extracted from said dictionary and aggregated multi-dimensional data results are stored in said MDDB; and
wherein during query servicing operations, each database query request is received over said second data communication interface, (ii) the set of dimensions associated with said database query request is extracted, (iii) said dimensions are provided as multi-dimensional coordinates to said MDDB handler so that (i) said MDDB handler can retrieve aggregated multi-dimensional data from data storage cell(s) in said MDDB specified by said multi-dimensional coordinates, and/or (ii) said aggregation engine can dynamically generate, on demand, aggregated multi-dimensional data for storage at data storage cell(s) in said MDDB specified by said multi-dimensional coordinates, and (iv) returning the retrieved aggregated multi-dimensional data back to said database query handler, over said second data communication interface, for transmission to said client machine for servicing the database query request.
2. The data aggregation module of
said base data loader extracts dimension data from the dictionary in said meta-data store, and forwards the dimension data over said first data communication interface to said aggregation engine and configures said MDDB using said dimension data so that each data storage cell therein is specified by a set of multi-dimensional coordinates.
3. The data aggregation module of
said base data loader extracts base data from said table(s), and forwards the base data to said aggregation engine over said first data communication interface; and
said aggregation engine calculates aggregated multi-dimensional data from the base data according to said multi-dimensional data aggregation process, and stores the aggregated multi-dimensional data within data storage cell(s) in said MDDB specified by a set of multi-dimensional coordinates.
4. The data aggregation module of
5. The data aggregation module of
6. The data aggregation module of
7. The data aggregation module of
8. The data aggregation module of
9. The data aggregation module of
(a) if the aggregated multi-dimensional data required to answer a given database query requests is already pre-calculated and stored within said MDDB, then the pre-aggregated multi-dimensional data is retrieved by said MDDB handler and returned to said client machine via said database query handler; and
(b) if the required multi-dimensional data is not already pre-aggregated and stored within said MDDB, then the required aggregated multi-dimensional data is calculated on demand by said aggregation engine, and the aggregated multi-dimensional data result is automatically transferred to said client machine, via said database query handler, while being simultaneously stored within said MDDB by said MDDB handler.
10. The data aggregation module of
11. The data aggregation module of
This is a Continuation of application Ser. No. 10/854,034 filed May 25, 2004; which is a Continuation of application Ser. No. 10/153,164 filed May 21, 2002, which is a Continuation of application Ser. No. 09/514,611 filed Feb. 28, 2002, now U.S. Pat. No. 6,434,544; which is a Continuation-in-part of: copending application Ser. No. 09/368,241 filed Aug. 4, 1999; each said application being commonly owned by HyperRoll Israel, Ltd., and incorporated herein by reference in its entirety.
1. Field of Invention
The present invention relates to a method of and system for aggregating data elements in a multi-dimensional database (MDDB) supported upon a computing platform and also to provide an improved method of and system for managing data elements within a MDDB during on-line analytical processing (OLAP) operations.
2. Brief Description of the State of the Art
The ability to act quickly and decisively in today's increasingly competitive marketplace is critical to the success of organizations. The volume of information that is available to corporations is rapidly increasing and frequently overwhelming. Those organizations that will effectively and efficiently manage these tremendous volumes of data, and use the information to make business decisions, will realize a significant competitive advantage in the marketplace.
Data warehousing, the creation of an enterprise-wide data store, is the first step towards managing these volumes of data. The Data Warehouse is becoming an integral part of many information delivery systems because it provides a single, central location where a reconciled version of data extracted from a wide variety of operational systems is stored. Over the last few years, improvements in price, performance, scalability, and robustness of open computing systems have made data warehousing a central component of Information Technology CIT strategies. Details on methods of data integration and constructing data warehouses can be found in the white paper entitled “Data Integration: The Warehouse Foundation” by Louis Rolleigh and Joe Thomas.
Building a Data Warehouse has its own special challenges (e.g. using common data model, common business dictionary, etc.) and is a complex endeavor. However, just having a Data Warehouse does not provide organizations with the often-heralded business benefits of data warehousing. To complete the supply chain from transactional systems to decision maker, organizations need to deliver systems that allow knowledge workers to make strategic and tactical decisions based on the information stored in these warehouses. These decision support systems are referred to as On-Line Analytical Processing (OLAP) systems. OLAP systems allow knowledge workers to intuitively, quickly, and flexibly manipulate operational data using familiar business terms, in order to provide analytical insight into a particular problem or line of inquiry. For example, by using an OLAP system, decision makers can “slice and dice” information along a customer (or business) dimension, and view business metrics by product and through time. Reports can be defined from multiple perspectives that provide a high-level or detailed view of the performance of any aspect of the business. Decision makers can navigate throughout their database by drilling down on a report to view elements at finer levels of detail, or by pivoting to view reports from different perspectives. To enable such full-functioned business analyses, OLAP systems need to (1) support sophisticated analyses, (2) scale to large numbers of dimensions, and (3) support analyses against large atomic data sets. These three key requirements are discussed further below. Decision makers use key performance metrics to evaluate the operations within their domain, and OLAP systems need to be capable of delivering these metrics in a user-customizable format. These metrics may be obtained from the transactional databases precalculated and stored in the database, or generated on demand during the query process. Commonly used metrics include:
(1) Multidimensional Ratios (e.g. Percent to Total)
“Show me the contribution to weekly sales and category profit made by all items sold in the Northwest stores between July 1 and July 14.”
(2) Comparisons (e.g. Actual vs. Plan, this Period vs. Last Period)
“Show me the sales to plan percentage variation for this year and compare it to that of the previous year to identify planning discrepancies.”
(3) Ranking and Statistical Profiles (e.g. Top N/Bottom N, 70/30, Quartiles) “Show me sales, profit and average call volume per day for my 20 most profitable salespeople, who are in the top 30% of the worldwide sales.”
(4) Custom Consolidations
“Show me an abbreviated income statement by quarter for the last two quarters for my Western Region operations.”
Knowledge workers analyze data from a number of different business perspectives or dimensions. As used hereinafter, a dimension is any element or hierarchical combination of elements in a data model that can be displayed orthogonally with respect to other combinations of elements in the data model. For example, if a report lists sales by week, promotion, store, and department, then the report would be a slice of data taken from a four-dimensional data model.
Target marketing and market segmentation applications involve extracting highly qualified result sets from large volumes of data. For example, a direct marketing organization might want to generate a targeted mailing list based on dozens of characteristics, including purchase frequency, size of the last purchase, past buying trends, customer location, age of customer, and gender of customer. These applications rapidly increase the dimensionality requirements for analysis.
The number of dimensions in OLAP systems range from a few orthogonal dimensions to hundreds of orthogonal dimensions. Orthogonal dimensions in an exemplary OLAP application might include Geography, Time, and Products.
Atomic data refers to the lowest level of data granularity required for effective decision making. In the case of a retail merchandising manager, “atomic data” may refer to information by store, by day, and by item. For a banker, atomic data may be information by account, by transaction, and by branch. Most organizations implementing OLAP systems find themselves needing systems that can scale to tens, hundreds, and even thousands of gigabytes of atomic information.
As OLAP systems become more pervasive and are used by the majority of the enterprise, more data over longer time frames will be included in the data store (i.e. data warehouse), and the size of the database will increase by at least an order of magnitude. Thus, OLAP systems need to be able to scale from present to near-future volumes of data.
In general, OLAP systems need to (1) support the complex analysis requirements of decision-makers, (2) analyze the data from a number of different perspectives (i.e. business dimensions), and (3) support complex analyses against large input (atomic-level) data sets from a Data Warehouse maintained by the organization using a relational database management system (RDBMS).
Vendors of OLAP systems classify OLAP Systems as either Relational OLAP (ROLAP) or Multidimensional OLAP (MOLAP) based on the underlying architecture thereof. Thus, there are two basic architectures for On-Line Analytical Processing systems: The ROLAP Architecture, and the MOLAP architecture.
Overview of the Relational OLAP (ROLAP) System Architecture
The Relational OLAP (ROLAP) system accesses data stored in a Data Warehouse to provide OLAP analyses. The premise of ROLAP is that OLAP capabilities are best provided directly against the relational database, i.e. the Data Warehouse.
The ROLAP architecture was invented to enable direct access of data from Data Warehouses, and therefore support optimization techniques to meet batch window requirements and provide fast response times. Typically, these optimization techniques include application-level table partitioning, pre-aggregate inferencing, denormalization support, and the joining of multiple fact tables.
As shown in
After the data model for the data warehouse is defined, data from on-line transaction-processing (OLTP) systems is loaded into the relational database management system (RDBMS). If required by the data model, database routines are run to pre-aggregate the data within the RDBMS. Indices are then created to optimize query access times. End users submit multidimensional analyses to the ROLAP engine, which then dynamically transforms the requests into SQL execution plans. The SQL execution plans are submitted to the relational database for processing, the relational query results are cross-tabulated, and a multidimensional result data set is returned to the end user. ROLAP is a fully dynamic architecture capable of utilizing precalculated results when they are available, or dynamically generating results from atomic information when necessary.
Overview of MOLAP System Architecture
Multidimensional OLAP (MOLAP) systems utilize a proprietary multidimensional database (MDDB) to provide OLAP analyses. The main premise of this architecture is that data must be stored multidimensionally to be accessed and viewed multidimensionally.
As shown in
Information (i.e. basic data) from a variety of operational systems within an enterprise, comprising the Data Warehouse, is loaded into a prior art multidimensional database (MDDB) through a series of batch routines. The Express™ server by the Oracle Corporation is exemplary of a popular server which can be used to carry out the data loading process in prior art MOLAP systems. As shown in
During an OLAP session, the response time of a multidimensional query on a prior art MDDB depends on how many cells in the MDDB have to be added “on the fly”. As the number of dimensions in the MDDB increases linearly, the number of the cells in the MDDB increases exponentially. However, it is known that the majority of multidimensional queries deal with summarized high level data. Thus, as shown in
As shown in
As shown in
Typically, in MDDB systems, the aggregated data is very sparse, tending to explode as the number of dimension grows and dramatically slowing down the retrieval process (as described in the report entitled “Database Explosion: The OLAP Report”, http://www.olapreport.com/DatabaseExplosion.htm, incorporated herein by reference). Quick and on line retrieval of queried data is critical in delivering on-line response for OLAP queries. Therefore, the data structure of the MDDB, and methods of its storing, indexing and handling are dictated mainly by the need of fast retrieval of massive and sparse data.
Different solutions for this problem are disclosed in the following US patents, each of which is incorporated herein by reference in its entirety:
U.S. Pat. No. 5,822,751 “Efficient Multidimensional Data Aggregation Operator Implementation”
U.S. Pat. No. 5,805,885 “Method And System For Aggregation Objects”
U.S. Pat. No. 5,781,896 “Method And System For Efficiently Performing Database Table Aggregation Using An Aggregation Index”
U.S. Pat. No. 5,745,764 “Method And System For Aggregation Objects”
In all the prior art OLAP servers, the process of storing, indexing and handling MDDB utilize complex data structures to largely improve the retrieval speed, as part of the querying process, at the cost of slowing down the storing and aggregation. The query-bounded structure, that must support fast retrieval of queries in a restricting environment of high sparcity and multi-hierarchies, is not the optimal one for fast aggregation.
In addition to the aggregation process, the Aggregation, Access and Retrieval module is responsible for all data storage, retrieval and access processes. The Logic module is responsible for the execution of OLAP queries. The Presentation module intermediates between the user and the logic module and provides an interface through which the end users view and request OLAP analyses. The client/server architecture allows multiple users to simultaneously access the multidimensional database.
In summary, general system requirements of OLAP systems include: (1) supporting sophisticated analysis, (2) scaling to large number of dimensions, and (3) supporting analysis against large atomic data sets.
MOLAP system architecture is capable of providing analytically sophisticated reports and analysis functionality. However, requirements (2) and (3) fundamentally limit MOLAP's capability, because to be effective and to meet end-user requirements, MOLAP databases need a high degree of aggregation.
By contrast, the ROLAP system architecture allows the construction of systems requiring a low degree of aggregation, but such systems are significantly slower than systems based on MOLAP system architecture principles. The resulting long aggregation times of ROLAP systems impose severe limitations on its volumes and dimensional capabilities.
The graphs plotted in
The large volumes of data and the high dimensionality of certain market segmentation applications are orders of magnitude beyond the limits of current multidimensional databases.
ROLAP is capable of higher data volumes. However, the ROLAP architecture, despite its high volume and dimensionality superiority, suffers from several significant drawbacks as compared to MOLAP:
Full aggregation of large data volumes are very time consuming, otherwise, partial aggregation severely degrades the query response.
It has a slower query response
It requires developers and end users to know SQL
SQL is less capable of the sophisticated analytical functionality necessary for OLAP
ROLAP provides limited application functionality
Thus, improved techniques for data aggregation within MOLAP systems would appear to allow the number of dimensions of and the size of atomic (i.e. basic) data sets in the MDDB to be significantly increased, and thus increase the usage of the MOLAP system architecture.
Also, improved techniques for data aggregation within ROLAP systems would appear to allow for maximized query performance on large data volumes, and reduce the time of partial aggregations that degrades query response, and thus generally benefit ROLAP system architectures.
Thus, there is a great need in the art for an improved way of and means for aggregating data elements within a multi-dimensional database (MDDB), while avoiding the shortcomings and drawbacks of prior art systems and methodologies.
Accordingly, it is a further object of the present invention to provide an improved method of and system for managing data elements within a multidimensional database (MDDB) using a novel stand-alone (i.e. external) data aggregation server, achieving a significant increase in system performance (e.g. deceased access/search time) using a stand-alone scalable data aggregation server.
Another object of the present invention is to provide such system, wherein the stand-alone aggregation server includes an aggregation engine that is integrated with an MDDB, to provide a cartridge-style plug-in accelerator which can communicate with virtually any conventional OLAP server.
Another object of the present invention is to provide such a stand-alone data aggregration server whose computational tasks are restricted to data aggregation, leaving all other OLAP functions to the MOLAP server and therefore complementing OLAP server's functionality.
Another object of the present invention is to provide such a system, wherein the stand-alone aggregation server carries out an improved method of data aggregation within the MDDB which enables the dimensions of the MDDB to be scaled up to large numbers and large atomic (i.e. base) data sets to be handled within the MDDB.
Another object of the present invention is to provide such a stand-alone aggregration server, wherein the aggregation engine supports high-performance aggregation (i.e. data roll-up) processes to maximize query performance of large data volumes, and to reduce the time of partial aggregations that degrades the query response.
Another object of the present invention is to provide such a stand-alone, external scalable aggregation server, wherein its integrated data aggregation (i.e. roll-up) engine speeds up the aggregation process by orders of magnitude, enabling larger database analysis by lowering the aggregation times.
Another object of the present invention is to provide such a novel stand-alone scalable aggregation server for use in OLAP operations, wherein the scalability of the aggregation server enables (i) the speed of the aggregation process carried out therewithin to be substantially increased by distributing the computationally intensive tasks associated with data aggregation among multiple processors, and (ii) the large data sets contained within the MDDB of the aggregation server to be subdivided among multiple processors thus allowing the size of atomic (i.e. basic) data sets within the MDDB to be substantially increased.
Another object of the present invention is to provide such a novel stand-alone scalable aggregation server, which provides for uniform load balancing among processors for high efficiency and best performance, and linear scalability for extending the limits by adding processors.
Another object of the present invention is to provide a stand-alone, external scalable aggregation server, which is suitable for MOLAP as well as for ROLAP system architectures.
Another object of the present invention is to provide a novel stand-alone scalable aggregation server, wherein an MDDB and aggregation engine are integrated and the aggregation engine carries out a high-performance aggregation algorithm and novel storing and searching methods within the MDDB.
Another object of the present invention is to provide a novel stand-alone scalable aggregation server which can be supported on single-processor (i.e. sequential or serial) computing platforms, as well as on multi-processor (i.e. parallel) computing platforms.
Another object of the present invention is to provide a novel stand-alone scalable aggregation server which can be used as a complementary aggregation plug-in to existing MOLAP and ROLAP databases.
Another object of the present invention is to provide a novel stand-alone scalable aggregation server which carries out an novel rollup (i.e. down-up) and spread down (i.e. top-down) aggregation algorithms.
Another object of the present invention is to provide a novel stand-alone scalable aggregation server which includes an integrated MDDB and aggregation engine which carries out full pre-aggregation and/or “on-the-fly” aggregation processes within the MDDB.
Another object of the present invention is to provide such a novel stand-alone scalable aggregation server which is capable of supporting MDDB having a multi-hierarchy dimensionality.
Another object of the present invention is to provide a novel method of aggregating multidimensional data of atomic data sets originating from a RDBMS Data Warehouse.
Another object of the present invention is to provide a novel method of aggregating multidimensional data of atomic data sets originating from other sources, such as external ASCII files, MOLAP server, or other end user applications.
Another object of the present invention is to provide a novel stand-alone scalable data aggregation server which can communicate with any MOLAP server via standard ODBC, OLE DB or DLL interface, in a completely transparent manner with respect to the (client) user, without any time delays in queries, equivalent to storage in MOLAP server's cache.
Another object of the present invention is to provide a novel “cartridge-style” (stand-alone) scalable data aggregation engine which dramatically expands the boundaries of MOLAP into large-scale applications including Banking, Insurance, Retail and Promotion Analysis.
Another object of the present invention is to provide a novel “cartridge-style” (stand-alone) scalable data aggregation engine which dramatically expands the boundaries of high-volatility type ROLAP applications such as, for example, the precalculation of data to maximize query performance.
Another object of the present invention is to provide a generic plug-in cartridge-type data aggregation component, suitable for all MOLAP systems of different vendors, dramatically reducing their aggregation burdens.
Another object of the present invention is to provide a novel high performance cartridge-type data aggregration server which, having standardized interfaces, can be plugged-into the OLAP system of virtually any user or vendor.
Another object of the present invention is to provide a novel “cartridge-style” (stand-alone) scalable data aggregation engine which has the capacity to convert long batch-type data aggregations into interactive sessions.
These and other object of the present invention will become apparent hereinafter and in the Claims to Invention set forth herein.
In order to more fully appreciate the objects of the present invention, the following Detailed Description of the Illustrative Embodiments should be read in conjunction with the accompanying Drawings, wherein:
Referring now to
Through this invention disclosure, the term “aggregation” and “preaggregation” shall be understood to mean the process of summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.
In general, the stand-alone aggregation server and methods of and apparatus for data aggregation of the present invention can be employed in a wide range of applications, including MOLAP systems, ROLAP systems, Internet URL-directory systems, personalized on-line e-commerce shopping systems, Internet-based systems requiring real-time control of packet routing and/or switching, and the like.
For purposes of illustration, initial focus will be accorded to improvements in MOLAP systems, in which knowledge workers are enabled to intuitively, quickly, and flexibly manipulate operational data within a MDDB using familiar business terms in order to provide analytical insight into a business domain of interest.
a, fast aggregation processing, and fast access to and retrieval of any data element in the MDDB.
As will be described in greater detail hereinafter, the Aggregation Server 603 of the present invention can serve the data aggregation requirements of other types of systems besides OLAP systems such as, for example, URL directory management Data Marts, RDBMS, or ROLAP.
The Aggregation Server 603 of the present invention excels in performing two distinct functions, namely: the aggregation of data in the MDDB; and the handling of the resulting data base in the MDDB, for “on demand” client use. In the case of serving an OLAP system, the Aggregation Server 603 of the present invention focuses on performing these two functions in a high performance manner (i.e. aggregating and storing base data, originated at the Data Warehouse, in a multidimensional storage (MDDB), and providing the results of this data aggregation process ion demand to the clients, such as the OLAP server 605, spreadsheet applications, the end user applications. As such, the Aggregation Server 603 of the present invention frees each conventional OLAP server 605, with which it interfaces, from the need of making data aggregations, and therefore allows the conventional OLAP server 605 to concentrate on the primary functions of OLAP servers, namely: data analysis and supporting a graphical interface with the user client.
During operation, the base data originates at data warehouse or other sources, such as external ASCII files, MOLAP server, or others. The Configuration Manager 613, in order to enable proper communication with all possible sources and data structures, configures two blocks, the Base Data Interface 611 and Data Loader 612. Their configuration is matched with different standards such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.
As shown in
As shown in
An object of the present invention is to make the transfer of data completely transparent to the OLAP user, in a manner which is equivalent to the storing of data in the cache of the OLAP server 605 and without any query delays. This requires that the stand-alone Aggregation Server 603 have exceptionally fast response characteristics. This object is enabled by providing the unique data structure and aggregation mechanism of the present invention.
The function of the aggregation management module is to administrate the aggregation process according to the method illustrated in
In accordance with the principles of the present invention, data aggregation within the stand-alone Aggregation Server 603 can be carried out either as a complete pre-aggregation process, where the base data is fully aggregated before commencing querying, or as a query directed roll-up (QDR) process, where querying is allowed at any stage of aggregation using the “on-the-fly” data aggregation process of the present invention. The QDR process will be described hereinafter in greater detail with reference to
The function of the MDDB Handler (i.e., “management”) module is to handle multidimensional data in the MDDB 625 in a very efficient way, according to the novel method of the present invention, which will be described in detail hereinafter with reference to
The request serving mechanism shown in
The segmented data aggregation method of the present invention is described in FIGS. 9A through 9C2. These figures outline a simplified setting of three dimensions only; however, the following analysis applies to any number of dimensions as well.
The data is being divided into autonomic segments to minimize the amount of simultaneously handled data. The initial aggregation is practiced on a single dimension only, while later on the aggregation process involves all other dimensions.
At the first stage of the aggregation method, an aggregation is performed along dimension 1. The first stage can be performed on more than one dimension. As shown in
In the next stage shown in
The principle of data segmentation can be applied on the first stage as well. However, only a large enough data set will justify such a sliced procedure in the first dimension. Actually, it is possible to consider each segment as an N−1 cube, enabling recursive computation.
It is imperative to get aggregation results of a specific slice before the entire aggregation is completed, or alternatively, to have the roll-up done in a particular sequence. This novel feature of the aggregation method of the present invention is that it allows the querying to begin, even before the regular aggregation process is accomplished, and still having fast response. Moreover, in relational OLAP and other systems requiring only partial aggregations, the QDR process dramatically speeds up the query response.
The QDR process is made feasible by the slice-oriented roll-up method of the present invention. □ After aggregating the first dimension(s), the multidimensional space is composed of independent multidimensional cubes (slices). These cubes can be processed in any arbitrary sequence.
Consequently the aggregation process of the present invention can be monitored by means of files, shared memory sockets, or queues to statically or dynamically set the roll-up order.
In order to satisfy a single query coming from a client, before the required aggregation result has been prepared, the QDR process of the present invention involves performing a fast on-the-fly aggregation (roll-up) involving only a thin slice of the multidimensional data.
A search for a queried data point is then performed by an access to the DIR file. The search along the file can be made using a simple binary search due to file's ascending order. When the record is found, it is then loaded into main memory to search for the required point, characterized by its index INDk. The attached Data field represents the queried value. In case the exact index is not found, it means that the point is a NA.
Notably, when using prior art techniques, multiple handling of data elements, which occurs when a data element is accessed more than once during aggregation process, has been hitherto unavoidable when the main concern is to effectively handle the sparse data. The data structures used in prior art data handling methods have been designed for fast access to a non NA data. According to prior art techniques, each access is associated with a timely search and retrieval in the data structure. For the massive amount of data typically accessed from a Data Warehouse in an OLAP application, such multiple handling of data elements has significantly degraded the efficiency of prior art data aggregation processes. When using prior art data handling techniques, the data element D shown in
In accordance with the data handling method of the present, the data is being pre-ordered for a singular handling, as opposed to multiple handling taught by prior art methods. According to the present invention, elements of base data and their aggregated results are contiguously stored in a way that each element will be accessed only once. This particular order allows a forward-only handling, never backward. Once a base data element is stored, or aggregated result is generated and stored, it is never to be retrieved again for further aggregation. As a result the storage access is minimized. This way of singular handling greatly elevates the aggregation efficiency of large data bases. An efficient handling method as used in the present invention, is shown in
The reason for the central multidimensional database's rise to corporate necessity is that it facilitates flexible, high-performance access and analysis of large volumes of complex and interrelated data.
A stand-alone specialized aggregation server, simultaneously serving many different kinds of clients (e.g. data mart, OLAP, URL, RDBMS), has the power of delivering an enterprise-wide aggregation in a cost-effective way. This kind of server eliminates the roll-up redundancy over the group of clients, delivering scalability and flexibility.
Performance associated with central data warehouse is an important consideration in the overall approach. Performance includes aggregation times and query response.
Effective interactive query applications require near real-time performance, measured in seconds. These application performances translate directly into the aggregation requirements.
In the prior art, in case of MOLAP, a full pre-aggregation must be done before starting querying. In the present invention, in contrast to prior art, the query directed roll-up (QDR) allows instant querying, while the full pre-aggregation is done in the background. In cases a full pre-aggregation is preferred, the currently invented aggregation outperforms any prior art. For the ROLAP and RDBMS clients, partial aggregations maximize query performance. In both cases fast aggregation process is imperative. The aggregation performance of the current invention is by orders of magnitude higher than that of the prior art.
The stand-alone scalable aggregation server of the present invention can be used in any MOLAP system environment for answering questions about corporate performance in a particular market, economic trends, consumer behaviors, weather conditions, population trends, or the state of any physical, social, biological or other system or phenomenon on which different types or categories of information, organizable in accordance with a predetermined dimensional hierarchy, are collected and stored within a RDBMS of one sort or another. Regardless of the particular application selected, the address data mapping processes of the present invention will provide a quick and efficient way of managing a MDDB and also enabling decision support capabilities utilizing the same in diverse application environments.
Functional Advantages Gained by the Data Aggregation Server of the Present Invention
The stand-alone “cartridge-style” plug-in features of the data aggregation server of the present invention, provides freedom in designing an optimized multidimensional data structure and handling method for aggregation, provides freedom in designing a generic aggregation server matching all OLAP vendors, and enables enterprise-wide centralized aggregation.
The method of Segmented Aggregation employed in the aggregation server of the present invention provides flexibility, scalability, a condition for Query Directed Aggregation, and speed improvement.
The method of Multidimensional data organization and indexing employed in the aggregation server of the present invention provides fast storage and retrieval, a condition for Segmented Aggregation, improves the storing, handling, and retrieval of data in a fast manner, and contributes to structural flexibility to allow sliced aggregation and QDR. It also enables the forwarding and single handling of data with improvements in speed performance.
The method of Query Directed Aggregation (QDR) employed in the aggregation server of the present invention minimizes the data handling operations in multi-hierarchy data structures.
The method of Query Directed Aggregation (QDR) employed in the aggregation server of the present invention eliminates the need to wait for full aggregation to be completed, and provides build-up aggregated data required for full aggregation.
It is understood that the System and Method of the illustrative embodiments described hereinabove may be modified in a variety of ways which will become readily apparent to those skilled in the art of having the benefit of the novel teachings disclosed herein. All such modifications and variations of the illustrative embodiments thereof shall be deemed to be within the scope and spirit of the present invention as defined by the Claims to Invention appended hereto.