Leveraging Oracle Business Intelligence Tools with the ...



OLAP Option to the Oracle10g Database

Support for large multidimensional data sets and SQL access optimizations

Bud Endress, Oracle Corporation

Anthony Waite, Oracle Corporation

Executive Summary

Oracle9i Release 2 was the first, and is still the only, relational-multidimensional database. As a relational-multidimensional database, it combined a relational engine and relational data types with a full-featured multidimensional engine and multidimensional data types. The concept was simple – retain the advantages of a multidimensional database, solve the problems associated with stand-alone multidimensional databases and leverage the scalable, secure and resilient platform offered by the Oracle Database.

Advantages of multidimensional databases are compelling. They include support for advanced multidimensional calculations and planning functions, a transaction model suitable for what-if analysis and modeling, and a dimensional data model that simplifies the task of defining calculations and expressing queries. Multidimensional databases are also known for excellent query response times.

Although the advantages of multidimensional databases are convincing, there are also numerous problems associated with stand-alone multidimensional databases. As compared with mature relational databases such as Oracle, they lack robust disaster recovery and high availability capabilities. Their security apparatus is immature. They typically replicate data already in the data warehouse. They required specialized query and reporting tools. And so on.

The net result is that stand-alone multidimensional databases are typically an adjunct to the data warehouse rather than being part of the core data warehouse. After all, stand-alone multidimensional databases replicate small subsets of the data warehouse and they can't support the SQL based tools used to query the data warehouse. In order to be considered part of the data warehouse, multidimensional databases would need to provide support for very large data sets and query by SQL based tools.

The OLAP option to the Oracle9i Release 2 Database completely changed the multidimensional database market. It is not a stand-alone multidimensional database. Instead, a multidimensional engine and multidimensional data types were quite literally introduced into the kernel of the Oracle Database. The multidimensional technology shares the same platform as the relational technology. It is highly secure. It has mature high availability and disaster recovery features. It supports SQL as an interface to multidimensional data types.

Oracle OLAP multidimensional data is a first class data type in the Oracle Database. It is managed by the Oracle database management system, stored in Oracle data files and is accessible by SQL. As a result of this native integration of the multidimensional data type, it is no longer necessary to host the OLAP data in a stand-alone multidimensional database and OLAP data can now be considered an integral part of the data warehouse.

The OLAP option to the Oracle10g database focuses on the primarily issues affecting the status of multidimensional data types within the data warehouse: scalability and the ability to support SQL as a query language. Oracle10g OLAP also offers new analytic opportunities and includes enhancements to the OLAP API.

Oracle10g provides support for features such as partitioned multidimensional data types, and parallelism within the cube building and aggregation process. The SQL interface has been significantly enhanced to optimize support for a broader use of the SQL language when querying multidimensional data types. The result is the ability to efficiently manage very large multidimensional data sets and to support a wide range of SQL based tools and applications.

Audience

This paper discusses a subset of the new features and enhancement that are included in the OLAP option to the Oracle10g Database. The intended audience is database administrators and developers who are familiar with either the OLAP option the to the Oracle9i Database or its predecessor product, Oracle Express Server. This paper might also be useful to organizations that are considering the use of the OLAP option since it provides insight into future directions for the product.

A Brief Overview of the OLAP Option

If you are not familiar with the OLAP option, this section will provide a brief overview of its capabilities and architecture.

OLAP API

The OLAP API is an object-oriented Java application programming interface. It provides a multidimensional object model, and a broad range of classes and methods, that allow application to easily select, navigate and calculate multidimensional data. The OLAP API, being a very powerful but low level API, is primarily targeted to the ISV community. It is occasionally used by IT organizations, however they usually develop to higher level interfaces such as those provided by the Oracle Business Intelligence Beans.

Multidimensional Engine and Data Types

The multidimensional engine and data types provide support for complex, multidimensional calculations and planning functions. The multidimensional engine provides support for a wide range of functions such as non-additive aggregation methods, time series calculations, indices, statistical functions, and many other analytic functions. It offers exception support for planning applications that require features such as forecasting and allocations. Also in support of planning applications, the multidimensional engine supports a 'read-repeatable' transaction model. This transaction model allows multiple users to simultaneously engage in what-if analysis sessions where they can make session level changes to both the data and the data dictionary.

The multidimensional engine utilizes array based data structures known as variables for data storage. These variables are true multidimensional data types stored in Oracle data files. They are very efficient in terms of data storage and query performance.

The multidimensional engine provides a dimensionally aware calcuation language known as the OLAP DML. This is a procedural programming language that can be used to express various types of calculations, design custom analytic functions, and control the data loading and calculation processes related to multidimensional data types. The OLAP DML is accessible through SQL and PL/SQL, as well as the OLAP Worksheet client tool

A collection of multidimensional data types and OLAP DML programming code is stored in an analytic workspace within the database. An analytic workspace has two basic purposes. First, it is a container for a collection of multidimensional data types within a schema. Second, it plays a role in defining the scope of a read-repeatable transaction (or, in other words, the boundaries of a what-if session).

SQL Interface

The multidimensional engine and the object technology of the Database support a SQL interface to multidimensional data types. At the core of the SQL interface to multidimensional data types is OLAP_TABLE, a table function. The role of OLAP_TABLE is to pass SQL to the multidimensional engine, transform parts of a SQL statement to OLAP DML commands and return data from the multidimensional engine as a row set to the relational engine.

The SQL interface to multidimensional data types can be made transparent to the SQL application by creating a view that selected from OLAP_TABLE. While certain applications might prefer specific style views, views that make the analytic workspace appear as a star schema to a SQL application are common. Also common are views that combine both dimension data (members, descriptions, hierarchical and attribute data) and fact data in a single denormalized view. OLAP_TABLE can also be selected from without using views, thus providing applications with the opportunity to interact with the multidimensional engine from within a select statement.

The following example illustrates how a fact view of a star schema might be created and queried. First, an abstract data types are created and then the view is defined. The abstract data types define the columns and data types to the relational engine. The CREATE VIEW statement binds the abstract data type to the analytic workspace and maps columns to multidimensional data types.

Oracle10g

create type sales_type_row as object (

time_id varchar2(5),

channel_id varchar2(5),

product_id varchar2(5),

customer_id varchar2(5),

sales number,

units number,

extended_cost number,

forecast_sales number,

olap_calc raw(32)

);

create type sales_type_table as table of sales_type_row;

create or replace view sales_view as

select *

from table(OLAP_TABLE('global DURATION session',

'sales_type_table',

'',

'DIMENSION time_id FROM time

DIMENSION channel_id FROM channel

DIMENSION product_id FROM product

DIMENSION customer_id FROM customer

MEASURE sales FROM sales

MEASURE units FROM units

MEASURE extended_cost FROM extended_cost

MEASURE forecast_sales FROM fcast_sales

ROW2CELL olap_calc'));

The structure of a SELECT statement that selects from OLAP_TABLE is relatively simple and tends to be similar across different analytic workspaces and for use by different types of applications. There are arguments to OLAP_TABLE that define the analytic workspace (GLOBAL, in this case), that bind the query to an abstract data type and that map relational columns define in the abstract data types to multidimensional data types in the analytic workspace.

The view could then be queried with a SELECT statement such as:

select time_id,

channel_id,

product_id,

customer_id,

sales

from sales_view

where time_id = '2003'

and channel_id = 'CATALOG'

and product_id in ('GUNS','LIPSTICK')

and customer_id = 'TEXAS';

There are several methods that applications can use to interact with the multidimensional engine to define calculations and perform other tasks. One method is the inclusion of an OLAP DML expression in a select statement. The following example includes time series and a market share calculations.

select product_id,

time_id,

sales,

olap_expression

(olap_calc,'lagdif(sales,1,time,status)')

as SALES_CHG_PRIOR_PRIOD,

olap_expression

(olap_calc,'sales/sales(product ''1'') * 100')

as PRODUCT_SHARE

from sales_olap_view

where time_id = '2003'

and channel_id = 'CATALOG'

and product_id in ('GUNS','LIPSTICK')

and customer_id = 'TEXAS';

In this example, the actual OLAP DML expression is in underlined text. Note the relative simplicity of the code – the dimensional data model makes this possible. The remainder of the bold text is the wrapper for the OLAP DML code that allows its use within the SQL select statement.

New Features and Enhancements in Oracle10g

This document describes many, but not all, of the enhancements that are included in the OLAP Option to the Oracle10g Database. This document focuses on those features that enhance the standing of multidimensional data types as an integral part of the data warehouse: support for very large multidimensional data sets and enhancements to the SQL interface to multidimensional data types.

The main focus of this document are those features that enhance the OLAP option and its multidimensional data types standing as a part of the data warehouse: support for very large multidimensional data sets and that enhance the SQL interface to multidimensional data types.

Support for Very Large Multidimensional Data Sets

Oracle10g brings the time-tested techniques of partitioning and parallelism to multidimensional data sets. The collection of features that support partitioning and parallelism allow for more efficient utilization of hardware resource and more efficient management of warehouses with large dimensional data sets. When reading about these features note how the OLAP option leverages the Oracle Database as a platform. Parallel update, for example, leverages new features in the multidimensional engine as well as support for parallelization within the Database. This is an excellent example of how the multidimensional engine benefits from being an integrated part of the Oracle Database.

In addition to partitioning and parallelism the multidimensional engine has extended its ability to efficiently perform complex and numerous calculations dynamically, thus eliminating the need to pre-calculate and store large volumes of data. As compared with relational technology and competing multidimensional technologies, the ability to efficiently perform dynamic calculations allows the multidimensional engine to present large volumes of derived information from relatively little stored data. This trend continues with new features such as the ability to aggregate data from formulas to summary levels within a hierarchy.

Enhanced Storage Model

Before examining individual features, it is necessary to examine a significant change to the storage model of the analytic workspace. While the benefits of the storage model itself might not be readily apparent, it is important to understand it in order to understand how other new features work.

In Oracle9i Release2, analytic workspaces are stored in AW$ tables. The Oracle9i Release 2 AW$ table contains two columns, an EXTNUM and AWLOB. The analytic workspace could be partitioned across multiple rows in the AW$ table by specifying a maximize segment size (the maximize amount of data for any particular row). The AW$ table itself could be partitioned using standard relational partitioning features.

This feature was required to support large analytic workspaces since there is a size limit for each row of a BLOB data type. When used in combination with table partitioning, it was also useful for reducing I/O bottlenecks. The extent of the database administrators control over this form of partitioning was, however, limited to specifying the maximum segment size; the multidimensional engine automatically distributed data across the rows of the AW$ table.

For example, if there is an analytic workspace named SALES with a segment size of 20GB, each row in the AW$ table would contain a maximum of 20GB of data. The AW$SALES table could be partitioned using table partitioning as shown in the following illustration.

[pic]

AW$ table partitioning in Oracle9i Release 2

In Oracle10g, the storage model is enhanced to support the placement of objects in the analytic workspaces into specific rows of the AW$ table. Objects can be further partitioned by segment size to allow for large objects. Like Oracle9i, the AW$ table can then be partitioned across multiple data files.

[pic]

AW$ table partitioning in Oracle10g Release 2

The obvious benefit of the enhanced storage model is that database administrators have complete control over how data is distributed across data files and can therefore optimize I/O for data and data access patterns. Other benefits of the enhanced storage model will become apparent as other specific features are discussed.

Partitioned Variables

Using application-programming techniques, it has long been possible to build analytic workspaces where data is partitioned across elements of the data model or across dimension members. Variables, for example, could be partitioned by level of summarization or by members in the time dimension. These techniques were effective, but they required an investment in application programming code and could not be fully leveraged by the multidimensional engine for parallelism.

In Oracle10g, the multidimensional engine provides direct support for partitioned variables. This support for partitioning presents many opportunities for both enhancing manageability and supporting large multidimensional data sets.

Three partitioning methods are supported:

• Range partitioning allows data to be partitioned based on a range of dimension members. For example, one partition might contain time dimension members that are less than '13', another that are less than '25', and so on.

• List partitioning allows data to be partitioned based on a list of specific dimension members. For example, a partition might contain dimension members

and other partition might contain members ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download