Using the Sparsity Advisor

Data can be stored in several different forms in an analytic workspace, depending on whether it is dense, sparse, or very sparse. The Sparsity Advisor is a group of subprograms in DBMS_AW that you can use to analyze the relational source data and get recommendations for storing it in an analytic workspace.

Data Storage Options in Analytic Workspaces

Analytic workspaces analyze and manipulate data in a multidimensional format that allocates one cell for each combination of dimension members. The cell can contain a data value, or it can contain an NA (null). Regardless of its content, the cell size is defined by the data type, for example, every cell in a DECIMAL variable is 8 bytes.

Variables can be either dense (they contain 30% or more cells with data values) or sparse (less than 30% data values). Most variables are sparse and many are extremely sparse.

Although data can also be stored in the multidimensional format used for analysis, other methods are available for storing sparse variables that make more efficient use of disk space and improve performance. Sparse data can be stored in a variable defined with a composite dimension. A composite has as its members the dimension-value combinations (called tuples) for which there is data. When a data value is added to a variable dimensioned by a composite, that action triggers the creation of a composite tuple. A composite is an index into one or more sparse data variables, and is used to store sparse data in a compact form. Very sparse data can be stored in a variable defined with a compressed composite, which uses a different algorithm for data storage from regular composites.

Selecting the Best Data Storage Method

In contrast to dimensional data, relational data is stored in tables in a very compact format, with rows only for actual data values. When designing an analytic workspace, you may have difficulty manually identifying sparsity in the source data and determining the best storage method. The Sparsity Advisor analyzes the source data in relational tables and recommends a storage method. The recommendations may include the definition of a composite and partitioning of the data variable.

The Sparsity Advisor consists of these procedures and functions:


SPARSITY_ADVICE_TABLE Procedure
ADD_DIMENSION_SOURCE Procedure
ADVISE_SPARSITY Procedure
ADVISE_DIMENSIONALITY Function
ADVISE_DIMENSIONALITY Procedure

The Sparsity Advisor also provides a public table type for storing information about the dimensions of the facts being analyzed. Three objects are used to define the table type:


DBMS_AW$_COLUMNLIST_T
DBMS_AW$_DIMENSION_SOURCE_T
DBMS_AW$_DIMENSION_SOURCES_T

The following SQL DESCRIBE statements show the object definitions.

DESCRIBE dbms_aw$_columnlist_t
dbms_aw$_columnlist_t TABLE OF VARCHAR2(100)
 
DESCRIBE dbms_aw$_dimension_source_t
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 DIMNAME                                            VARCHAR2(100)
 COLUMNNAME                                         VARCHAR2(100)
 SOURCEVALUE                                        VARCHAR2(32767)
 DIMTYPE                                            NUMBER(3)
 HIERCOLS                                           DBMS_AW$_COLUMNLIST_T
 PARTBY                                             NUMBER(9)

DESCRIBE dbms_aw$_dimension_sources_t
dbms_aw$_dimension_sources_t TABLE OF DBMS_AW$_DIMENSION_SOURCE_T

Using the Sparsity Advisor

Take these steps to use the Sparsity Advisor:

  1. Call SPARSITY_ADVICE_TABLE to create a table for storing the evaluation of the Sparsity Advisor.

  2. Call ADD_DIMENSION_SOURCE for each dimension related by one or more columns to the fact table being evaluated.

    The information that you provide about these dimensions is stored in a DBMS_AW$_DIMENSION_SOURCES_T variable.

  3. Call ADVISE_SPARSITY to evaluate the fact table.

    Its recommendations are stored in the table created by SPARSITY_ADVICE_TABLE. You can use these recommendations to make your own judgements about defining variables in your analytic workspace, or you can continue with the following step.

  4. Call the ADVISE_DIMENSIONALITY procedure to get the OLAP DML object definitions for the recommended composite, partitioning, and variable definitions.

    or

    Use the ADVISE_DIMENSIONALITY function to get the OLAP DML object definition for the recommended composite and the dimension order for the variable definitions for a specific partition.

Example: Evaluating Sparsity in the GLOBAL Schema

Example B-1, "Sparsity Advisor Script for GLOBAL" provides a SQL script for evaluating the sparsity of the UNITS_HISTORY_FACT table in the GLOBAL schema. In the GLOBAL analytic workspace, UNITS_HISTORY_FACT defines the Units Cube and will be the source for the UNITS variable. UNITS_HISTORY_FACT is a fact table with a primary key composed of foreign keys from four dimension tables. A fifth column contains the facts for Unit Sales.

The CHANNEL_DIM and CUSTOMER_DIM tables contain all of the information for the Channel and Customer dimensions in a basic star configuration. Three tables in a snowflake configuration provide data for the Time dimension: MONTH_DIM, QUARTER_DIM, and YEAR_DIM. The PRODUCT_CHILD_PARENT table is a parent-child table and defines the Product dimension.

Example B-1 Sparsity Advisor Script for GLOBAL

CONNECT global/global
SET ECHO ON
SET LINESIZE 300
SET PAGESIZE 300
SET SERVEROUT ON FORMAT WRAPPED
 
-- Define and initialize an advice table named AW_SPARSITY_ADVICE
BEGIN
     dbms_aw.sparsity_advice_table();
EXCEPTION
     WHEN OTHERS THEN NULL;
END;
/
 
TRUNCATE TABLE aw_sparsity_advice;
 
DECLARE
     dimsources dbms_aw$_dimension_sources_t;
     dimlist VARCHAR2(500);
     sparsedim VARCHAR2(500);
     defs CLOB;
BEGIN
-- Provide information about all dimensions in the cube
     dbms_aw.add_dimension_source('channel', 'channel_id', dimsources, 
         'channel_dim', dbms_aw.hier_levels,
          dbms_aw$_columnlist_t('channel_id', 'total_channel_id'));
     dbms_aw.add_dimension_source('product', 'item_id', dimsources, 
          'product_child_parent', dbms_aw.hier_parentchild, 
           dbms_aw$_columnlist_t('product_id', 'parent_id'));
     dbms_aw.add_dimension_source('customer', 'ship_to_id', dimsources,
         'customer_dim', dbms_aw.hier_levels,
          dbms_aw$_columnlist_t('ship_to_id', 'warehouse_id', 'region_id',
               'total_customer_id'));
     dbms_aw.add_dimension_source('time', 'month_id', dimsources,
           'SELECT m.month_id, q.quarter_id, y.year_id
                FROM time_month_dim m, time_quarter_dim q, time_year_dim y
                WHERE m.parent=q.quarter_id AND q.parent=y.year_id',
            dbms_aw.hier_levels, 
            dbms_aw$_columnlist_t('month_id', 'quarter_id', 'year_id'));
 
-- Analyze fact table and provide advice without partitioning
     dbms_aw.advise_sparsity('units_history_fact', 'units_cube', 
          dimsources, dbms_aw.advice_default, dbms_aw.partby_none);
 
COMMIT;
 
-- Generate OLAP DML for composite and variable definitions
dimlist := dbms_aw.advise_dimensionality('units_cube', sparsedim, 
           'units_cube_composite');
dbms_output.put_line('Dimension list:  ' || dimlist);
dbms_output.put_line('Sparse dimension:  ' || sparsedim);
dbms_aw.advise_dimensionality(defs, 'units_cube');
dbms_output.put_line('Definitions:  ');
dbms_aw.printlog(defs);
 
END;
/

Advice from Sample Program

The script in Example B-1, "Sparsity Advisor Script for GLOBAL" generates the following information.

Dimension list:  <channel units_cube_composite<product customer time>>
Sparse dimension:  DEFINE units_cube_composite COMPOSITE <product customer time>
Definitions:
DEFINE units_cube.cp COMPOSITE <product customer time>
DEFINE units_cube NUMBER VARIABLE <channel units_cube.cp<product customer time>>
PL/SQL procedure successfully completed.

Information Stored in AW_SPARSITY_ADVICE Table

This SQL SELECT statement shows some of the columns from the AW_SPARSITY_ADVICE table, which is the basis for the recommended OLAP DML object definitions.

SELECT fact, dimension, dimcolumn, membercount nmem, leafcount nleaf, 
     advice, density
     FROM aw_sparsity_advice
     WHERE cubename='units_cube';
 

This query returns the following result set:

FACT                 DIMENSION    DIMCOLUMN     NMEM  NLEAF ADVICE        DENSITY
-------------------- ------------ ------------ ----- ------ ------------ --------
units_history_fact   channel      channel_id       3      3 DENSE          .46182
units_history_fact   product      item_id         48     36 SPARSE         .94827
units_history_fact   customer     ship_to_id      61     61 SPARSE         .97031
units_history_fact   time         month_id        96     79 SPARSE         .97664