Rem
Rem $Header: README.txt 20-oct-2003.14:30:14 ramkrish Exp $
Rem
Rem demo/data/README.txt
Rem
Rem Copyright (c) 2003, Oracle Corporation.  All rights reserved.  
Rem
Rem    NAME
Rem      README.txt - README File describing data used for ODM Sample Code
Rem
Rem    DESCRIPTION
Rem      This file describes the data used for sample code for
Rem      the PL/SQL and Java APIs for Oracle Data Mining.
Rem
Rem    NOTES
Rem      A good reference on understanding and preparing mining data is:
Rem      "Data Preparation for Data Mining" by Dorian Pyle.
Rem     
Rem
Rem    MODIFIED   (MM/DD/YY) 
Rem    ramkrish    10/20/03 - ramkrish_txn109085
Rem    ramkrish    10/02/03 - Creation

--------
OVERVIEW
--------

The ODM sample programs use two groups of datasets:

1. Sample Schema - mining data derived from the Sales History (SH)
   schema that is provided with the RDBMS.
   Sample programs written against this dataset are:
   demo/sample/plsql/*_sh.sql
   demo/sample/java/*_sh.java

2. Individual datasets - such as DRUG_DEPOT
   Sample programs written against these datasets are:
   demo/sample/plsql/*demo.sql
   demo/sample/java/*demo.java

-------------------------------------------------------------------------------
                     DESCRIPTION OF SAMPLE SCHEMA DATASET
-------------------------------------------------------------------------------

As a prerequisite, the Sample Schema (SH Schema) must be
installed in your RDBMS instance as part of the Oracle
installation.

The demo programs based on the Sample (SH) schema work on
the following *views* as data inputs:

- mining_data_build_v
- mining_data_test_v
- mining_data_apply_v
- market_basket_v

These views are defined on one or more tables in the SH
schema. The following tables are also created for use in
the Text Mining demos:

- mining_build_nested_text
- mining_test_nested_text
- mining_apply_nested_text

To make these views available in your ODM installation:

1. A user with read privileges on the SH schema
   (for example, your DBA) must run the script
   dm/admin/dmshgrant.sql (see the ODM Admin Guide for details).
2. You must run the script dm/admin/dmsh.sql in your
   schema to create these views.

The schema for MINING_DATA_BUILD_V, MINING_DATA_TEST_V,
MINING_DATA_APPLY_V is shown below:

Name                    Null?    Type
----------------------- -------- -----------
CUST_ID                 NOT NULL NUMBER
CUST_GENDER             NOT NULL CHAR(1)
AGE                              NUMBER
CUST_MARITAL_STATUS              VARCHAR2(20)
COUNTRY_NAME            NOT NULL VARCHAR2(40)
CUST_INCOME_LEVEL                VARCHAR2(30)
EDUCATION                        VARCHAR2(21)
OCCUPATION                       VARCHAR2(21)
HOUSEHOLD_SIZE                   VARCHAR2(21)
YRS_RESIDENCE                    NUMBER
AFFINITY_CARD                    NUMBER(10)
BULK_PACK_DISKETTES              NUMBER(10)
FLAT_PANEL_MONITOR               NUMBER(10)
HOME_THEATER_PACKAGE             NUMBER(10)
BOOKKEEPING_APPLICATION          NUMBER(10)
PRINTER_SUPPLIES                 NUMBER(10)
Y_BOX_GAMES                      NUMBER(10)
OS_DOC_SET_KANJI                 NUMBER(10)

The schema for MARKET_BASKET_V is shown below:

Name                         Null?    Type
---------------------------- -------- -------------------
CUST_ID                      NOT NULL NUMBER
EXTENSION_CABLE                       NUMBER
FLAT_PANEL_MONITOR                    NUMBER
CD_RW_HIGH_SPEED_5_PACK               NUMBER
ENVOY_256MB_40GB                      NUMBER
ENVOY_AMBASSADOR                      NUMBER
EXTERNAL_8X_CD_ROM                    NUMBER
KEYBOARD_WRIST_REST                   NUMBER
SM26273_BLACK_INK_CARTRIDGE           NUMBER
MOUSE_PAD                             NUMBER
MULTIMEDIA_SPEAKERS_3INCH             NUMBER
OS_DOC_SET_ENGLISH                    NUMBER
SIMM_16MB_PCMCIAII_CARD               NUMBER
STANDARD_MOUSE                        NUMBER

----------------------------
-- Attribute Characteristics
--

LEGEND: CAT  => CATegorical
        NUM  => NUMerical
        id   => case IDentifier
        PRED => PREDictor
        TGT  => TarGeT
        Card => CARDinality of values
        Sprs => SPaRSity of values

For MINING_DATA_BUILD_V:
------------------------

* AGE is used as the target in the SVM Regression sample.

Name                    Type Use  Count Card Range Nulls Min Max Mean Mode StD
----------------------- ---- ---- ----- ---- ----- ----- --- --- ---- ---- ---
CUST_ID                 N/A  id
CUST_GENDER             CAT  PRED
AGE                     NUM  PRED*
CUST_MARITAL_STATUS     CAT  PRED
COUNTRY_NAME            CAT  PRED
CUST_INCOME_LEVEL       CAT  PRED
EDUCATION               CAT  PRED
OCCUPATION              CAT  PRED
HOUSEHOLD_SIZE          CAT  PRED
YRS_RESIDENCE           NUM  PRED
AFFINITY_CARD           NUM  TGT
BULK_PACK_DISKETTES     NUM  PRED
FLAT_PANEL_MONITOR      NUM  PRED
HOME_THEATER_PACKAGE    NUM  PRED
BOOKKEEPING_APPLICATION NUM  PRED
PRINTER_SUPPLIES        NUM  PRED
Y_BOX_GAMES             NUM  PRED
OS_DOC_SET_KANJI        NUM  PRED
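
The Type and Use columns of the table above can be captured in client
code to select the target and the predictors for a build. The Java
sketch below is a hypothetical helper, not part of the ODM Java API:
the class, enum and record names are illustrative assumptions, and
the record syntax requires Java 16 or later. Only the Type and Use
columns are encoded, since the statistics columns are not filled in
here.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of the ODM Java API): encodes the Type and
// Use columns of the MINING_DATA_BUILD_V attribute table.
public class MiningDataBuildAttributes {

    // CAT = categorical, NUM = numerical, NONE = "N/A" (used for the id column)
    enum AttrType { CAT, NUM, NONE }

    // ID = case identifier, PRED = predictor, TGT = target
    enum AttrUse { ID, PRED, TGT }

    record Attribute(String name, AttrType type, AttrUse use) {}

    static final List<Attribute> ATTRIBUTES = List.of(
        new Attribute("CUST_ID",                 AttrType.NONE, AttrUse.ID),
        new Attribute("CUST_GENDER",             AttrType.CAT,  AttrUse.PRED),
        new Attribute("AGE",                     AttrType.NUM,  AttrUse.PRED),
        new Attribute("CUST_MARITAL_STATUS",     AttrType.CAT,  AttrUse.PRED),
        new Attribute("COUNTRY_NAME",            AttrType.CAT,  AttrUse.PRED),
        new Attribute("CUST_INCOME_LEVEL",       AttrType.CAT,  AttrUse.PRED),
        new Attribute("EDUCATION",               AttrType.CAT,  AttrUse.PRED),
        new Attribute("OCCUPATION",              AttrType.CAT,  AttrUse.PRED),
        new Attribute("HOUSEHOLD_SIZE",          AttrType.CAT,  AttrUse.PRED),
        new Attribute("YRS_RESIDENCE",           AttrType.NUM,  AttrUse.PRED),
        new Attribute("AFFINITY_CARD",           AttrType.NUM,  AttrUse.TGT),
        new Attribute("BULK_PACK_DISKETTES",     AttrType.NUM,  AttrUse.PRED),
        new Attribute("FLAT_PANEL_MONITOR",      AttrType.NUM,  AttrUse.PRED),
        new Attribute("HOME_THEATER_PACKAGE",    AttrType.NUM,  AttrUse.PRED),
        new Attribute("BOOKKEEPING_APPLICATION", AttrType.NUM,  AttrUse.PRED),
        new Attribute("PRINTER_SUPPLIES",        AttrType.NUM,  AttrUse.PRED),
        new Attribute("Y_BOX_GAMES",             AttrType.NUM,  AttrUse.PRED),
        new Attribute("OS_DOC_SET_KANJI",        AttrType.NUM,  AttrUse.PRED)
    );

    // Names of all attributes with the given use, in table order.
    static List<String> namesWithUse(AttrUse use) {
        return ATTRIBUTES.stream()
                .filter(a -> a.use() == use)
                .map(Attribute::name)
                .collect(Collectors.toList());
    }

    static List<String> predictorNames() {
        return namesWithUse(AttrUse.PRED);
    }

    // The table marks exactly one attribute as TGT.
    static String targetName() {
        return namesWithUse(AttrUse.TGT).get(0);
    }

    // Names of predictors with the given type (e.g. CAT for categorical).
    static List<String> predictorsOfType(AttrType type) {
        return ATTRIBUTES.stream()
                .filter(a -> a.use() == AttrUse.PRED && a.type() == type)
                .map(Attribute::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Target: " + targetName());
        System.out.println("Predictors: " + predictorNames().size());
        System.out.println("Categorical predictors: " + predictorsOfType(AttrType.CAT));
    }
}
```

With the table as given, this reports AFFINITY_CARD as the target and
16 predictors (7 categorical, 9 numerical). For the SVM Regression
sample, AGE would take the target role instead; how that sample treats
the remaining columns is not described in this file.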

For MINING_DATA_TEST_V:
-----------------------

For MINING_DATA_APPLY_V:
------------------------

For MARKET_BASKET_V:
--------------------


-------------------------------------------------------------------------------
                     DESCRIPTION OF INDIVIDUAL DATASETS
-------------------------------------------------------------------------------

------
CENSUS
------


-------------
BREAST_CANCER
-------------
