Data Usage
This topic is a general description of the Data Usage page. The exact information displayed depends on the model being built. For example, no target is displayed for an association model.
The wizard displays a list of the possible attributes to be used.
You can include or exclude attributes, and you can change the data
mining type and the data type of attributes.
If the table has multiple records per case (transactional), then you must use
numerical attributes.
The following summary information may be displayed:
- Attribute Count: The number of attributes in the table.
- Selected Attribute Count: The number of attributes selected for inclusion. As you select and deselect attributes, this number changes.
- Case Count or Count: The number of cases in the input table.
- Selected Target: The target attribute that you previously selected. The target attribute is not displayed for algorithms, such as Apriori, where there is no target.
- Case Id Attribute Name: The name of the attribute that you previously selected as the case ID.
- Missing %: The percent of cases where the value of this attribute is missing.
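The summary values listed above can be illustrated with a minimal sketch. It assumes a single-record-case table held in memory as a list of dictionaries, with `None` standing for a missing value; the function name `summarize` and the shape of its result are hypothetical and are not part of the wizard.

```python
def summarize(rows, included):
    """Compute summary values similar to those on the Data Usage page.

    rows: list of dicts mapping attribute name -> value (None means missing).
    included: set of attribute names currently selected for inclusion.
    Assumes one row per case, so the case count equals the row count.
    """
    attributes = list(rows[0].keys()) if rows else []
    case_count = len(rows)

    # Missing %: share of cases in which the attribute has no value.
    missing_pct = {}
    for name in attributes:
        missing = sum(1 for row in rows if row.get(name) is None)
        missing_pct[name] = 100.0 * missing / case_count if case_count else 0.0

    return {
        "attribute_count": len(attributes),
        "selected_attribute_count": sum(1 for n in attributes if n in included),
        "case_count": case_count,
        "missing_pct": missing_pct,
    }


# Example input (invented for illustration only).
rows = [
    {"AGE": 30, "JOB": "clerk"},
    {"AGE": None, "JOB": "nurse"},
    {"AGE": 41, "JOB": None},
    {"AGE": 25, "JOB": "clerk"},
]
summary = summarize(rows, included={"AGE"})
```

Selecting or deselecting an attribute changes only `selected_attribute_count`; the other values depend on the table alone.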
Attribute Grid
The attribute grid contains all attributes except for the target attribute, which is listed above the grid.
The attribute grid has the following
columns:
- Include: If checked, include the attribute.
- Name: The name of the attribute.
- Data Type: The data type of the attribute, such as NUMBER or FLOAT.
- Mining Type: The default value for mining type is determined as follows:
  - If the data type of the column is NUMBER or FLOAT, then the mining type is numerical.
  - In all other cases, the mining type is categorical.
  - For multi-record case (transactional) data, the data type of the value column must be NUMBER; the mining type can be either numerical or categorical.
- Sparsity: Displayed only for models built using the Support Vector Machine and Non-Negative Matrix Factorization algorithms. A check indicates sparse data; no check indicates dense data. If the Data Type of an attribute is one of the valid types for text, Sparsity is checked automatically. In all other cases, you must specify that the data is sparse. For information about sparse data, see Sparsity.
- Count: The number of unique values.
- Missing %: The percent of missing values.
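The default rule for Mining Type described in the grid columns above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the sketch assumes that data types arrive as plain upper- or lowercase type names without precision or scale.

```python
NUMERIC_DATA_TYPES = {"NUMBER", "FLOAT"}


def default_mining_type(data_type):
    """Return the default mining type for a column data type.

    NUMBER and FLOAT default to numerical; everything else is categorical.
    """
    if data_type.upper() in NUMERIC_DATA_TYPES:
        return "numerical"
    return "categorical"


def is_valid_transactional_value_column(data_type):
    """For multi-record case data, the value column must be NUMBER.

    The mining type of that column may still be numerical or categorical,
    so only the data type is checked here.
    """
    return data_type.upper() == "NUMBER"
```

For instance, under these assumptions a column typed `VARCHAR2` (used here only as an example of a non-numeric type) defaults to categorical, while a `FLOAT` column defaults to numerical but could not serve as the value column of a transactional table.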
You can include or exclude attributes, change data mining type, and change the data type of certain numerical attributes, as described in Change Attribute Characteristics.
To restore the original mining type defaults for all the attributes in the table, click Restore.
When you are done, click Next.
The wizard validates that you have selected at least one attribute.
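The behavior of Restore and of the check performed on Next might be modeled roughly as below. The `GridRow` class and both function names are hypothetical; the sketch assumes each grid row remembers the mining type it was given by default.

```python
from dataclasses import dataclass


@dataclass
class GridRow:
    name: str
    mining_type: str          # current, possibly user-edited, mining type
    default_mining_type: str  # value assigned when the page was first shown
    include: bool = True


def restore_defaults(rows):
    """Reset every row to its original default mining type (Restore)."""
    for row in rows:
        row.mining_type = row.default_mining_type


def validate_before_next(rows):
    """Raise ValueError unless at least one attribute is included (Next)."""
    if not any(row.include for row in rows):
        raise ValueError("Select at least one attribute before continuing.")
```

In this sketch Restore touches only the mining type; the Include setting is left as the user set it, which matches the description above but is otherwise an assumption.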
Change Attribute Characteristics
You can make the following changes:
- To change mining type, click the Mining Type column for the attribute. Select the new mining type from the dropdown list.
- If an attribute has Mining Type numerical, you can change the Data Type between FLOAT and NUMBER. To do this, click the cell that you want to change, and select the new data type from the dropdown list.
- To exclude an included attribute, clear its checkbox in the Include column. Alternatively, select the attribute and click Exclude; for the selected attributes, the Include column is unchecked.
- To include an excluded attribute, check its checkbox in the Include column.
- If Sparsity is displayed, you can change the sparsity by clicking the checkbox.
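The constraints on these changes can be expressed as a short sketch. The `Attribute` class and the function names are hypothetical and only mirror the rules listed above: the data type can be switched between FLOAT and NUMBER, and only when the mining type is numerical.

```python
from dataclasses import dataclass

MINING_TYPES = {"numerical", "categorical"}
SWITCHABLE_DATA_TYPES = {"FLOAT", "NUMBER"}


@dataclass
class Attribute:
    name: str
    data_type: str
    mining_type: str
    include: bool = True


def change_mining_type(attr, new_mining_type):
    """Set a new mining type, as when picking from the dropdown list."""
    if new_mining_type not in MINING_TYPES:
        raise ValueError(f"Unknown mining type: {new_mining_type}")
    attr.mining_type = new_mining_type


def change_data_type(attr, new_data_type):
    """Switch between FLOAT and NUMBER; allowed for numerical attributes only."""
    if attr.mining_type != "numerical":
        raise ValueError("Data type can be changed only for numerical attributes.")
    if new_data_type not in SWITCHABLE_DATA_TYPES:
        raise ValueError("Data type must be FLOAT or NUMBER.")
    attr.data_type = new_data_type


def toggle_include(attr):
    """Flip the Include checkbox for one attribute."""
    attr.include = not attr.include
```

A quick usage example under the same assumptions: `change_data_type(Attribute("AGE", "NUMBER", "numerical"), "FLOAT")` succeeds, while the same call on a categorical attribute raises `ValueError`.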
Copyright © 2005, Oracle. All rights reserved.