Glossary of Terms
Word | Definition |
Algorithm | A specific technique or procedure for producing a data mining model. An algorithm uses a specific model representation and may support one or more functional areas. Examples of algorithms used by ODM include Naive Bayes, Adaptive Bayes Networks, and Support Vector Machine for classification, Support Vector Machine for regression, K-Means and O-Cluster for clustering, MDL for attribute importance, and Apriori for association models. |
Analytic workspace | An analytic workspace stores data in a multidimensional format where it can be manipulated by the OLAP engine. The analytic workspaces are stored in tables in a relational schema. |
Apprehension rate | Apprehensions as a percentage of the number of incidents identified. |
Approximation | Approximation is a data mining function for predicting continuous target values for new records using a model built from records with known target values. ODM supports the Support Vector Machine algorithm for regression. Approximation is another word for regression. |
Attribute | In the Java interface, an instance of Attribute maps to a column with a name and data type. The attribute corresponds to a column in a database table. When assigned to a column, the column must have a compatible data type; if the data type is not compatible, a runtime exception is likely. Attributes are also called variables, features, data fields, or table columns. |
Attribute importance | A measure of the importance of an attribute in predicting a specified target. Measuring the importance of the attributes in a build data table enables users to select the attributes that are most relevant to a mining model. A smaller set of attributes results in a faster model build, and the resulting model may also be more accurate. ODM uses the minimum description length (MDL) principle to discover important attributes. Sometimes referred to as feature selection or key fields. |
Binning | Binning, also known as discretization, means grouping related values together, thus reducing the number of distinct values for an attribute. |
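As an illustration, the following Python sketch shows equal-width binning. This is one common way to discretize a numeric attribute, and not necessarily the binning method any particular product uses. The function name `equal_width_bins` and the sample ages are hypothetical.

```python
def equal_width_bins(values, num_bins):
    """Assign each value a 0-based bin index using bins of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # all values identical: a single bin
    width = (hi - lo) / num_bins
    bins = []
    for v in values:
        idx = int((v - lo) / width)
        bins.append(min(idx, num_bins - 1))  # keep the maximum value in the last bin
    return bins

# Hypothetical attribute values: nine distinct ages reduced to three bins.
ages = [18, 22, 25, 31, 40, 47, 55, 63, 70]
print(equal_width_bins(ages, 3))
```

Equal-width bins are simple but sensitive to outliers. Equal-frequency (quantile) binning is a common alternative.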
Bottom products | Products that had the lowest product margins and sold the fewest units within their respective category. |
Business area | A business area is a set of folders containing related information with a common business purpose. For example, information about all products may be stored in one business area, whereas information about all customers or employees is stored in another business area. |
Campaign | A grouping of individual promotions that are designed and executed for promoting the sale of one or more items. |
ASK | Available Seat Kilometer. |
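ASK is conventionally computed per flight by multiplying the number of seats available by the distance flown in kilometers, then summing over all flights. The following minimal Python sketch uses a hypothetical function name and hypothetical flight data.

```python
def available_seat_km(flights):
    """Sum over flights of (seats available * distance flown in km)."""
    return sum(seats * distance_km for seats, distance_km in flights)

# Hypothetical flights as (seats available, distance in km).
flights = [(180, 1200), (150, 800)]
print(available_seat_km(flights))
```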
Cube | A cube in a multidimensional data source consists of measures and dimensions. The cube contains a measure value for each possible combination of dimension members. |
Customer target segment | User-defined customer grouping that is defined by list generation. An example of a customer target segment might be "yuppie", where the criteria are specific to age, income, and education. |
CWM (Common Warehouse Metamodel) | An integration approach for data warehousing that incorporates both technical and business metadata into a single model focused on the needs of data warehousing and decision support. |
Data mining | Data mining enables companies to extract information efficiently from very large databases and to build integrated business intelligence applications by finding patterns and insights hidden in the data. Data mining allows application developers to quickly automate the extraction and distribution of new business intelligence throughout the organization through the use of predictions, patterns, and discoveries. Oracle Data Mining (ODM) supports functionality in Oracle Database 10g for the following data mining problems: classification, prediction, regression, clustering, associations, attribute importance, feature extraction, and sequence similarity searches and analysis (BLAST). All model-building, scoring, and metadata management operations are accessed through the Oracle Data Mining Client and either a PL/SQL or Java-based API, and they occur entirely within the relational database. |
Denormalized data | Denormalized data is planned redundant data posted from one object to another for performance considerations. |
Dimension | A dimension is a textual description of the business. Dimensions provide perspective on the "whys" and "hows" of the business and its transactions; examples include the product, customer, and time dimensions. |
Dimension attribute | A dimension attribute describes a characteristic that is shared by dimension members. Dimension attributes enable you to select data based on similar characteristics. For example, a Product dimension might have a Color attribute that enables you to search for all red products. |
Dimension hierarchy | A dimension hierarchy describes a hierarchical relationship between two or more dimension members. Individual dimension members might be related to each other in a hierarchical way. For example, a specific day belongs to a particular month, which in turn is within a particular year. To reflect such relationships, dimension members are organized into dimension hierarchies. A dimension hierarchy is a logical structure that uses ordered levels as a means of organizing and aggregating data. For example, the Time dimension might have a hierarchy to aggregate data from the Month level to the Quarter level to the Year level. A dimension can have more than one hierarchy. For example, as well as the Month-Quarter-Year dimension hierarchy, the Time dimension might also have a Day-Month-Year dimension hierarchy. Where multiple dimension hierarchies exist for the same dimension, one dimension hierarchy must be specified as the default hierarchy. |
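The following Python sketch shows how a Month-Quarter-Year hierarchy aggregates data from the Month level to the Quarter level to the Year level. Several details are hypothetical: the sales figures, the `(year, month)` key layout, and the helper names `month_to_quarter` and `roll_up`. The sketch also assumes summation as the aggregation method.

```python
from collections import defaultdict

# Hypothetical Month-level sales, keyed by (year, month).
monthly_sales = {
    (2001, 1): 100, (2001, 2): 120, (2001, 3): 90,
    (2001, 4): 110, (2001, 5): 130, (2001, 6): 95,
}

def month_to_quarter(year, month):
    """Map a Month member to its parent Quarter member."""
    return (year, (month - 1) // 3 + 1)

def roll_up(values, parent_of):
    """Aggregate (sum) measure values from one level to its parent level."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[parent_of(*member)] += value
    return dict(totals)

quarterly = roll_up(monthly_sales, month_to_quarter)  # Month -> Quarter
yearly = roll_up(quarterly, lambda year, quarter: year)  # Quarter -> Year
print(quarterly, yearly)
```

A second hierarchy such as Day-Month-Year would reuse `roll_up` with a different parent mapping.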
Dimension measure | Measures have dimensions that categorize the data in the measure. For example, a Sales measure might have Product, Time, and Geography as its dimensions. When a measure has a particular dimension, the measure is said to be dimensioned by that dimension. For example, Sales is dimensioned by Product. The group of dimensions for a measure constitutes the dimensionality of that measure. For example, the dimensionality of Sales is Product, Time, and Geography. Each element in a dimension is a dimension member. For example, January 2001, February 2001, March 2001, Quarter 1 2001, and the year 2001 are likely members of the Time dimension. |
Drilling | Drilling enables you to view different levels of data by varying the amount of detail. By drilling up or down, you view less or more of the worksheet data. |
Enterprise intelligence | Enterprise intelligence consists of the analysis performed by retailers to effectively manage and plan operations around the various retail lines of business (LOBs). |
EPM | Enterprise Performance Management (EPM) is the next generation of business intelligence. A corporate culture embraced by managers at all levels, EPM provides an infrastructure that crosses all disciplines within an organization, including sales, marketing, production, human resources (relating to staffing), and so forth. Reiterative planning, forecasting, and a clear course of corrective action drive performance improvements in all aspects of the business, ultimately leading to better decisions regarding supply chain, customer relationship management (CRM), cost reduction, and so forth. |
ERD | An Entity Relationship Diagram (ERD) is a data modeling tool that assists you with building a graphical representation of your enterprise's data storage and organization needs. ERDs provide a visual representation of how your organization captures its data not only for day-to-day business requirements and processing, but also for reporting and analysis to make the business more profitable. |
ETL (extraction, transformation, and loading) | ETL is the process of obtaining data from one data store or source (extract), modifying it (transform), and inserting it into a different data store (load). |
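The following minimal Python sketch walks through the three ETL steps. It makes several assumptions: the source data store is a hypothetical CSV export, and the target is an in-memory SQLite database. The column names and cleansing rules are illustrative only.

```python
import csv
import io
import sqlite3

# Hypothetical source data store: a CSV export with inconsistent currency codes
# and one row that has no amount.
SOURCE = "order_id,amount,currency\n1,10.50,usd\n2,7.25,USD\n3,,usd\n"

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types, normalize currency codes, drop rows with no amount."""
    out = []
    for r in rows:
        if not r["amount"]:
            continue
        out.append((int(r["order_id"]), float(r["amount"]), r["currency"].upper()))
    return out

def load(rows, conn):
    """Load: insert the cleaned rows into the target data store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE)), conn)
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```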
Fact | A fact contains a numeric value that measures an aspect of the business. Typical examples are gross sales dollars, total cost, profit, margin dollars, or quantity sold. A fact (or measure) can be additive or partially additive across dimensions. |
Feature | A feature is a combination of attributes in the data that is of special interest and that captures important characteristics of the data. |
Inheritance | Also known as transference, inheritance is the process by which redundant data is posted from one object to another for performance considerations. |
Integrated | Integrated data is gathered into the data warehouse from a variety of sources and merged into a coherent whole. |
KPIs | Key Performance Indicators (KPIs) are high-level snapshots of a business or organization based on specific predefined measures. KPIs typically consist of any combination of reports, spreadsheets, or charts. They may include global or regional sales figures and trends over time, or anything else that is deemed critical to a corporation's success. |
Market share | The portion of the total revenue in a market that the company generates. Calculation: total sales for a product / total sales for the market of the product. The same calculation can be used for a category or classification of merchandise. |
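The market share calculation can be sketched in Python as follows. The function name and the sales figures are hypothetical.

```python
def market_share(product_sales, market_sales):
    """Total sales for a product divided by total sales for the product's market."""
    if market_sales == 0:
        raise ValueError("market_sales must be non-zero")
    return product_sales / market_sales

# Hypothetical figures: 250,000 in product sales out of 2,000,000 in the market.
share = market_share(250_000, 2_000_000)
print(share)
```

The same function applies unchanged to a category or classification of merchandise if category totals are passed in.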
Marketing channel | The specific instance of the media used to advertise the item. For example, if an item is advertised on television, the marketing channel might be NBC. |
Materialized view | A materialized view, supported by Oracle 8.1.7 Database (or later), contains preaggregated data. Materialized views are snapshot views that are created when you define summaries by using Oracle Discoverer Administrator. Queries are redirected to the materialized views instead of to the large detail tables, which improves query performance in Discoverer Plus and Discoverer Viewer. Oracle 8.1.7 Database (or later) automatically recognizes when a materialized view can be used to satisfy a query request and rewrites the query to use it, so the query is directed to the materialized view rather than to the underlying detail tables or views. |
Measure | The name given to the data itself. In OLAP metadata, measures represent data that can be examined and analyzed in crosstabs and graphs. Examples include Sales, Cost, and Profit. |
Media | The mechanism used to execute the promotion, such as television, radio, newspaper, and so forth. |
Merchandise management | Merchandise management is the methodology employed by a retail business to manage the commodities offered for sale. It includes analysis, planning, acquisition, handling, and control of the merchandise investments for the retail operation. |
Mining model | A mining model is the result of building a model from mining function settings (Java interface) or mining settings table (PL/SQL interface). The representation of the model is specific to the algorithm specified by the user or selected by the DMS. A model can be used for direct inspection, e.g., to examine the rules produced from an ABN model or association models, or to score data. |
Mining result | In the Java interface, the end product(s) of a mining task is the mining result. For example, a build task produces a mining model; a test task produces a test result. |
Missing value | A missing value is a data value that is missing because it was not measured (that is, has a null value), not answered, was unknown, or was lost. Data mining systems vary in the way they treat missing values. There are several typical ways to treat them: ignore them, omit any records containing missing values, replace missing values with the mode or mean, or infer missing values from existing values. ODM ignores missing values during mining operations. |
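ODM itself ignores missing values. Two of the other strategies, omitting records and replacing values with the mean or mode, can be sketched in Python as follows. The records and the helper names `omit_missing` and `impute` are hypothetical, and `None` stands in for a null value.

```python
from statistics import mean, mode

# Hypothetical records with missing (None) values.
records = [
    {"age": 34, "region": "east"},
    {"age": None, "region": "west"},
    {"age": 40, "region": None},
    {"age": 28, "region": "east"},
]

def omit_missing(rows):
    """Drop every record that contains a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute(rows, column, strategy):
    """Return copies of rows with missing values in `column` replaced by its mean or mode."""
    present = [r[column] for r in rows if r[column] is not None]
    fill = mean(present) if strategy == "mean" else mode(present)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

# Mean for the numeric attribute, mode for the categorical one.
filled = impute(impute(records, "age", "mean"), "region", "mode")
print(filled)
```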
Model (mining) | An important function of data mining is the production of a model. A model can be descriptive or predictive. A descriptive model helps in understanding underlying processes or behavior. For example, an association model describes consumer behavior. A predictive model is an equation or set of rules that makes it possible to predict an unseen or unmeasured value (the dependent variable or output) from other, known values (independent variables or input). The form of the equation or rules is suggested by mining data collected from the process under study. Some training or estimation technique is used to estimate the parameters of the equation or rules. |
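As a minimal example of a predictive model, the following Python sketch uses ordinary least squares, one possible estimation technique, to estimate the parameters of a linear equation. It then applies the model to a new input value. The training data and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Estimate slope a and intercept b of y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def predict(model, x):
    """Apply the model to a known input to predict the unseen output."""
    a, b = model
    return a * x + b

# Hypothetical training data: independent variable (input) and dependent variable (output).
spend = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]
model = fit_line(spend, units)
print(model, predict(model, 5.0))
```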
MOLAP | The Oracle MOLAP (Multidimensional Online Analytical Processing) model is based on cubes. |
Multi-record case | Each case in the data is stored as multiple records in a table with columns sequenceID, attribute_name, and value. Multi-record case is also known as transactional format. |
Nontransactional format | In a nontransactional format, each case in the data is stored as one record (row) in a table. Nontransactional format is also known as single-record case. |
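The two formats hold the same information, so pivoting converts one into the other. The following Python sketch is a hypothetical illustration. Each tuple in `transactional` stands for a row with the columns sequenceID, attribute_name, and value. Each inner dict in the single-record result stands for one row of a nontransactional table.

```python
from collections import defaultdict

# Multi-record (transactional) format: one row per (sequenceID, attribute_name, value).
transactional = [
    (1, "age", 34), (1, "income", 52000),
    (2, "age", 41), (2, "income", 61000), (2, "children", 2),
]

def to_single_record(rows):
    """Pivot transactional rows into one record per case."""
    cases = defaultdict(dict)
    for seq_id, name, value in rows:
        cases[seq_id][name] = value
    return dict(cases)

def to_multi_record(cases):
    """Unpivot one-record-per-case data back into transactional rows."""
    return [(seq_id, name, value)
            for seq_id, attrs in cases.items()
            for name, value in attrs.items()]

single = to_single_record(transactional)
print(single)
```

The multi-record form only stores attributes that are present for a case. This is one reason it suits sparse data.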
Normalization | Normalization is the process of eliminating redundant data in your database and ensuring that relationships and dependencies are correctly stated. Normalization is typically discussed in terms of three normal forms: first, second, and third. |
Operational intelligence | Operational intelligence consists of the analysis performed within each functional organization: store management, merchandise management, supply chain management, CRM, and corporate administration. |
Oracle OLAP | Oracle OLAP is a database option and service that provides several APIs enabling open access to MOLAP data and to the analytic features of the OLAP calculation engine. See also: MOLAP. |
Outlier | An outlier is a value that is far outside the normal range in a data set, typically several standard deviations from the mean. In other words, it is an extreme value that does not come from the typical population of the data. In a normal distribution, outliers are typically at least three standard deviations from the mean. |
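The following minimal Python sketch applies the three-standard-deviation rule using the population standard deviation. The function name, the default threshold of 3.0, and the sample readings are assumptions for illustration.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=3.0):
    """Return the values lying at least `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma >= threshold]

# Hypothetical readings: twenty values, one of them extreme.
readings = [10, 11, 9] * 6 + [10, 60]
print(z_score_outliers(readings))
```

With ten or fewer values, a single extreme value cannot reach three population standard deviations from the mean. This rule is therefore more meaningful on larger samples.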
Parallelism | Parallelism is the transparent decomposition and simultaneous execution of multiple operations. |
Partition | A partition is a logical subset of data. In most cases, data warehouses are partitioned by a date field. |
Physical data | In the Java interface, physical data identifies data to be used as input to data mining. Through the use of attribute assignment, attributes of the physical data are mapped to logical attributes of a model’s logical data. The data referenced by a physical data object can be used in model building, model application (scoring), lift computation, statistical analysis, etc. |
Pivoting | Pivoting enables you to change the order in which columns appear in a table, or interchange items between axes. By pivoting data, you change the way a report is presented in your worksheet. |
Predictor | A predictor is an attribute used as input to a supervised model or algorithm to build a model. |
Prior probability | The set of prior probabilities specifies the distribution of examples of the various classes in the original source data. Also referred to as priors, these can differ from the class distribution observed in the data used to build the model. |
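Sometimes a model is built from data whose class distribution differs from the priors. One common adjustment for this case uses Bayes' rule. Each predicted class probability is multiplied by the ratio of the class prior to the class frequency in the build data, and the results are renormalized. This is not necessarily what ODM does internally. The following Python sketch assumes hypothetical class names and distributions.

```python
def adjust_for_priors(probs, build_dist, true_priors):
    """Reweight class probabilities from a model built on `build_dist` to reflect `true_priors`."""
    weighted = {c: p * true_priors[c] / build_dist[c] for c, p in probs.items()}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}  # renormalize to sum to 1

# Hypothetical: build data balanced 50/50, but only 10% of the source data are responders.
scores = {"respond": 0.6, "no_respond": 0.4}
adjusted = adjust_for_priors(
    scores,
    build_dist={"respond": 0.5, "no_respond": 0.5},
    true_priors={"respond": 0.1, "no_respond": 0.9},
)
print(adjusted)
```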
Promotion | A marketing activity planned, developed, and executed to generate sales for products and services. Promotion is the lowest level in the campaign hierarchy. |
Query optimization | Query optimization is the process by which a database management system decides exactly how a query will execute. |
Relational data source | A relational data source is a database in which information is stored in a number of database tables. Each database table comprises several columns and one or more rows. The different tables in a database can be related. Having data in separate but related tables is an efficient way to store and retrieve information. |
Risk management | Risk management is the process of measuring or assessing financial and operational risk and then developing strategies to control that risk. |
ROI (return on investment) | Calculation: (Revenue generated – Costs) / Investment. Investment is typically the value of inventory at the acquisition price. |
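The ROI calculation can be sketched in Python as follows. The function name and the figures are hypothetical. The `investment` argument stands for the value of inventory at the acquisition price.

```python
def roi(revenue, costs, investment):
    """Return on investment: (revenue generated - costs) / investment."""
    if investment == 0:
        raise ValueError("investment must be non-zero")
    return (revenue - costs) / investment

# Hypothetical figures.
print(roi(revenue=150_000, costs=90_000, investment=120_000))
```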
ROLAP | ROLAP (Relational Online Analytical Processing) stores data in two-dimensional relational tables and runs queries against them without the assistance of cubes, which provides greater flexibility for drilling down, drilling across, and pivoting results. Each row in a table holds data that pertains to something or to a portion of something, and each column contains data about an attribute. |
Sales channel | Channel through which a product sale was made. Examples of sales channels include: Internet, resellers, call centers, sales team, storefront, and so forth. |
Score | Scoring data means applying a data mining model to new data to generate predictions. |
Single-record case | In a nontransactional format, each case in the data is stored as one record (row) in a table. Single-record case is also known as nontransactional format. |
Snowflake schema | The snowflake schema is an extended, more normalized star model. A dimension is said to be snowflaked when the low-cardinality fields in the dimension have been moved to separate tables and linked back to the original tables with artificial keys. |
Sparse data | In ODM, data is said to be sparse if only a small fraction (no more than 20%, often 3% or less) of the attributes are non-zero or non-null for any given case. Sparse data occurs, for example, in market basket problems. |
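The following minimal Python sketch applies this definition. Each case is represented as a dict of attribute values, with `None` for null. The 20% default threshold follows the upper bound stated above. The function names and basket data are hypothetical.

```python
def non_empty_fraction(case):
    """Fraction of a case's attributes that are neither zero nor null (None)."""
    if not case:
        return 0.0
    filled = sum(1 for v in case.values() if v is not None and v != 0)
    return filled / len(case)

def is_sparse(cases, threshold=0.20):
    """True if every case has at most `threshold` of its attributes non-zero and non-null."""
    return all(non_empty_fraction(c) <= threshold for c in cases)

# Hypothetical market-basket cases: one attribute per product, value = quantity bought.
products = [f"item_{i}" for i in range(20)]
basket_a = {**{p: 0 for p in products}, "item_3": 1}
basket_b = {**{p: 0 for p in products}, "item_7": 2, "item_12": 1}
print(non_empty_fraction(basket_a), non_empty_fraction(basket_b), is_sparse([basket_a, basket_b]))
```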
Star schema | A star schema consists of a central table containing fact data and multiple tables radiating out from it, connected by the primary and foreign keys of the database. Every star schema design is composed of one table, called the fact table, and a set of smaller tables, called dimension tables. A star schema has denormalized dimensions. |
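The following Python sketch builds a tiny hypothetical star schema in an in-memory SQLite database. It has one fact table, `sales_fact`, and two dimension tables, `product_dim` and `time_dim`. The sketch then runs a star join that aggregates the facts by dimension attributes. All table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE time_dim    (time_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE sales_fact  (
        product_id INTEGER REFERENCES product_dim(product_id),
        time_id    INTEGER REFERENCES time_dim(time_id),
        amount     REAL
    );
    INSERT INTO product_dim VALUES (1, 'Books'), (2, 'Music');
    INSERT INTO time_dim    VALUES (10, 2001), (11, 2002);
    INSERT INTO sales_fact  VALUES (1, 10, 100.0), (2, 10, 50.0), (1, 11, 70.0);
""")

# Star join: the fact table joins to each dimension table through its foreign keys.
rows = conn.execute("""
    SELECT p.category, t.year, SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim p ON f.product_id = p.product_id
    JOIN time_dim    t ON f.time_id    = t.time_id
    GROUP BY p.category, t.year
    ORDER BY p.category, t.year
""").fetchall()
print(rows)
```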
Stoplight formatting | A stoplight format (or traffic light format) enables you to categorize numeric worksheet values as unacceptable, acceptable, and desirable using different colors. The default stoplight format uses the familiar red, yellow, and green color scheme to represent unacceptable, acceptable, and desirable values. (See also conditional formatting.) |
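The following minimal Python sketch shows stoplight classification. The two threshold parameters and the sample margin values are hypothetical, since the actual boundaries are chosen by the worksheet user.

```python
def stoplight(value, unacceptable_below, desirable_at_or_above):
    """Classify a numeric value as red (unacceptable), yellow (acceptable), or green (desirable)."""
    if value < unacceptable_below:
        return "red"
    if value >= desirable_at_or_above:
        return "green"
    return "yellow"

# Hypothetical product margins and thresholds.
margins = [0.05, 0.12, 0.30]
print([stoplight(m, 0.10, 0.25) for m in margins])
```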
Summary folders | Folders that contain aggregated queried data, created by the Discoverer Administrator, that have been saved for reuse. The data is stored in the database as summary tables and materialized views. |
Transference | Also known as inheritance, transference is the process by which redundant data is posted from one object to another for performance considerations. |
Transformations | Transformations are PL/SQL functions, procedures, and packages that enable you to change data. |
Value chain | Value chain refers to the full range of activities conducted by a business that increase competitive excellence and shareholder value. The activities facilitating these increases include inbound and outbound logistics (receipt, warehousing, and distribution), actual business operations, marketing and sales, and customer service. |