k-Means Algorithm
The k-Means algorithm is a distance-based clustering algorithm that
partitions the data into a predetermined number of clusters.
The ODM implementation of k-Means works with both categorical and numerical attributes.
Distance-based algorithms rely on a distance metric (function) to measure
the similarity ("closeness") between data points.
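To make the distance-based idea concrete, here is a minimal Python sketch of the classic (flat, non-hierarchical) k-Means loop. It assumes purely numerical attributes and Euclidean distance, and it is not the enhanced ODM implementation described below. The function names `euclidean` and `kmeans` and their parameters are hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=100, seed=0):
    """Classic k-Means (Lloyd iteration). Returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Use k randomly chosen data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
    return centroids, assignments
```

For example, `kmeans([(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9)], 2)` places the first two points in one cluster and the last two in the other. The number of clusters `k` must be chosen in advance, which is the "predetermined number of clusters" mentioned above.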
Algorithm Details
ODM implements an enhanced version of the k-Means algorithm with the following features:
- The algorithm builds models hierarchically: it constructs the model top down using binary splits, refines all nodes at the end, and returns the whole tree.
- The algorithm grows the tree one node at a time (unbalanced approach). Based on a user setting, either the node with the largest size or the node with the largest variance is split, and splitting continues until the desired number of clusters is reached.
- The algorithm provides probabilistic scoring and assignment of data to clusters.
- The algorithm returns, for each cluster, a centroid (cluster prototype), histograms (one for each attribute), and a rule describing the hyperbox that encloses the majority of the data assigned to the cluster. The centroid reports the mode for categorical attributes or the mean and variance for numerical attributes.
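The hierarchical, unbalanced growth described in this list can be sketched in Python as follows. This is an illustration under several assumptions, not ODM's code: attributes are numerical only, distance is squared Euclidean, each binary split is a plain 2-means loop, and the "variance" score of a node is the sum of its per-attribute variances. The sketch omits the final refinement pass and the hyperbox rules. The `soft_assign` function, which weights clusters by the exponential of the negative squared distance, is a stand-in chosen only to illustrate probabilistic assignment; it is not how ODM computes its probabilities. All function names are hypothetical.

```python
import math
import random

def mean(rows):
    """Column-wise mean of a non-empty list of numeric tuples."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def variance(rows):
    """Column-wise population variance of a non-empty list of numeric tuples."""
    mu = mean(rows)
    return [sum((x - m) ** 2 for x in col) / len(rows)
            for col, m in zip(zip(*rows), mu)]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(rows, iterations=20, seed=0):
    """Binary split with a plain 2-means loop; one side may come back empty."""
    centers = [list(r) for r in random.Random(seed).sample(rows, 2)]
    groups = (rows, [])
    for _ in range(iterations):
        groups = ([], [])
        for r in rows:
            side = 0 if sq_dist(r, centers[0]) <= sq_dist(r, centers[1]) else 1
            groups[side].append(r)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. duplicate rows
        centers = [mean(groups[0]), mean(groups[1])]
    return groups

def grow_tree(rows, k, criterion="variance"):
    """Grow an unbalanced binary tree one split at a time until it has k leaves.

    criterion="size" splits the leaf with the most rows;
    criterion="variance" splits the leaf with the largest summed variance.
    """
    root = {"rows": list(rows), "children": []}
    leaves = [root]
    blocked = set()  # ids of leaves whose split came back degenerate
    while len(leaves) < k:
        candidates = [n for n in leaves
                      if len(n["rows"]) >= 2 and id(n) not in blocked]
        if not candidates:
            break  # nothing left that can be split
        if criterion == "size":
            target = max(candidates, key=lambda n: len(n["rows"]))
        else:
            target = max(candidates, key=lambda n: sum(variance(n["rows"])))
        left, right = two_means(target["rows"])
        if not left or not right:
            blocked.add(id(target))
            continue
        target["children"] = [{"rows": left, "children": []},
                              {"rows": right, "children": []}]
        # Replace the split leaf by its two children (identity, not equality).
        leaves = [n for n in leaves if n is not target] + target["children"]
    return root, leaves

def summarize(node):
    """Centroid-style summary: size plus per-attribute mean and variance."""
    return {"size": len(node["rows"]),
            "mean": mean(node["rows"]),
            "variance": variance(node["rows"])}

def soft_assign(point, leaves):
    """Illustrative soft assignment: weights proportional to exp(-squared distance)."""
    d = [sq_dist(point, mean(n["rows"])) for n in leaves]
    d_min = min(d)  # shift for numerical stability
    w = [math.exp(-(x - d_min)) for x in d]
    total = sum(w)
    return [x / total for x in w]
```

Because `grow_tree` keeps every split node with its `children`, the returned `root` is the whole tree, while `leaves` holds the final clusters. Switching `criterion` between `"size"` and `"variance"` mirrors the user setting that chooses which node to split next.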
For more information, including how to set configuration parameters, see the
ODM documentation listed in "Where to Find More Information."
Copyright © 2005, Oracle. All rights reserved.