Non-negative Matrix Factorization (NMF) is an ODM feature extraction method. Documents can be clustered according to how well each one is represented by a particular feature.
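A minimal sketch of the factorization step, assuming a small dense document vs. term matrix held as Python lists. It uses the Lee-Seung multiplicative updates for the Frobenius norm, which is one common NMF solver and not necessarily the one ODM uses internally. The names `nmf`, `frobenius_error`, and `dominant_feature` are hypothetical. The last helper assigns each document to the feature with the largest weight in its row of W, which is one simple reading of "how well it is represented by a particular feature".

```python
import random

def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (both lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (documents x terms) into W (documents x k) and
    H (k x terms) with Lee-Seung multiplicative updates (Frobenius norm)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H); eps avoids division by zero
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def dominant_feature(w):
    """Index of the highest-weighted feature for each document (row of W)."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]
```

Passing either the theme-weight matrix or the binary MeSH matrix as `v` yields W (document vs. feature weights) and H (feature vs. term weights); because the updates only multiply by non-negative ratios, both factors stay non-negative.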
In this demo, either document themes or MeSH terms are used as document characteristics. For themes, the document vs. theme data matrix consists of theme weights; for MeSH terms, the document vs. MeSH term matrix is binary. The NMF algorithm is applied to this matrix, and the user specifies the number of features to generate. Each document is then scored against each feature, producing a feature vs. document matrix. The documents are then clustered with the ODM k-Means clustering algorithm, which generates a specified number of hierarchical clusters. If the number of requested clusters equals the number of documents, a full hierarchy tree with one document per cluster can be generated. Results are displayed as a heatmap, with documents ordered by cluster and cluster score. The heatmap palette uses blue for the minimum and yellow for the maximum.
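The clustering and display steps can be sketched as follows, assuming each document's vector of feature scores (its column of the feature vs. document matrix) is the clustering input. This is plain flat Lloyd's k-means with a deterministic farthest-point start, not ODM's hierarchical k-Means. The cluster score used for ordering, 1 / (1 + distance to the centroid), and the linear blue-to-yellow color ramp are illustrative assumptions, and all function names are hypothetical.

```python
import math

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50):
    """Flat Lloyd's k-means with a deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def heatmap_order(points, labels, centroids):
    """Document indices grouped by cluster, each group sorted by descending score.
    Score is 1 / (1 + distance to the cluster centroid), an illustrative stand-in."""
    def score(i):
        return 1.0 / (1.0 + math.sqrt(dist2(points[i], centroids[labels[i]])))
    return sorted(range(len(points)), key=lambda i: (labels[i], -score(i)))

def heat_color(value, lo, hi):
    """Map value in [lo, hi] linearly from blue (minimum) to yellow (maximum) as RGB."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)
    blue, yellow = (0, 0, 255), (255, 255, 0)
    return tuple(round(b + t * (y - b)) for b, y in zip(blue, yellow))
```

Sorting on the key `(cluster, -score)` keeps each cluster's rows contiguous in the heatmap, with the documents closest to their centroid listed first.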
To clarify the contextual meaning of each cluster, its themes and gist can be displayed by clicking the cluster number button. The top ten themes or MeSH terms for each cluster are also presented, showing the number of documents containing each term and its average weight. The importance of each term for the NMF features is shown as a term vs. feature heatmap, sorted by the relative weight of each term.
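The per-cluster term summary can be sketched as below, assuming each document is given as a dict from term to weight (a theme weight, or 1.0 for an assigned MeSH term) and that a term's average weight is taken over the cluster's documents that contain it. Ranking by document count and then by average weight is an assumption, since the ordering used for the top ten is not stated; `top_terms` is a hypothetical name.

```python
from collections import defaultdict

def top_terms(doc_terms, labels, cluster, n=10):
    """Summarize the n leading terms of one cluster.

    doc_terms: one dict per document mapping term -> weight.
    Returns (term, document count, average weight) tuples ranked by document
    count, then average weight, then term name (an assumed ordering)."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for terms, label in zip(doc_terms, labels):
        if label != cluster:
            continue
        for term, weight in terms.items():
            if weight > 0:  # zero entries in a binary matrix mean "absent"
                counts[term] += 1
                totals[term] += weight
    summary = [(term, counts[term], totals[term] / counts[term]) for term in counts]
    summary.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return summary[:n]
```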