Text Mining MEDLINE

Database Content

MEDLINE is the National Library of Medicine's premier bibliographic database covering the fields of medicine, nursing, dentistry, veterinary medicine, the health care system, and the preclinical sciences. MEDLINE contains bibliographic citations and author abstracts from more than 4,800 biomedical journals published in the United States and 70 other countries. The database contains over 12 million citations dating back to the mid-1960s. Coverage is worldwide, but most records are from English-language sources or have English abstracts.

The BioOracle MEDLINE Text Mining demo is based on a subset of MEDLINE documents. The subset includes 4,318 documents that contain the term 'AR' together with either 'cancer' or 'neoplasia'. 'AR' is a common symbol for the androgen receptor gene (LocusLink ID 367), which plays a large role in cancer research. The same symbol is also used for a number of other genes and as a non-gene acronym, so the subset is not limited to androgen receptor literature.
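
The selection rule described above, 'AR' combined with 'cancer' or 'neoplasia', can be sketched in a few lines of Python. This is only an illustration of the boolean filter, not the query the demo actually ran: the function names, the case-insensitive whole-word matching, and the four sample abstracts are all assumptions made for the example.

```python
import re

def tokens(text):
    """Lowercase word tokens; a crude stand-in for a real text index."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches_query(text):
    """True if text contains 'AR' and at least one of 'cancer' / 'neoplasia'.

    Matching is case-insensitive and on whole words (an assumption).
    """
    words = tokens(text)
    return "ar" in words and bool(words & {"cancer", "neoplasia"})

# Hypothetical abstracts, invented for illustration only.
docs = {
    1: "AR expression in prostate cancer cell lines.",
    2: "Aldose reductase (AR) inhibitors in diabetic neuropathy.",
    3: "Markers of neoplasia in colorectal tissue.",
    4: "Acoustic reflex (AR) thresholds in cancer patients.",
}

subset = [doc_id for doc_id, text in docs.items() if matches_query(text)]
print(subset)  # [1, 4]
```

Document 4 passes the filter even though its 'AR' has nothing to do with the androgen receptor, which is the kind of ambiguity the demo subset contains.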

MeSH (Medical Subject Headings) is NLM's controlled vocabulary used for indexing articles for MEDLINE/PubMed. MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts. In this demo, the MeSH terms associated with individual MEDLINE documents are used as input to non-negative matrix factorization (NMF) feature extraction and to clustering.
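
To make the role of MeSH terms in NMF concrete, here is a minimal pure-Python sketch. It assumes each document is represented as a binary vector over MeSH terms and factors the resulting matrix with the standard multiplicative-update rules for the squared-error objective. The demo's actual feature weighting and NMF implementation are not described here, so the matrix encoding, the function names, and the document-to-term assignments below are all illustrative assumptions.

```python
import random

def matmul(A, B):
    """Plain nested-list matrix product."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared reconstruction error ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into non-negative W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Hypothetical document -> MeSH term assignments, invented for illustration.
doc_mesh = {
    "doc1": ["Receptors, Androgen", "Prostatic Neoplasms"],
    "doc2": ["Receptors, Androgen", "Prostatic Neoplasms", "Androgens"],
    "doc3": ["Aldehyde Reductase", "Diabetes Mellitus"],
    "doc4": ["Aldehyde Reductase", "Diabetes Mellitus", "Neoplasms"],
}
terms = sorted({t for ts in doc_mesh.values() for t in ts})
# Binary docs x terms matrix: 1.0 where the document carries the MeSH term.
V = [[1.0 if t in ts else 0.0 for t in terms] for ts in doc_mesh.values()]

W, H = nmf(V, k=2)
# Assign each document to the feature with the largest weight in its row of W.
clusters = [max(range(2), key=lambda j: row[j]) for row in W]
print(dict(zip(doc_mesh, clusters)))
```

Each row of H can be read as a feature, that is, a weighted group of MeSH terms, and each row of W gives a document's weight on those features. With data this cleanly separated one would expect doc1 and doc2 to share a feature and doc3 and doc4 to share the other, though the outcome depends on the random initialization.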