Oracle Text Mining

Demo Application Setup Instructions version 2.1, 18 Jan 2006

This demo was developed on Windows and tested on Linux. The Oracle installation package includes a working Perl driver on Windows, but Linux/Unix installations may require recompiling the Perl DBD-Oracle database driver to make it compatible with Oracle 10g. Refer to the appropriate README file in the distribution package for platform-specific steps.

1. Set up an Oracle Text User

Run the script 'user_setup.sql' from the User directory in SQL*Plus as the system administrator:

cd setup/User
sqlplus system/admin_password @user_setup.sql


2. Load and Index MEDLINE documents

2.1 Create database table for MEDLINE

Run the script 'create_table.sql' from the MEDLINE directory in SQL*Plus as the text mining user (text/miner):

cd ../MEDLINE
sqlplus text/miner @create_table.sql

2.2 Load data with SQL*Loader

If you are not using the provided data file, convert your own MEDLINE XML file into the format used by SQL*Loader and rename the output to medline.dat for loading. Otherwise, use the provided medline.dat file.

perl parse_XML.pl MEDLINE_file_name.xml

Load with SQL*Loader

sqlldr parfile=medline.par readsize=10000000 bindsize=10000000 direct=t

2.3 Index MEDLINE for Oracle Text Searching

sqlplus text/miner @index.sql
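Oracle Text records documents that fail to index in the CTX_USER_INDEX_ERRORS view, so it can be worth checking the index once index.sql finishes. The sketch below, assuming the index belongs to the text user, queries the Oracle Text views CTX_USER_INDEXES and CTX_USER_INDEX_ERRORS through SQL*Plus. The function name check_text_index and the SQLPLUS override variable are hypothetical conveniences, not part of the demo package.

```shell
# Hypothetical helper (not part of the demo): show Oracle Text index status
# and any per-document indexing errors for the text mining user.
# $1 = login (defaults to text/miner). Set SQLPLUS to use another client.
check_text_index() {
  ${SQLPLUS:-sqlplus} -S "${1:-text/miner}" <<'EOF'
SELECT idx_name, idx_status FROM ctx_user_indexes;
SELECT err_textkey, err_text FROM ctx_user_index_errors;
EXIT
EOF
}
```

If the second query returns no rows, no documents were rejected during indexing.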


3. Set up Oracle Text Demo Objects

cd ../Objects
sqlplus text/miner @make_objects.sql


4. Generate Thesauri

Thesauri are optional, but they are required for term co-occurrence results and useful for search term expansion. The Gene Ontology (GO) thesaurus can take as long as an hour to load, and the NCI Thesaurus loads in 20 to 30 minutes; the transcription factor (TF) thesaurus loads quickly. All thesauri are mirrored in the stretchviewer Java applet, which is accessed from the demo start page and lets you navigate the hierarchy tree and select search terms.

cd ../Thesauri
cd TFT
sqlplus text/miner @setup.sql
cd ../VG
sqlplus text/miner @VG_thes.sql
cd ../FSector
sqlplus text/miner @make_thes.sql
cd ../MeSH
sqlplus text/miner @make_thes.sql
cd ../NCIT
sqlplus text/miner @setup.sql
cd ../GO
sqlplus text/miner @build_thes.sql
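The thesaurus commands above can also be run as a single loop. This is a minimal sketch, assuming it is started from the Thesauri directory and that each SQL script ends with EXIT (if one does not, SQL*Plus waits at its prompt until you type exit). The function name load_thesauri and the SQLPLUS override variable are hypothetical.

```shell
# Hypothetical helper (not part of the demo): load each thesaurus in the same
# order as the commands above. Run from the Thesauri directory.
# Each entry is "directory:script". $1 = login (defaults to text/miner).
load_thesauri() {
  login=${1:-text/miner}
  for entry in TFT:setup.sql VG:VG_thes.sql FSector:make_thes.sql \
               MeSH:make_thes.sql NCIT:setup.sql GO:build_thes.sql; do
    dir=${entry%%:*}
    script=${entry#*:}
    echo "Loading thesaurus in $dir ($script)"
    # Subshell, so the working directory is restored after each script;
    # stop at the first directory that is missing or script that fails.
    (cd "$dir" && ${SQLPLUS:-sqlplus} "$login" "@$script") || return 1
  done
}
```

Because the GO thesaurus is the slowest to load, the loop keeps it last, as in the original sequence.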


5. Install Apache HTTP Server

Download the appropriate installation package from Apache.org.

Install and start Apache server.

6. Copy demo files to the correct Apache location

The demo assumes that Apache is installed in "C:\Program Files\Apache Group\Apache2"; this path is used to create the stretchviewer data files. If Apache is installed in a different location, change the $TM::APACHE_ROOT value in the TM.pm module of the demo package (located in the Apache\cgi-bin\TextMining directory). On Windows, remember to double each backslash (\\) in the path.

# Apache server root directory
$TM::APACHE_ROOT = "C:\\Program Files\\Apache Group\\Apache2";

Next, copy demo files from the package directories (Apache\htdocs and Apache\cgi-bin) to the equivalent Apache server directories.
From the demo's Apache htdocs folder, copy the following directories into the Apache Server htdocs folder:
  • TextMining
  • css
  • applet
You must copy the stretchviewer applet jar file from $ORACLE_HOME/ctx/sample/stretch/stretchviewer.jar to the Apache htdocs/applet directory.

From the demo's Apache cgi-bin folder, copy the following directories into the Apache Server cgi-bin folder:
  • TextMining
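The copy steps above, including the stretchviewer jar, can be scripted from a Unix-like shell. A sketch, assuming the demo package keeps the Apache/htdocs and Apache/cgi-bin layout described above and that ORACLE_HOME is set; install_demo_files is a hypothetical name, and the Apache root you pass should match $TM::APACHE_ROOT in TM.pm.

```shell
# Hypothetical helper (not part of the demo): copy the demo's web files into
# an Apache installation.
# $1 = demo package directory (containing Apache/htdocs and Apache/cgi-bin)
# $2 = Apache server root (should match $TM::APACHE_ROOT in TM.pm)
# Reads $ORACLE_HOME to locate the stretchviewer applet jar.
install_demo_files() {
  pkg=$1
  apache=$2
  # htdocs directories: TextMining, css, applet
  for d in TextMining css applet; do
    cp -R "$pkg/Apache/htdocs/$d" "$apache/htdocs/" || return 1
  done
  # The applet jar ships with Oracle Text, not with the demo package.
  cp "$ORACLE_HOME/ctx/sample/stretch/stretchviewer.jar" \
     "$apache/htdocs/applet/" || return 1
  # cgi-bin directory: TextMining
  cp -R "$pkg/Apache/cgi-bin/TextMining" "$apache/cgi-bin/" || return 1
}
```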
The demo uses the Oracle Perl distribution, and all CGI scripts (in the cgi-bin\TextMining directory) point to the "C:\oracle\product\10.2.0\db_1\perl\5.8.3\bin\MSWin32-x86-multi-thread\perl" executable in their header line. If you use a different Perl executable, change this header in every script.
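Editing the header by hand in every script is error-prone. The sketch below rewrites the first line of each script, assuming the CGI scripts are the *.pl files in cgi-bin/TextMining, that the header is a #! line, and that any flags on the original #! line can be dropped. set_perl_shebang is a hypothetical name.

```shell
# Hypothetical helper (not part of the demo): point the #! line of every
# *.pl script in a directory at a different perl executable.
# $1 = new perl path, $2 = CGI directory (for example .../cgi-bin/TextMining)
set_perl_shebang() {
  # Escape \ and & (special in a sed replacement) and the | delimiter,
  # so Windows paths containing backslashes are written unchanged.
  esc=$(printf '%s' "$1" | sed 's/[\\&|]/\\&/g')
  for f in "$2"/*.pl; do
    [ -f "$f" ] || continue
    # Replace line 1 only if it starts with #!; the rest of the file is kept.
    # Single quotes around #! avoid interactive bash history expansion.
    sed '1s|^#!.*|#!'"$esc"'|' "$f" > "$f.tmp" || return 1
    # Write back through the original file so its permissions are preserved.
    cat "$f.tmp" > "$f" && rm -f "$f.tmp"
  done
}
```

For example, on Linux: set_perl_shebang /usr/bin/perl "/path/to/cgi-bin/TextMining".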

7. Set database connection parameters

The database connection is set in the OTU.pm module (located in the Apache\cgi-bin\TextMining directory), which defines the user, password, database name (SID), port and server. Change the default database name (ora10g) to the name of your database, and set $ORACLE_ROOT (default "C:\oracle\product\10.2.0\db_1") to the location of your Oracle installation.
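Before opening the demo in a browser, it can help to confirm the same settings from SQL*Plus. A minimal sketch, assuming the Easy Connect syntax available from Oracle 10g (//host:port/service) and that your database's service name matches the name configured in OTU.pm; 1521 is the default Oracle listener port. check_db_login and the SQLPLUS override variable are hypothetical.

```shell
# Hypothetical helper (not part of the demo): try the credentials the demo
# will use. Arguments: user password [host] [port] [service];
# the defaults below are assumptions taken from the demo's default settings.
check_db_login() {
  user=$1
  pass=$2
  host=${3:-localhost}
  port=${4:-1521}
  svc=${5:-ora10g}
  # -L: attempt the logon only once instead of prompting again on failure.
  # Piping "exit" makes SQL*Plus quit right after a successful logon.
  echo exit | ${SQLPLUS:-sqlplus} -L "$user/$pass@//$host:$port/$svc"
}
```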

8. Set up Cytoscape co-occurrence graph visualization

A. Installation of Cytoscape and text mining demo components
  1. Install Cytoscape from http://cytoscape.org
  2. Install Oracle plugin from http://www.oracle.com/technology/industries/life_sciences/ls_sample_code.html
    Note: The plugin is needed only to store the graph in the Oracle database as a Network Data Model. Co-occurrence results from the demo are saved as Cytoscape format files that can be loaded directly into the viewer.
  3. Replace the vizmap.props file in the Cytoscape root directory with the text mining demo version, OR append the contents of vizmap.props.TM to your existing vizmap.props file to retain any existing personalizations.
Consult README.txt to learn how to use Cytoscape to visualize the text mining demo co-occurrence graph.
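For the append option in item 3, a sketch that keeps a backup of your current vizmap.props first, assuming you pass the Cytoscape root directory and the location of the demo's vizmap.props.TM file; merge_vizmap is a hypothetical name.

```shell
# Hypothetical helper (not part of the demo): append the demo's settings to an
# existing vizmap.props, keeping a backup so personalizations can be restored.
# $1 = Cytoscape root directory, $2 = path to the demo's vizmap.props.TM
merge_vizmap() {
  props="$1/vizmap.props"
  cp "$props" "$props.bak" || return 1
  cat "$2" >> "$props"
}
```

Run it only once: a second run would append the settings again and overwrite the backup with the already-merged file.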

9. Run demo

Open a browser and go to http://localhost/cgi-bin/TextMining/TM.pl. This page lets you select a data source defined in the SRC.pm file. Each data source needs a TM.[source].pm and an OTU.[source].pm file defining the table and text index information, as well as any existing sections used for searching, data mining and document display. To use the currently selected data source, start the demo directly at http://localhost/cgi-bin/TextMining/search.pl

Begin text mining! An outline of what you can do and a User Guide are also available.

An e-seminar demonstrating the features of this demo can be found on the Life Science OTN web site. Note: Requires MS Internet Explorer.

A demo workflow is available in PDF format.

Questions? Email Raf Podowski.