Oracle Text Mining
Execute the script 'user_setup.sql' from directory User in sqlplus as system administrator.
cd setup/User
sqlplus system/admin_password @user_setup.sql
Execute the script 'create_table.sql' from directory MEDLINE in sqlplus as user MEDLINE.
cd ../MEDLINE
sqlplus text/miner @create_table.sql
If you are not using the provided data file, convert your own MEDLINE XML file to the format used by SQL*Loader and rename the result to medline.dat for loading. Otherwise, use the provided medline.dat file.
perl parse_XML.pl MEDLINE_file_name.xml
Load with SQL*Loader
sqlldr parfile=medline.par readsize=10000000 bindsize=10000000 direct=t
sqlplus text/miner @index.sql
cd ../Objects
sqlplus text/miner @make_objects.sql
Thesauri are optional, but they are needed for term co-occurrence results and are useful for search term expansion. The Gene Ontology (GO) thesaurus can take as long as an hour to load. The NCI Thesaurus loads in 20 to 30 minutes. The transcription factor (TF) thesaurus loads quickly. All thesauri are mirrored in the stretchviewer Java applet, accessed from the demo start page, to allow navigation of the hierarchy tree and search term selection.
cd ../Thesauri
cd TFT
sqlplus text/miner @setup.sql
cd ../VG
sqlplus text/miner @VG_thes.sql
cd ../FSector
sqlplus text/miner @make_thes.sql
cd ../MeSH
sqlplus text/miner @make_thes.sql
cd ../NCIT
sqlplus text/miner @setup.sql
cd ../GO
sqlplus text/miner @build_thes.sql
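Because the thesauri are optional and some take a long time to load, it can be convenient to write the load sequence down as data and skip the ones you do not need. The following Python sketch only builds and prints the commands; it does not run sqlplus. The directory and script names are taken from the steps above, while the function name `thesaurus_commands` and its `login` and `skip` parameters are illustrative assumptions.

```python
# Directory and script pairs taken from the thesaurus steps above,
# in the order they are listed.
THESAURI = [
    ("TFT", "setup.sql"),
    ("VG", "VG_thes.sql"),
    ("FSector", "make_thes.sql"),
    ("MeSH", "make_thes.sql"),
    ("NCIT", "setup.sql"),
    ("GO", "build_thes.sql"),
]


def thesaurus_commands(login="text/miner", skip=()):
    """Return (directory, command) pairs for each thesaurus not in `skip`.

    `login` and `skip` are hypothetical parameters for this sketch.
    Nothing is executed; the caller decides how to run the commands.
    """
    return [
        (directory, f"sqlplus {login} @{script}")
        for directory, script in THESAURI
        if directory not in skip
    ]


# Dry run: print what would be executed, relative to the Thesauri directory.
for directory, command in thesaurus_commands(skip=("GO",)):
    print(f"(in {directory}) {command}")
```

Here the slow GO thesaurus is skipped as an example; pass an empty `skip` to list all six.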
Get the appropriate installation package from Apache.org.
Install and start Apache server.
The demo assumes that Apache was installed in the directory "C:\Program Files\Apache Group\Apache2". This location is needed for creation of the stretchviewer data files. If Apache is installed in a different location, change the $TM::APACHE_ROOT value in the TM.pm module in the demo package (located in the Apache\cgi-bin\TextMining directory). Remember to double each backslash (\\) in the path on Windows.
Next, copy demo files from the package directories (Apache\htdocs and Apache\cgi-bin) to the equivalent Apache server directories.
# Apache server root directory
$TM::APACHE_ROOT = "C:\\Program Files\\Apache Group\\Apache2";
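The doubled backslashes in the $TM::APACHE_ROOT setting are easy to get wrong by hand. As a minimal sketch (Python is used here only for illustration; the helper name `perl_root_setting` and the example path are assumptions, not part of the demo), the line for TM.pm can be generated from an ordinary Windows path:

```python
def perl_root_setting(windows_path, variable="$TM::APACHE_ROOT"):
    """Build a Perl assignment with each backslash doubled.

    Perl treats a backslash inside a double-quoted string as an escape
    character, so every literal backslash has to be written twice.
    This sketch assumes the path contains no $, @ or double quote
    characters, which would need escaping as well.
    """
    escaped = windows_path.replace("\\", "\\\\")
    return f'{variable} = "{escaped}";'


# Hypothetical install location; the raw string keeps the path as typed.
print(perl_root_setting(r"D:\Servers\Apache2"))
```

Paste the printed line over the existing $TM::APACHE_ROOT assignment in TM.pm.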
From the demo's Apache htdocs folder, copy the following directories into the Apache Server htdocs folder:
- TextMining
- css
- applet
You must also copy the stretchviewer applet jar file from $ORACLE_HOME/ctx/sample/stretch/stretchviewer.jar to the Apache htdocs/applet directory.
The demo uses the Oracle perl distribution, and all CGI scripts (in the cgi-bin\TextMining directory) point to the "C:\oracle\product\10.2.0\db_1\perl\5.8.3\bin\MSWin32-x86-multi-thread\perl" executable. The header line of every script needs to be changed if a different perl executable is used.
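If a different perl executable is used, the header (first line) of every CGI script has to be updated. A minimal Python sketch of that edit follows. It assumes each script starts with a `#!` header line and uses the `.pl` extension (as TM.pl and search.pl do); the function names are hypothetical and not part of the demo.

```python
from pathlib import Path


def replace_header(script_text, perl_path):
    """Return script_text with its first line replaced by '#!<perl_path>'.

    Text that does not start with '#!' is returned unchanged. Any
    switches on the original header line (such as -w) are dropped.
    """
    if not script_text.startswith("#!"):
        return script_text
    first_line, newline, rest = script_text.partition("\n")
    return f"#!{perl_path}{newline}{rest}"


def update_scripts(cgi_dir, perl_path):
    """Rewrite the header of every .pl file in cgi_dir; return changed names."""
    changed = []
    for script in sorted(Path(cgi_dir).glob("*.pl")):
        original = script.read_text()
        updated = replace_header(original, perl_path)
        if updated != original:
            script.write_text(updated)
            changed.append(script.name)
    return changed
```

Calling `update_scripts` with the Apache cgi-bin\TextMining directory and the path of your perl executable rewrites the scripts in place, so back up the directory first.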
From the demo's Apache cgi-bin folder, copy the following directories into the Apache Server cgi-bin folder:
- TextMining
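The copy steps can also be scripted. The sketch below is one way to do it in Python (3.8 or later, for `dirs_exist_ok`). The directory names are the ones listed above; the function name `copy_demo_dirs` and its arguments are assumptions for illustration.

```python
import shutil
from pathlib import Path

# Directories named in the copy steps above, grouped by Apache folder.
DEMO_DIRS = {
    "htdocs": ["TextMining", "css", "applet"],
    "cgi-bin": ["TextMining"],
}


def copy_demo_dirs(package_apache, server_root):
    """Copy the demo directories into the matching Apache server folders.

    package_apache: the Apache directory inside the demo package.
    server_root:    the Apache installation directory.
    Existing files with the same name are overwritten.
    """
    copied = []
    for folder, names in DEMO_DIRS.items():
        for name in names:
            source = Path(package_apache) / folder / name
            target = Path(server_root) / folder / name
            shutil.copytree(source, target, dirs_exist_ok=True)
            copied.append(target)
    return copied
```

The function returns the list of target directories so the result can be checked before starting Apache.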
The database connection is set in the OTU.pm module (located in the Apache\cgi-bin\TextMining directory). The user, password, database name (SID), port, and server are set here. Make sure to change the default database name (ora10g) to the name of your database. Also set the $ORACLE_ROOT value ("C:\oracle\product\10.2.0\db_1") to the correct location.
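Before starting the demo, it can help to confirm that the server and port entered in OTU.pm are reachable. The Python sketch below only tests whether something accepts TCP connections at that address; it does not log in or verify the SID. The function name `listener_reachable` is an assumption made for this example.

```python
import socket


def listener_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False
```

Call it with the server and port values configured in OTU.pm, for example `listener_reachable("localhost", 1521)`. Port 1521 is the conventional Oracle listener port and may differ on your system.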
A. Installation of Cytoscape and text mining demo components
Consult the README.txt to learn how to use Cytoscape to visualize the text mining demo co-occurrence graph.
- Install Cytoscape from http://cytoscape.org
- Install Oracle plugin from http://www.oracle.com/technology/industries/life_sciences/ls_sample_code.html
  Note: The plugin is only needed to store the graph in the Oracle database as a Network Data Model. Co-occurrence results from the demo are saved in Cytoscape format files for direct loading into the viewer.
- Replace the vizmap.props file in the Cytoscape root directory with the text mining demo version, OR append the contents of vizmap.props.TM to your existing vizmap.props file to retain any existing personalizations.
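Appending rather than replacing keeps your existing vizmap.props settings. As a rough Python sketch of the append option (the file names are the ones given above; the function name `append_vizmap` and the `.bak` backup suffix are assumptions):

```python
import shutil
from pathlib import Path


def append_vizmap(cytoscape_root, demo_props):
    """Append the demo's vizmap settings to an existing vizmap.props.

    A copy of the original file is saved as vizmap.props.bak first.
    """
    target = Path(cytoscape_root) / "vizmap.props"
    shutil.copy2(target, target.with_name(target.name + ".bak"))
    existing = target.read_text()
    addition = Path(demo_props).read_text()
    # Make sure the appended settings start on a new line.
    separator = "" if existing.endswith("\n") or not existing else "\n"
    target.write_text(existing + separator + addition)
    return target
```

Run it while Cytoscape is closed, passing the Cytoscape root directory and the path to vizmap.props.TM from the demo package.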
Open a browser and enter http://localhost/cgi-bin/TextMining/TM.pl in the address bar. This allows you to select one of the data sources defined in the SRC.pm file. Each data source needs a TM.[source].pm and an OTU.[source].pm file defining the table and text index information, as well as any existing sections for use in searching, data mining, and document display. To use the currently pre-selected data source, start the demo from this location: http://localhost/cgi-bin/TextMining/search.pl
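Since each data source needs both a TM.[source].pm and an OTU.[source].pm file, a quick check for missing files can save a confusing error later. The Python sketch below assumes both files sit together in one directory (for example the cgi-bin\TextMining directory) and takes the source names as an argument, because the format of SRC.pm is not described here. The function name is hypothetical.

```python
from pathlib import Path


def missing_source_files(module_dir, sources):
    """Map each source name to the required module files that are absent."""
    missing = {}
    for source in sources:
        required = [f"TM.{source}.pm", f"OTU.{source}.pm"]
        absent = [name for name in required
                  if not (Path(module_dir) / name).is_file()]
        if absent:
            missing[source] = absent
    return missing
```

An empty result means every listed source has both of its module files.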
Begin text mining! An outline of what you can do and a User Guide are available.
An e-seminar demonstrating the features of this demo can be found on the Life Science OTN web site. Note: Requires MS Internet Explorer.
A demo workflow is available in PDF format.