Oracle Data Mining provides a sample program for the BLAST algorithm and three sample sequence data tables. The program requires Oracle Database 10g Release 2 Enterprise Edition. The program and data files are installed from the Oracle Database Companion CD, which copies them to the ~rdbms/demo subdirectory of the Oracle home directory.
The three sample sequence data tables are:
- SWISSPROT table, which contains the sequences in Release 40 of the SwissProt database. This table has the sequence identifier, creation_date, organism, and sequence_data attributes. It has 101,602 protein sequences.
- PROT_DB table, which consists of 19 protein sequences from Release 40 of the SwissProt data set.
- ECOLI10 table, which contains 10 nucleotide sequences from the Escherichia coli data set.

Several steps are involved in creating the BLAST datasets. The following scripts, data files, and control files are required:
- The dmblprot.sql script creates the PROT_DB dataset.
- The dmblcoli.sql script creates the ECOLI10 dataset.
- The dmblprot.ctl control file is used with SQL*Loader to create the SWISSPROT dataset.
- The dmblprot.txt text file contains the data for the SWISSPROT dataset.
The scripts, data files, and control files are located in the ~rdbms/demo subdirectory of the Oracle home directory.
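Before starting, it can help to confirm that all four files are actually present. The following is a minimal sketch, not part of the Oracle sample: it assumes that "~rdbms/demo" means $ORACLE_HOME/rdbms/demo, and the function name missing_blast_files is hypothetical.

```python
import os
from pathlib import Path

# File names come from the list above. The directory layout
# (<ORACLE_HOME>/rdbms/demo) is an assumption about what "~rdbms/demo" denotes.
BLAST_DEMO_FILES = ("dmblprot.sql", "dmblcoli.sql", "dmblprot.ctl", "dmblprot.txt")


def missing_blast_files(oracle_home=None):
    """Return the BLAST sample files that are absent from <oracle_home>/rdbms/demo.

    If oracle_home is None, the ORACLE_HOME environment variable is used.
    An empty list means every required file was found.
    """
    home = oracle_home if oracle_home is not None else os.environ.get("ORACLE_HOME")
    if not home:
        raise RuntimeError("ORACLE_HOME is not set and no oracle_home was given")
    demo_dir = Path(home) / "rdbms" / "demo"
    # Keep the order of BLAST_DEMO_FILES so the report is predictable.
    return [name for name in BLAST_DEMO_FILES if not (demo_dir / name).is_file()]
```

For example, calling missing_blast_files() with ORACLE_HOME set returns an empty list when the Companion CD installation copied every file, or the names of the files that still need to be located.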
Prepare to run the BLAST sample program:
1. Start SQL*Plus and run dmblprot.sql to create the PROT_DB dataset.
SQL> @dmblprot.sql
2. Run dmblcoli.sql to create the ECOLI10 dataset.
SQL> @dmblcoli.sql
3. Exit SQL*Plus and run SQL*Loader to create the SWISSPROT dataset, specifying the dmblprot.ctl control file and the dmblprot.txt data file.
> sqlldr dmuser/dmuser_password control=dmblprot.ctl data=dmblprot.txt log=dmblprot.log
Refer to the Oracle Data Mining Administrator's Guide for more details.
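After the three steps, one way to check the result is to compare each table's row count with the figures given above (101,602 rows in SWISSPROT, 19 in PROT_DB, 10 in ECOLI10). The sketch below is illustrative only: it assumes the tables are named exactly SWISSPROT, PROT_DB, and ECOLI10, and that you have a Python DB-API 2.0 cursor connected as the data mining user. The choice of driver is an assumption; check that it supports your database release. The function name check_blast_tables is hypothetical.

```python
# Expected row counts, taken from the table descriptions above.
EXPECTED_ROWS = {"SWISSPROT": 101_602, "PROT_DB": 19, "ECOLI10": 10}


def check_blast_tables(cursor, expected=EXPECTED_ROWS):
    """Compare actual row counts with the expected ones.

    `cursor` is any DB-API 2.0 cursor. Returns a dict mapping each table
    whose count differs to an (actual, expected) tuple; an empty dict means
    every table matched. A missing table raises the driver's own error.
    """
    mismatches = {}
    for table, want in expected.items():
        # Table names come from the fixed mapping above, never from user
        # input, so formatting them into the statement is safe here.
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        (got,) = cursor.fetchone()
        if got != want:
            mismatches[table] = (got, want)
    return mismatches
```

An empty result suggests that all three datasets were created completely; a SWISSPROT mismatch usually points back to the SQL*Loader step, whose log file (dmblprot.log in the command above) records the rows that were rejected.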