In the life sciences, vast quantities of data, including nucleotide and amino acid sequences, are stored, typically in a database. These sequence data help biologists determine the chemical structure, biological function, and evolutionary history of organisms. Managing the exponential growth of sequence data sources requires fast, sensitive, and statistically rigorous techniques for detecting similarities between sequences. As the amount of nucleotide and amino acid sequence data continues to grow, the data become increasingly useful in the analysis of newly sequenced genes and proteins, because the chance of finding such sequence similarities is greater.
Sequence alignment is one of the most common bioinformatics tasks. It appears in almost every research and development activity across the life sciences, including academia, biotech, services, software, pharmaceutical companies, and hospitals. BLAST (Basic Local Alignment Search Tool) is the most commonly used sequence alignment algorithm. BLAST, developed by the National Center for Biotechnology Information (NCBI), is a heuristic method for finding high-scoring, locally optimal alignments between a query sequence and a database of sequences.
BLAST is implemented in Oracle Database as a set of table functions:
BLASTN_COMPRESS
- Compress nucleotide sequence data to improve performance of sequence searches.
BLASTN_MATCH
- Perform a search of the given nucleotide sequence against the selected portion of the nucleotide database.
BLASTP_MATCH
- Perform a search of the given amino acid sequence against the selected portion of the protein database.
TBLAST_MATCH
- Perform a search involving translations of either the query sequence or the database of sequences.
BLASTN_ALIGN
- Perform an alignment of the given nucleotide sequence against the selected portion of the nucleotide database.
BLASTP_ALIGN
- Perform an alignment of the given amino acid sequence against the selected portion of the protein database.
TBLAST_ALIGN
- Perform alignments involving translations of either the query sequence or the database of sequences.
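
The translated searches and alignments compare sequences at the protein level, so a nucleotide sequence must first be translated into amino acids in one or more reading frames. The sketch below illustrates that step only, using the standard genetic code. It is not the Oracle implementation, and the helper names (`translate`, `reverse_complement`, `six_frame_translations`) are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq, frame=0):
    """Translate a DNA sequence starting at offset `frame` (0, 1, or 2).

    Stop codons become '*'; codons with unknown bases become 'X'.
    A trailing partial codon is ignored.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frame_translations(seq):
    """Translate three frames on each strand (forward first, then reverse)."""
    reverse = reverse_complement(seq)
    return ([translate(seq, f) for f in range(3)]
            + [translate(reverse, f) for f in range(3)])
```

Each translated frame could then be compared against protein sequences with an amino acid scoring scheme, which is the general idea behind searching with translations of the query or of the database.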