CrocoBLAST:Terminology
There are a few basic terms you need to keep in mind when running BLAST within CrocoBLAST.
Input file and Database
In essence, BLAST takes an unknown nucleotide or protein sequence, tries to align it against a set of reference sequences, and reports the score of each alignment to help you identify the unknown sequence. In practice, this means taking an input file with many query sequences and aligning each query sequence against a database of known sequences. Such databases are typically available from public repositories such as NCBI, or may be produced in-house.
Therefore, in order to run BLAST, you will need to specify an input file containing the query sequences, and a database file containing the reference sequences. CrocoBLAST accepts input files in FASTA and FASTQ format. BLAST uses its own database format for the database file. You may supply the database file either already in BLAST database format, or in FASTA or FASTQ format, in which case it will be converted to database format before BLAST is run. You may also download databases from the NCBI server directly within CrocoBLAST.
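To tell the two accepted text formats apart, it helps to remember that a FASTA record starts with a ">" header line, while a FASTQ record starts with an "@" header line. The following is a minimal Python sketch of such a check. It is only an illustration and not part of CrocoBLAST; the function name detect_sequence_format is hypothetical, and the check looks only at the first non-blank line rather than validating the whole file.

```python
import io


def detect_sequence_format(handle):
    """Guess 'fasta' or 'fastq' from the first non-blank line of a text handle.

    Hypothetical helper for illustration only; it does not validate the file.
    """
    for line in handle:
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"  # FASTA headers begin with '>'
        if stripped.startswith("@"):
            return "fastq"  # FASTQ headers begin with '@'
        break  # first non-blank line is not a recognised header
    return "unknown"


# Small in-memory examples standing in for real input files.
fasta_example = io.StringIO(">query1\nACGTACGT\n")
fastq_example = io.StringIO("@read1\nACGT\n+\nIIII\n")

print(detect_sequence_format(fasta_example))
print(detect_sequence_format(fastq_example))
```

With a real file, the same function could be called on a handle opened in text mode, for example with open(path) as handle.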
BLAST program
Depending on the nature of the query and reference sequences, there are several BLAST programs you may use within CrocoBLAST:
- blastp - compares an amino acid query sequence against a protein sequence database
- blastn - compares a nucleotide query sequence against a nucleotide sequence database
- blastx - compares a nucleotide query sequence translated in all reading frames against a protein sequence database
- tblastn - compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
- tblastx - compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
In addition to the input file and the database, you will therefore need to indicate which BLAST program you intend to use.
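The choice among the five programs listed above follows from the type of the query and the type of the database. The following Python sketch encodes that mapping as a lookup. It is an illustration only and does not reflect how CrocoBLAST itself selects a program; the function name choose_blast_program and the translate_both flag are hypothetical. The flag is needed because a nucleotide query against a nucleotide database can be run either as blastn or, with six-frame translation of both sides, as tblastx.

```python
def choose_blast_program(query_type, database_type, translate_both=False):
    """Return the BLAST program name for a query/database type pair.

    query_type and database_type are 'nucleotide' or 'protein'.
    translate_both selects tblastx instead of blastn for
    nucleotide-vs-nucleotide searches. Hypothetical helper for illustration.
    """
    programs = {
        ("protein", "protein"): "blastp",
        ("nucleotide", "nucleotide"): "blastn",
        ("nucleotide", "protein"): "blastx",
        ("protein", "nucleotide"): "tblastn",
    }
    pair = (query_type, database_type)
    if pair not in programs:
        raise ValueError(f"unsupported sequence types: {pair}")
    if translate_both:
        # tblastx only makes sense when both sides are nucleotide sequences.
        if pair != ("nucleotide", "nucleotide"):
            raise ValueError("translate_both requires nucleotide query and database")
        return "tblastx"
    return programs[pair]


print(choose_blast_program("nucleotide", "protein"))
print(choose_blast_program("nucleotide", "nucleotide", translate_both=True))
```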