CrocoBLAST:Terminology
Revision as of 23:41, 23 July 2016
There are a few basic terms you need to keep in mind when running BLAST within CrocoBLAST.
Input file and Database
In essence, BLAST takes an unknown nucleotide or protein sequence, tries to align it against a set of reference sequences, and reports the score of each alignment to help you identify the unknown sequence. In practice, this means taking an input file with many query sequences and aligning each query sequence against a database of known sequences. Such databases are typically stored in public repositories such as NCBI, or may be produced in-house.
Therefore, in order to run BLAST, you will need to specify an input file containing the query sequences, and a database file containing the reference sequences. CrocoBLAST accepts input files in FASTA and FASTQ format. BLAST uses its own specific format for database files. You may provide the database file either already in BLAST database format, or in FASTA or FASTQ format, in which case it will be converted to database format before BLAST is run. Within CrocoBLAST you may also download databases directly from the NCBI server.
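To make the two accepted text formats concrete, here is a minimal Python sketch that guesses whether a file's content is FASTA or FASTQ. It relies only on the general convention that a FASTA record starts with ">" and a FASTQ record starts with "@". The function name guess_sequence_format is hypothetical and is not part of CrocoBLAST; how CrocoBLAST itself recognizes formats is not described here.

```python
def guess_sequence_format(text: str) -> str:
    """Guess 'fasta' or 'fastq' from the first non-empty line of the text.

    Hypothetical helper for illustration only; not a CrocoBLAST function.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"  # FASTA headers begin with '>'
        if stripped.startswith("@"):
            return "fastq"  # FASTQ headers begin with '@'
        break  # first real line is neither kind of header
    raise ValueError("content does not look like FASTA or FASTQ")


if __name__ == "__main__":
    print(guess_sequence_format(">query_1\nACGTACGT\n"))
    print(guess_sequence_format("@read_1\nACGT\n+\nIIII\n"))
```

A check like this only inspects the first record header, so it is a quick sanity test rather than a full validation of the file.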
BLAST program
Depending on the nature of the query and reference sequences, there are several BLAST programs you may use within CrocoBLAST:
- blastp - compares an amino acid query sequence against a protein sequence database
- blastn - compares a nucleotide query sequence against a nucleotide sequence database
- blastx - compares a nucleotide query sequence translated in all reading frames against a protein sequence database
- tblastn - compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
- tblastx - compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
Therefore, in order to run BLAST, you will need to indicate which BLAST program you intend to use.
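The list above can be summarized as a lookup from the type of the query and the type of the database to the applicable BLAST programs. The following Python sketch is only an illustration of that mapping; the names BLAST_PROGRAMS and candidate_programs are hypothetical and do not correspond to anything in CrocoBLAST.

```python
# (query type, database type) -> BLAST programs that accept that combination.
# Illustrative table only; the names here are assumptions, not CrocoBLAST API.
BLAST_PROGRAMS = {
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
}


def candidate_programs(query_type: str, database_type: str) -> list[str]:
    """Return the BLAST programs suited to the given sequence types."""
    try:
        return list(BLAST_PROGRAMS[(query_type, database_type)])
    except KeyError:
        raise ValueError(
            f"unknown combination: {query_type!r} query, {database_type!r} database"
        ) from None


if __name__ == "__main__":
    print(candidate_programs("nucleotide", "protein"))
    print(candidate_programs("nucleotide", "nucleotide"))
```

For a nucleotide query against a nucleotide database there are two candidates: blastn compares the sequences directly, while tblastx compares their six-frame translations.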
Job
Within CrocoBLAST, a job is defined by the BLAST program, the database, the input file, and the output location (folder). When created, each job receives a unique job ID that can be referenced whenever you wish to perform an operation on that job.
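As a rough mental model, a job can be pictured as a small record holding those four settings plus its ID. The Python sketch below is an assumption-laden illustration: the Job class, its field names, and the integer ID scheme are all hypothetical, and the actual format of CrocoBLAST job IDs is not specified here.

```python
from dataclasses import dataclass, field
from itertools import count
from pathlib import Path

# Hypothetical ID source: consecutive integers starting at 1.
_job_ids = count(1)


@dataclass(frozen=True)
class Job:
    """Illustrative record of the four settings that define a job."""

    program: str          # e.g. "blastn"
    database: Path        # reference sequences
    input_file: Path      # query sequences (FASTA or FASTQ)
    output_folder: Path   # where results are written
    job_id: int = field(default_factory=lambda: next(_job_ids))


if __name__ == "__main__":
    first = Job("blastn", Path("refs.fasta"), Path("queries.fasta"), Path("out_a"))
    second = Job("blastx", Path("proteins.fasta"), Path("queries.fasta"), Path("out_b"))
    print(first.job_id, second.job_id)
```

The record is frozen so that a job's defining settings cannot change after creation, which mirrors the idea that the ID always refers to the same program, database, input file, and output folder.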
Queue
All BLAST jobs created within the CrocoBLAST environment are kept in a list, which we refer to as the queue. The queue is useful because it allows you to plan your work in advance and manage your jobs as you need. While CrocoBLAST runs only one job at a time, all your interaction with the created jobs takes place via the queue. For example, you may pause one job to obtain its partial alignment results, and start another job while you analyze those partial results. The settings and progress of the original job are retained, so you may later choose to resume it.