[BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a local alignment tool that finds similarity between sequences. This tool compares nucleotide or protein sequences to sequence databases, and calculates significance of matches. Sometimes these input sequences are large and using the command-line BLAST is required.
The following pages, [Create Local BLAST Database](create_local_blast_database) and [Running BLAST Alignment](running_blast_alignment), describe how to run some of the most common BLAST executables as a single job using the SLURM scheduler on HCC.
The following pages, [Create Local BLAST Database]({{<relref"create_local_blast_database">}}) and [Running BLAST Alignment]({{<relref"running_blast_alignment">}}), describe how to run some of the most common BLAST executables as a single job using the SLURM scheduler on HCC.
These BLAST alignment commands are multi-threaded, and therefore using the BLAST option **-num_threads <number_of_CPUs>** is recommended.
HCC hosts multiple BLAST databases and indices on both Tusker and Crane. In order to use these resources, the ["biodata" module](../../../biodata_module) needs to be loaded first. The **$BLAST** variable contains the following currently available databases:
HCC hosts multiple BLAST databases and indices on both Tusker and Crane. In order to use these resources, the ["biodata" module]({{<relref"/guides/running_applications/bioinformatics_tools/biodata_module">}}) needs to be loaded first. The **$BLAST** variable contains the following currently available databases:
-**16SMicrobial**
-**env_nt**
...
...
@@ -50,7 +50,7 @@ HCC hosts multiple BLAST databases and indices on both Tusker and Crane. In orde
-**tsa_nr**
-**tsa_nt**
If you want to create and use a BLAST database that is not mentioned above, check [Create Local BLAST Database](create_local_blast_database).
If you want to create and use a BLAST database that is not mentioned above, check [Create Local BLAST Database]({{<relref"create_local_blast_database">}}).
A basic SLURM example of a nucleotide BLAST run against the non-redundant **nt** BLAST database with `8 CPUs` is provided below. When running a BLAST alignment, it is recommended to first copy the query and database files to the **/scratch/** directory of the worker node. The BLAST output is also saved in this directory (**/scratch/blastn_output.alignments**). After BLAST finishes, the output file is copied from the worker node to your current work directory.
where **index_prefix** is the basename of the genome index to be searched. This index is generated prior to running TopHat/TopHat2 by using [Bowtie](bowtie)/[Bowtie2](bowtie2).
where **index_prefix** is the basename of the genome index to be searched. This index is generated prior to running TopHat/TopHat2 by using [Bowtie]({{<relref"bowtie">}})/[Bowtie2]({{<relref"bowtie2">}}).
TopHat2 accepts a single file or a comma-separated list of files with paired-end and single-end reads in fasta or fastq format. The single-end reads need to be provided after the paired-end reads.
The organisms and their corresponding environment variables for all genomes and chromosome files, as well as for short read aligned indices, are shown at the link below: