running_blast_alignment.md 5.56 KB
Newer Older
Adam Caprez's avatar
Adam Caprez committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
+++
title = " Running BLAST Alignment"
description =  "How to run BLAST alignment on HCC resources"
weight = "10"
+++


Basic BLAST has the following commands:

- **blastn**: search nucleotide database using a nucleotide query
- **blastp**: search protein database using a protein query
- **blastx**: search protein database using a translated nucleotide query
- **tblastn**: search translated nucleotide database using a protein query
- **tblastx**: search translated nucleotide database using a translated nucleotide query


The basic usage of **blastn** is:
{{< highlight bash >}}
$ blastn -query input_reads.fasta -db input_reads_db -out blastn_output.alignments [options]
{{< /highlight >}}
where **input_reads.fasta** is an input file of sequence data in fasta format, **input_reads_db** is the generated BLAST database, and **blastn_output.alignments** is the output file where the alignments are stored.

23
Additional parameters can be found in the [BLAST manual](https://www.ncbi.nlm.nih.gov/books/NBK279690/), or by typing:
Adam Caprez's avatar
Adam Caprez committed
24
25
26
27
28
29
30
{{< highlight bash >}}
$ blastn -help
{{< /highlight >}}

These BLAST alignment commands are multi-threaded, and therefore using the BLAST option **-num_threads <number_of_CPUs>** is recommended.


31
HCC hosts multiple BLAST databases and indices on Crane. In order to use these resources, the ["biodata" module]({{<relref "/applications/app_specific/bioinformatics_tools/biodata_module">}}) needs to be loaded first. The **$BLAST** variable contains the following currently available databases:
Adam Caprez's avatar
Adam Caprez committed
32
33
34
35
36
37
38
39

- **16SMicrobial**
- **nr**
- **nt**
- **refseq_genomic**
- **refseq_rna**
- **swissprot**

40
If you want to create and use a BLAST database that is not mentioned above, check [Create Local BLAST Database]({{<relref "create_local_blast_database" >}}). If you want a database to be added to the ["biodata" module]({{<relref "/applications/app_specific/bioinformatics_tools/biodata_module">}}), please send a request to bcrf-support@unl.edu.
Adam Caprez's avatar
Adam Caprez committed
41

42
43
44
45
{{% notice info %}}
**To access the older format of BLAST databases that work with BLAST+ 2.9 and lower, please use the variable BLAST_V4.**
**The variable BLAST points to the directory with the new version 5 of the nucleotide and protein databases required for BLAST+ 2.10 and higher.**
{{% /notice %}}
Adam Caprez's avatar
Adam Caprez committed
46
47
48
49
50
51
52
53
54

Basic SLURM example of nucleotide BLAST run against the non-redundant **nt** BLAST database with `8 CPUs` is provided below. When running BLAST alignment, it is recommended to first copy the query and database files to the **/scratch/** directory of the worker node. Moreover, the BLAST output is also saved in this directory (**/scratch/blastn_output.alignments**). After BLAST finishes, the output file is copied from the worker node to your current work directory.
{{% notice info %}}
**Please note that the worker nodes can not write to the */home/* directories and therefore you need to run your job from your */work/* directory.**
**This example will first copy your database to faster local storage called “scratch”.  This can greatly improve performance!**
{{% /notice %}}

{{% panel header="`blastn_alignment.submit`"%}}
{{< highlight bash >}}
Caughlin Bohn's avatar
Caughlin Bohn committed
55
#!/bin/bash
Adam Caprez's avatar
Adam Caprez committed
56
57
58
59
60
61
62
63
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=20gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err

64
module load blast/2.10
Adam Caprez's avatar
Adam Caprez committed
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
module load biodata/1.0

cd $WORK/<project_folder>
cp $BLAST/nt.* /scratch/
cp input_reads.fasta /scratch/

blastn -query /scratch/input_reads.fasta -db /scratch/nt -out /scratch/blastn_output.alignments -num_threads $SLURM_NTASKS_PER_NODE

cp /scratch/blastn_output.alignments $WORK/<project_folder>
{{< /highlight >}}
{{% /panel %}}


One important BLAST parameter is the **e-value threshold** that changes the number of hits returned by showing only those with value lower than the given. To show the hits with **e-value** lower than 1e-10, modify the given script as follows:
{{< highlight bash >}}
$ blastn -query input_reads.fasta -db input_reads_db -out blastn_output.alignments -num_threads $SLURM_NTASKS_PER_NODE -evalue 1e-10
{{< /highlight >}}


84
The default BLAST output is in pairwise format. However, BLAST’s parameter **-outfmt** supports output in [different formats](https://www.ncbi.nlm.nih.gov/books/NBK279684/) that are easier for parsing.
Adam Caprez's avatar
Adam Caprez committed
85
86
87
88
89
90
91
92
93
94


Basic SLURM example of protein BLAST run against the non-redundant **nr **BLAST database with tabular output format and `8 CPUs` is shown below. Similarly as before, the query and database files are copied to the **/scratch/** directory. The BLAST output is also saved in this directory (**/scratch/blastx_output.alignments**). After BLAST finishes, the output file is copied from the worker node to your current work directory.
{{% notice info %}}
**Please note that the worker nodes can not write to the */home/* directories and therefore you need to run your job from your */work/* directory.**
**This example will first copy your database to faster local storage called “scratch”.  This can greatly improve performance!**
{{% /notice %}}

{{% panel header="`blastx_alignment.submit`"%}}
{{< highlight bash >}}
Caughlin Bohn's avatar
Caughlin Bohn committed
95
#!/bin/bash
Adam Caprez's avatar
Adam Caprez committed
96
97
98
99
100
101
102
103
#SBATCH --job-name=BlastX
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=20gb
#SBATCH --output=BlastX.%J.out
#SBATCH --error=BlastX.%J.err

104
module load blast/2.10
Adam Caprez's avatar
Adam Caprez committed
105
106
107
108
109
110
111
112
113
114
115
module load biodata/1.0

cd $WORK/<project_folder>
cp $BLAST/nr.* /scratch/
cp input_reads.fasta /scratch/

blastx -query /scratch/input_reads.fasta -db /scratch/nr -outfmt 6 -out /scratch/blastx_output.alignments -num_threads $SLURM_NTASKS_PER_NODE

cp /scratch/blastx_output.alignments $WORK/<project_folder>
{{< /highlight >}}
{{% /panel %}}