running_bwa_commands.md 5.87 KB
Newer Older
npavlovikj's avatar
npavlovikj committed
1
2
3
4
5
6
+++
title = "Running BWA Commands"
description =  "How to run BWA commands on HCC resources"
weight = "10"
+++

npavlovikj's avatar
i    
npavlovikj committed
7
## BWA Index
npavlovikj's avatar
npavlovikj committed
8
9
10
11
12
13
14

The first step of using BWA is to make an index of the reference genome in fasta format. The basic usage of the **bwa index** is:
{{< highlight bash >}}
$ bwa index [-a bwtsw|is] input_reference.fasta index_prefix
{{< /highlight >}}
where **input_reference.fasta** is an input file of the reference genome in fasta format, and **index_prefix** is the prefix of the generated index files. The option **-a** is required and can have two values: **bwtsw** (does not work for short genomes) and **is** (does not work for long genomes). Therefore, this value is chosen according to the length of the genome.

npavlovikj's avatar
i    
npavlovikj committed
15
16

## BWA Mem
npavlovikj's avatar
npavlovikj committed
17
18
19
20
21
22
23

The **bwa mem** algorithm is one of the three algorithms provided by BWA. It performs local alignment and produces alignments for different part of the query sequence. The basic usage of **bwa mem** is:
{{< highlight bash >}}
$ bwa mem index_prefix [input_reads.fastq|input_reads_pair_1.fastq input_reads_pair_2.fastq] [options]
{{< /highlight >}}
where **index_prefix** is the index for the reference genome generated from **bwa index**, and **input_reads.fastq**, **input_reads_pair_1.fastq**, **input_reads_pair_2.fastq** are the input files of sequencing data that can be single-end or paired-end respectively. Additional **options** for **bwa mem** can be found in the BWA manual.

npavlovikj's avatar
i    
npavlovikj committed
24

25
Simple SLURM script for running **bwa mem** on Crane with paired-end fastq input data, `index_prefix` as reference genome index, SAM output file and `8 CPUs` is shown below:
npavlovikj's avatar
npavlovikj committed
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
{{% panel header="`bwa_mem.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Bwa_Mem
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=BwaMem.%J.out
#SBATCH --error=BwaMem.%J.err

module load bwa/0.7

bwa mem index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq -t $SLURM_NTASKS_PER_NODE > bwa_mem_alignments.sam
{{< /highlight >}}
{{% /panel %}}

npavlovikj's avatar
i    
npavlovikj committed
43
44

## BWA Bwasw
npavlovikj's avatar
npavlovikj committed
45
46
47
48
49
50
51
52

The **bwa bwasw** algorithm is another algorithm provided by BWA. For input files with single-end reads it aligns the query sequences. For input files with paired-ends reads it performs paired-end alignment that only works for Illumina reads.

An example of **bwa bwasw** for single-end input file `input-reads.fasta` in fasta format and output file `bwa_bwasw_alignments.sam` where the alignments are stored, is shown below:
{{< highlight bash >}}
$ bwa bwasw index_prefix input_reads.fasta -t $SLURM_NTASKS_PER_NODE > bwa_bwasw_alignments.sam
{{< /highlight >}}

npavlovikj's avatar
i    
npavlovikj committed
53
54

## BWA Aln
npavlovikj's avatar
npavlovikj committed
55
56
57
58
59
60

The third BWA algorithm, **bwa aln**, aligns the input file of sequence data to the reference genome. In addition, there is an example of running **bwa aln** with single-end `input_reads.fasta` input file and `8 CPUs`:
{{< highlight bash >}}
$ bwa aln index_prefix input_reads.fasta -0 -t $SLURM_NTASKS_PER_NODE > bwa_aln_alignments.sai
{{< /highlight >}}

npavlovikj's avatar
i    
npavlovikj committed
61
62

## BWA Samse and BWA Sampe
npavlovikj's avatar
npavlovikj committed
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79

The command **bwa samse** uses the `bwa_aln_alignments.sai` output from **bwa aln** in order to generate SAM file from the alignments for single-end reads.

{{% panel header="`General BWA Samse Usage`"%}}
{{< highlight bash >}}
$ bwa samse -f bwa_aln_alignments.sam index_prefix bwa_aln_alignments.sai input_reads.fasta   output31.preArc
{{< /highlight >}}
{{% /panel %}}

The command **bwa sampe** uses the `bwa_aln_alignments.sai` output form **bwa aln** in order to generate SAM file from the alignments for paired-end reads.

{{% panel header="`General BWA Sampe Usage`"%}}
{{< highlight bash >}}
$ bwa samse -f bwa_aln_alignments.sam index_prefix bwa_aln_alignments_pair_1.sai bwa_aln_alignments_pair_2.sai input_reads_pair_1.fasta input_reads_pair_2.fasta
{{< /highlight >}}
{{% /panel %}}

npavlovikj's avatar
i    
npavlovikj committed
80
81

## BWA Fastmap
npavlovikj's avatar
npavlovikj committed
82
83
84
85
86
87

The command **bwa fastmap** identifies and outputs super-maximal exact matches (SMEMs). The basic usage of **bwa fastmap** is:
{{< highlight bash >}}
$ bwa fastmap index_prefix input_reads.fasta > bwa_fastmap.matches
{{< /highlight >}}

npavlovikj's avatar
i    
npavlovikj committed
88
89

## BWA Pemerge
npavlovikj's avatar
npavlovikj committed
90
91
92
93
94
95

The command **bwa pemerge** merges overlapping paired ends and can print either only the merged reads or the unmerged ones. An example of **bwa pemerge** of `input_reads_pair_1.fastq` and `input_reads_pair_2.fastq` with `8 CPUs` and output file `output_reads_merged.fastq` that contains only the merged reads is shown below:
{{< highlight bash >}}
$ bwa pemerge -m input_reads_pair_1.fastq input_reads_pair_2.fastq -t $SLURM_NTASKS_PER_NODE > output_reads_merged.fastq
{{< /highlight >}}

npavlovikj's avatar
i    
npavlovikj committed
96
97

## BWA Fa2pac
npavlovikj's avatar
npavlovikj committed
98
99
100
101
102
103

The command **bwa fa2pac** converts fasta to pac files. The general usage of **bwa fa2pac** is:
{{< highlight bash >}}
$ bwa fa2pac input_reads.fasta pac_prefix
{{< /highlight >}}

npavlovikj's avatar
i    
npavlovikj committed
104
105

## BWA Pac2bwt and BWA Pac2bwtgen
npavlovikj's avatar
npavlovikj committed
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120

The commands **bwa pac2bwt** and **bwa pac2bwtgen** convert pac to bwt files.

{{% panel header="`General BWA Pac2bwt Usage`"%}}
{{< highlight bash >}}
$ bwa pac2bwt input_reads.pac output_reads.bwt
{{< /highlight >}}
{{% /panel %}}

{{% panel header="`General BWA Pac2bwtgen Usage`"%}}
{{< highlight bash >}}
$ bwa pac2bwtgen input_reads.pac output_reads.bwt
{{< /highlight >}}
{{% /panel %}}

npavlovikj's avatar
i    
npavlovikj committed
121
122

## BWA Bwtupdate
123

npavlovikj's avatar
npavlovikj committed
124
125
126
127
The command **bwa bwtupdate** updates bwt files to the new format. The general usage of **bwa bwtupdate** is:
{{< highlight bash >}}
$ bwa bwtupdate input_reads.bwt
{{< /highlight >}}
128

npavlovikj's avatar
i    
npavlovikj committed
129
130

## BWA Bwt2sa
131

npavlovikj's avatar
npavlovikj committed
132
133
134
135
The command **bwa bwt2sa** generates sa files from bwt and Occ files. The basic usage of **bwa bwt2sa** is:
{{< highlight bash >}}
$ bwa bwt2sa input_reads.bwt output_reads.sa
{{< /highlight >}}
136

npavlovikj's avatar
i    
npavlovikj committed
137
138

### Useful Information
139

npavlovikj's avatar
npavlovikj committed
140
In order to test the scalability of BWA (bwa/0.7) on Crane, we used two paired-end input fastq files, `large_1.fastq` and `large_2.fastq`, and one single-end input fasta file, `large.fasta`. Some statistics about the input files and the time and memory resources used by **bwa mem** are shown on the table below:
npavlovikj's avatar
i    
npavlovikj committed
141
{{< readfile file="/static/html/bwa.html" >}}