+++
title = "Bowtie2"
description =  "How to run Bowtie2 on HCC resources"
weight = "10"
+++

[Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. Although Bowtie and Bowtie2 are both fast read aligners, there are a few main differences between them:

- Bowtie2 supports gapped alignment with affine gap penalties, without restrictions on the number of gaps and gap lengths.
- For reads longer than about 50 bp, Bowtie2 is generally faster, more sensitive, and uses less memory than Bowtie.
- Bowtie supports only end-to-end alignment, while Bowtie2 supports both end-to-end and local alignment.
- Bowtie has an upper limit on read length of around 1,000 bp, while Bowtie2 has no upper limit.
- Bowtie2's paired-end alignment is more flexible than Bowtie's.
- Bowtie2 does not align colorspace reads.
- Bowtie and Bowtie2 indices are not compatible.

As with Bowtie, the first step of running Bowtie2 is to build a Bowtie2 index from a reference genome sequence. The basic usage of the
command **bowtie2-build** is:
{{< highlight bash >}}
$ bowtie2-build -f input_reference.fasta index_prefix
{{< /highlight >}}
where **input_reference.fasta** is the reference genome sequence in FASTA format, and **index_prefix** is the prefix of the generated index files. Besides the option **-f**, which is used when the reference input file is in FASTA format, the option **-c** can be used when the reference sequences are given on the command line.
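
When **bowtie2-build** finishes, the index consists of six files sharing **index_prefix**: `index_prefix.1.bt2` through `index_prefix.4.bt2`, plus `index_prefix.rev.1.bt2` and `index_prefix.rev.2.bt2` (very large references produce `.bt2l` files instead). Because **bowtie2** needs all of them, a quick check before submitting an alignment job can save a wasted queue slot. The sketch below assumes that file layout; `check_bt2_index` is a hypothetical helper, not part of Bowtie2.

{{< highlight bash >}}
# Hypothetical helper (not part of Bowtie2): verify that all six index files
# for a prefix exist, accepting either the small (.bt2) or the large (.bt2l)
# index suffix. Prints the suffix found and returns 0 on success.
check_bt2_index() {
    local prefix=$1 suffix part missing
    for suffix in bt2 bt2l; do
        missing=0
        for part in 1 2 3 4 rev.1 rev.2; do
            if [ ! -f "${prefix}.${part}.${suffix}" ]; then
                missing=1
                break
            fi
        done
        if [ "$missing" -eq 0 ]; then
            printf '%s\n' "$suffix"
            return 0
        fi
    done
    echo "incomplete or missing Bowtie2 index: ${prefix}" >&2
    return 1
}
{{< /highlight >}}

For example, `check_bt2_index index_prefix && echo "index OK"` prints `bt2` followed by `index OK` when a small index is complete.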

The command **bowtie2** takes a Bowtie2 index and a set of sequencing read files and outputs a set of alignments in SAM format. The general **bowtie2** usage is:
{{< highlight bash >}}
$ bowtie2 -x index_prefix [-q|--qseq|-f|-r|-c] [-1 input_reads_pair_1.[fasta|fastq] -2 input_reads_pair_2.[fasta|fastq] | -U input_reads.[fasta|fastq]] -S bowtie2_alignments.sam [options]
{{< /highlight >}}
where **index_prefix** is the prefix of the index generated with the **bowtie2-build** command, and **options** are optional parameters that can be found in the [Bowtie2 manual](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml). Bowtie2 supports both single-end (`input_reads.[fasta|fastq]`) and paired-end (`input_reads_pair_1.[fasta|fastq]`, `input_reads_pair_2.[fasta|fastq]`) files in FASTA or FASTQ format. The format of the input files is specified with one of the following flags: **-q** (FASTQ files, the default), **--qseq** (Illumina's QSEQ format), **-f** (FASTA files), **-r** (raw, one sequence per line), or **-c** (sequences given on the command line).
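
Because the format flag has to match the files passed with **-1**/**-2** or **-U**, it can be derived from the file name. The following is a minimal sketch assuming the common extensions `.fastq`/`.fq` and `.fasta`/`.fa`/`.fna`, optionally gzip-compressed (`.gz`); `bowtie2_format_flag` is a hypothetical helper and only covers the **-q** and **-f** formats.

{{< highlight bash >}}
# Hypothetical helper (not part of Bowtie2): print the bowtie2 input-format
# flag matching a read file's extension. A trailing .gz is stripped first,
# so reads.fq.gz maps to -q. Returns 1 for unrecognized extensions.
bowtie2_format_flag() {
    local name=${1%.gz}
    case $name in
        *.fastq|*.fq)       printf '%s\n' "-q" ;;
        *.fasta|*.fa|*.fna) printf '%s\n' "-f" ;;
        *)
            echo "unknown read format: $1" >&2
            return 1 ;;
    esac
}
{{< /highlight >}}

A single-end run could then use `flag=$(bowtie2_format_flag input_reads.fastq) && bowtie2 -x index_prefix "$flag" -U input_reads.fastq -S bowtie2_alignments.sam`, so that bowtie2 is never started with a mismatched flag.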

An example of running a Bowtie2 local alignment on Tusker with paired-end FASTA files and 8 CPUs is shown below:
{{% panel header="`bowtie2_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err

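# Load the Bowtie2 module
module load bowtie/2.3

# Local alignment of paired-end FASTA reads, with one thread (-p) per requested task
bowtie2 -x index_prefix -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p "$SLURM_NTASKS_PER_NODE"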
{{< /highlight >}}
{{% /panel %}}
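
In paired-end mode, Bowtie2 pairs the reads of the **-1** and **-2** files by position, so both files should contain the same number of reads in the same order. The sketch below checks the counts before the job is submitted. It assumes uncompressed files, detects FASTA versus FASTQ from the first character (`>` or `@`), and assumes four-line FASTQ records; `count_reads` and `check_pairs` are hypothetical helpers, not part of Bowtie2.

{{< highlight bash >}}
# Hypothetical helper: count the reads in an uncompressed FASTA or FASTQ file.
# FASTA reads are counted by their '>' header lines (multi-line sequences are
# fine); FASTQ reads are counted as line count / 4.
count_reads() {
    local first
    first=$(head -c 1 "$1")
    case $first in
        '>') grep -c '^>' "$1" ;;
        '@') echo $(( $(wc -l < "$1") / 4 )) ;;
        *)   echo "unrecognized read file: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: check that two mate files hold the same number of
# reads. Prints the shared count on success, returns 1 on a mismatch.
check_pairs() {
    local n1 n2
    n1=$(count_reads "$1") || return 1
    n2=$(count_reads "$2") || return 1
    if [ "$n1" -ne "$n2" ]; then
        echo "mate files differ: $1 has $n1 reads, $2 has $n2" >&2
        return 1
    fi
    echo "$n1"
}
{{< /highlight >}}

For example, `check_pairs input_reads_pair_1.fasta input_reads_pair_2.fasta && sbatch bowtie2_alignment.submit` only submits the job when the counts match.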

### Bowtie2 Output

Bowtie2 outputs alignments in SAM format, which can be further processed with tools such as SAMtools and GATK. After the header lines (which start with `@`), each line describes the alignment of one read, or the read itself if it failed to align, and consists of at least 12 tab-separated fields. Detailed information about the Bowtie2 output fields can be found in the [Bowtie2 manual](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml).
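
As a small illustration of working with these fields: the second field (FLAG) is a bit mask in which bit `0x4` marks an unmapped read. The sketch below counts SAM records with that bit clear or set, skipping header lines. `sam_mapping_summary` is a hypothetical helper, not part of Bowtie2 or SAMtools. It counts records rather than reads, so each mate of a pair is counted separately, and any secondary alignments (for example with **-k**) would be counted too. For real analyses, `samtools flagstat` reports these and many more statistics.

{{< highlight bash >}}
# Hypothetical helper: summarize a SAM file as mapped vs. unmapped records.
# Bit 0x4 of FLAG is tested arithmetically so any POSIX awk works.
sam_mapping_summary() {
    awk '
        BEGIN { FS = "\t" }
        /^@/  { next }
        {
            # FLAG is field 2; bit 0x4 is set when int(FLAG / 4) is odd
            if (int($2 / 4) % 2 == 1) unmapped++
            else mapped++
        }
        END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
    ' "$1"
}
{{< /highlight >}}

Running `sam_mapping_summary bowtie2_alignments.sam` prints a single line of the form `mapped=N unmapped=M`.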