+++
title = "Bowtie"
description =  "How to run Bowtie on HCC resources"
weight = "10"
+++

[Bowtie](http://bowtie-bio.sourceforge.net/index.shtml) is an ultrafast and memory-efficient aligner for large sets of sequencing reads to a reference genome. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small. Bowtie can also use multiple processors to achieve greater alignment speed.

The first step of running Bowtie is to build and format an index from the reference genome. The basic usage of the **bowtie-build** command is:
{{< highlight bash >}}
$ bowtie-build input_reference.fasta index_prefix
{{< /highlight >}}
where **input_reference.fasta** is the input file containing the reference genome sequences in fasta format, and **index_prefix** is the prefix of the generated index files.
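
Since building an index can take a while for large genomes, it may be convenient to skip the step when the index files are already present. The helper below is only a sketch: the function name `bowtie_index_exists` is hypothetical, and it assumes a "small" Bowtie 1 index made of six `.ebwt` files (large indexes use a different suffix and are not covered here).

```bash
# Succeed only if all six index files for the given prefix exist and are non-empty.
# Assumption: a small Bowtie 1 index (.ebwt files); adjust the suffix for large indexes.
bowtie_index_exists() {
    local prefix="$1"
    local suffix
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -s "${prefix}.${suffix}.ebwt" ] || return 1
    done
    return 0
}

# Possible usage (commented out so the sketch runs without Bowtie installed):
# if ! bowtie_index_exists index_prefix; then
#     bowtie-build input_reference.fasta index_prefix
# fi
```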

After the index of the reference genome is generated, the next step is to align the reads. The basic usage of bowtie is:
{{< highlight bash >}}
$ bowtie [-q|-f|-r|-c] index_prefix [-1 input_reads_pair_1.[fasta|fastq] -2 input_reads_pair_2.[fasta|fastq] | input_reads.[fasta|fastq]] [options]
{{< /highlight >}}
where **index_prefix** is the prefix of the index generated with the **bowtie-build** command, and **options** are optional parameters that can be found in the [Bowtie manual](http://bowtie-bio.sourceforge.net/manual.shtml).

Bowtie supports both single-end (`input_reads.[fasta|fastq]`) and paired-end (`input_reads_pair_1.[fasta|fastq]`, `input_reads_pair_2.[fasta|fastq]`) files in fasta or fastq format. The format of the input files must also be specified with one of the following flags: **-q** (fastq files), **-f** (fasta files), **-r** (raw sequences, one per line), or **-c** (sequences given on the command line).
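
When a script handles both fasta and fastq inputs, the format flag can be chosen from the file name. The following is a minimal sketch; the function name `bowtie_format_flag` and the extension-to-flag mapping are assumptions made for illustration, since Bowtie itself does not inspect file extensions.

```bash
# Print the Bowtie input-format flag matching a read file's extension.
# The mapping below is an illustrative convention, not something Bowtie enforces.
bowtie_format_flag() {
    case "$1" in
        *.fastq|*.fq) echo "-q" ;;
        *.fasta|*.fa) echo "-f" ;;
        *)
            echo "Unrecognized read file extension: $1" >&2
            return 1
            ;;
    esac
}

# Possible usage (commented out so the sketch runs without Bowtie installed):
# bowtie "$(bowtie_format_flag input_reads.fastq)" index_prefix input_reads.fastq
```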

An example of how to run a Bowtie alignment on Tusker with a single-end fastq file and `8 CPUs` is shown below:
{{% panel header="`bowtie_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Bowtie
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie.%J.out
#SBATCH --error=Bowtie.%J.err

module load bowtie/1.1

# -S writes the alignments in SAM format (without it, Bowtie uses its own default format)
bowtie -q -S index_prefix input_reads.fastq -p $SLURM_NTASKS_PER_NODE > bowtie_alignments.sam
{{< /highlight >}}
{{% /panel %}}


### Bowtie Output

By default, Bowtie writes alignments in its own tab-separated format, where one line is one alignment. SAM output is produced only when the **-S** (**--sam**) option is given. Each line of the default output is a collection of 8 fields separated by tabs:

1. name of the aligned read
2. reference strand aligned to
3. name of the reference sequence where the alignment occurs
4. 0-based offset into the forward reference strand where the leftmost character of the alignment occurs
5. read sequence
6. read qualities
7. number of other instances where the same sequence is aligned against the same reference characters
8. comma-separated list of mismatch descriptors
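
As an illustration of working with the 8-field, tab-separated layout, the sketch below counts alignments per reference sequence (field 3) with `awk`. It assumes the alignments were written in Bowtie's default format, i.e. without **-S**. The three sample records are made up for demonstration, and the function name `count_by_reference` is hypothetical.

```bash
# Made-up sample records in the 8-field, tab-separated layout (not real Bowtie output).
sample=$(mktemp)
printf 'read1\t+\tchr1\t100\tACGT\tIIII\t0\t\n'       >  "$sample"
printf 'read2\t-\tchr1\t250\tTTGA\tIIII\t1\t2:A>G\n'  >> "$sample"
printf 'read3\t+\tchr2\t7\tGGCC\tIIII\t0\t\n'         >> "$sample"

# Count alignments per reference sequence name (field 3), sorted by name.
count_by_reference() {
    awk -F'\t' '{ count[$3]++ } END { for (ref in count) print ref "\t" count[ref] }' "$1" | sort
}

count_by_reference "$sample"
```

With the sample records above, this prints one line per reference name followed by its alignment count (two for `chr1`, one for `chr2`). The same function could be pointed at a real alignment file, provided it was produced in the default format.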