bowtie.md 2.85 KB
Newer Older
npavlovikj's avatar
npavlovikj committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
+++
title = "Bowtie"
description =  "How to run Bowtie on HCC resources"
weight = "10"
+++

[Bowtie] (http://bowtie-bio.sourceforge.net/index.shtml) is an ultrafast and memory-efficient aligner for large sets of sequencing reads to a reference genome. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small. Bowtie also supports usage of multiple processors to achieve greater alignment speed.

The first and basic step of running Bowtie is to build and format an index from the reference genome. The basic usage of this command, **bowtie-build** is:
{{< highlight bash >}}
$ bowtie-build input_reference.fasta index_prefix
{{< /highlight >}}
where **input_reference.fasta** is an input file of sequence reads in fasta format, and **index_prefix** is the prefix of the generated index files.

After the index of the reference genome is generated, the next step is to align the reads. The basic usage of bowtie is:
{{< highlight bash >}}
$ bowtie [-q|-f|-r|-c] index_prefix [-1 input_reads_pair_1.[fasta|fastq] -2 input_reads_pair_2.[fasta|fastq] | input_reads.[fasta|fastq]] [options]
{{< /highlight >}}
where **index_prefix** is the generated index using the **bowtie-build** command, and **options** are optional parameters that can be found in the [Bowtie
manual] (http://bowtie-bio.sourceforge.net/manual.shtml).

Bowtie supports both single-end (`input_reads.[fasta|fastq]`) and paired-end (`input_reads_pair_1.[fasta|fastq]`, `input_reads_pair_2.[fasta|fastq]`) files in fasta or fastq format. The format of the input files also needs to be specified by using the following flags: **-q** (fastq files), **-f** (fasta files), **-r** (raw one-sequence per line), or **-c** (sequences given on command line).

\\
An example of how to run Bowtie alignment on Tusker with single-end fastq file and `8 CPUs` is shown below:
{{% panel header="`bowtie_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Bowtie
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie.%J.out
#SBATCH --error=Bowtie.%J.err

module load bowtie/1.1

bowtie -q index_prefix input_reads.fastq -p $SLURM_NTASKS_PER_NODE > bowtie_alignments.sam
{{< /highlight >}}
{{% /panel %}}

\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">Bowtie Output</span>

Bowtie output is an alignment file in SAM format, where one line is one alignment. Each line is a collection of 8 fields separated by tabs. The fields are: name of the aligned reads, reference strand aligned to, name of reference sequence where the alignment occurs, 0-based offset into the forward reference strand where leftmost character of the alignment occurs, read sequence, read qualities, the number of other instances where the same sequence is aligned against the same reference characters, and comma-separated list of mismatch descriptors.