+++
title = "TopHat/TopHat2"
description =  "How to run TopHat/TopHat2 on HCC resources"
weight = "10"
+++


[TopHat](https://ccb.jhu.edu/software/tophat/index.shtml) is a fast splice junction mapper for RNA-Seq data. It first aligns RNA-Seq reads to reference genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. 

TopHat and TopHat2 accept the same options and produce the same number of output files, but TopHat2 incorporates many significant improvements over TopHat. The TopHat package at HCC supports both **tophat** and **tophat2**.

The basic usage of TopHat/TopHat2 is:
{{< highlight bash >}}
$ [tophat|tophat2] [options] index_prefix [input_reads_pair_1.[fasta|fastq] input_reads_pair_2.[fasta|fastq] | input_reads.[fasta|fastq]]
{{< /highlight >}}
where **index_prefix** is the basename of the genome index to be searched. This index is generated prior to running TopHat/TopHat2 by using [Bowtie]({{<relref "bowtie" >}})/[Bowtie2]({{<relref "bowtie2" >}}).

TopHat2 accepts a single file or a comma-separated list of files containing paired-end and single-end reads in fasta or fastq format. The single-end reads need to be provided after the paired-end reads.
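
When a sample is split across several files, the comma-separated lists can be assembled in the shell. The following is a minimal sketch, not part of the TopHat2 distribution: the helper name `join_by_comma` and the `lane*` file names are hypothetical, and the command is only printed rather than executed. The files in the two lists are assumed to be given in matching order, so that each file of first mates lines up with its file of second mates.

```bash
# Join all arguments into one comma-separated string (hypothetical helper).
# The subshell keeps the IFS change from leaking into the calling shell.
join_by_comma() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Hypothetical lane files for one paired-end sample, listed in matching order.
left_reads=$(join_by_comma lane1_1.fastq lane2_1.fastq)
right_reads=$(join_by_comma lane1_2.fastq lane2_2.fastq)

# Print the resulting command line; drop the "echo" to actually run TopHat2.
echo "tophat2 index_prefix $left_reads $right_reads"
```
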

More advanced TopHat2 options can be found in [its manual](https://ccb.jhu.edu/software/tophat/manual.shtml), or by typing:
{{< highlight bash >}}
$ tophat2 -h
{{< /highlight >}}

Before running TopHat/TopHat2, an index of the reference genome must be built using Bowtie/Bowtie2. TopHat2 also expects the index files and the reference file to be in the same directory. If the reference file is not available, TopHat2 reconstructs it from the index files in its initial step.
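
A quick pre-flight check can confirm that the index is in place before a long job is submitted. The sketch below is illustrative only: the function name `check_tophat2_index` is hypothetical, and it assumes a regular (not "large") Bowtie2 index with the usual `.bt2` suffixes and a reference file named `index_prefix.fa` next to it.

```bash
# Report missing Bowtie2 index files for a given prefix (hypothetical helper).
# Assumes a regular Bowtie2 index: <prefix>.{1,2,3,4}.bt2 and <prefix>.rev.{1,2}.bt2.
check_tophat2_index() {
    prefix=$1
    missing=0
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -f "$prefix.$suffix" ]; then
            echo "missing index file: $prefix.$suffix"
            missing=1
        fi
    done
    # The reference FASTA is optional; TopHat2 rebuilds it from the index if absent.
    if [ ! -f "$prefix.fa" ]; then
        echo "note: $prefix.fa not found; TopHat2 will reconstruct it from the index"
    fi
    return $missing
}

check_tophat2_index index_prefix || echo "build the index with Bowtie2 before running TopHat2"
```

The function returns a non-zero status only when index files are missing, so it can also guard the `tophat2` call inside a submit script.
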

An example of how to run TopHat2 on Crane with the paired-end fastq files `input_reads_pair_1.fastq` and `input_reads_pair_2.fastq`, the reference index `index_prefix`, and 8 CPUs is shown below:
{{% panel header="`tophat2_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/bash
#SBATCH --job-name=Tophat2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Tophat2.%J.out
#SBATCH --error=Tophat2.%J.err

module load samtools/1.3 bowtie/2.3 tophat/2.0

tophat2 -p $SLURM_NTASKS_PER_NODE index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq
{{< /highlight >}}
{{% /panel %}}

TopHat2 creates its own output directory, `tophat_out/`, which contains the files it generates.


### TopHat2 Output

TopHat2 produces a number of files in its `tophat_out/` output directory. Some of the generated files are:
54

- **accepted_hits.bam**: list of read alignments in BAM format
- **unmapped.bam**: list of unmapped reads in BAM format
- **junctions.bed**: BED track of reported junctions
- **insertions.bed**: BED track of insertions reported by TopHat
- **deletions.bed**: BED track of deletions reported by TopHat
- **prep_reads.info**: statistics about the input sequencing data (min/max read length, number of reads)
- **align_summary.txt**: summary of the alignment counts (number of mapped reads, overall read mapping rate)
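
Once a run has finished, the files listed above can be inspected directly from the shell. The sketch below is one possible quick check rather than an official workflow: the helper name `count_junctions` is hypothetical, and it assumes that `junctions.bed` begins with a BED `track` header line that should not be counted.

```bash
# Count the junctions in a BED file, skipping any "track" header line
# (hypothetical helper; the header line is an assumption about the file layout).
count_junctions() {
    grep -vc '^track' "$1"
}

out_dir=tophat_out
if [ -f "$out_dir/junctions.bed" ]; then
    echo "junctions reported: $(count_junctions "$out_dir/junctions.bed")"
else
    echo "no junctions.bed found in $out_dir/"
fi

# Print the alignment summary if the run produced one.
if [ -f "$out_dir/align_summary.txt" ]; then
    cat "$out_dir/align_summary.txt"
fi
```

With the `samtools` module loaded, `samtools flagstat tophat_out/accepted_hits.bam` gives a similar overview of the aligned reads.
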