Commit e07bd1be authored by npavlovikj
Fix header in bio pages and update yubikey info
parent f74a41ca
......@@ -7,10 +7,10 @@ weight = "52"
[BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a local alignment tool that finds similarity between sequences. This tool compares nucleotide or protein sequences to sequence databases, and calculates significance of matches. Sometimes these input sequences are large and using the command-line BLAST is required.
The following pages, [Create Local BLAST Database](create_local_blast_database) and [Running BLAST Alignment](running_blast_alignment) describe how to run some of the most common BLAST executables as a single job using the SLURM scheduler on HCC.
The following pages, [Create Local BLAST Database](create_local_blast_database) and [Running BLAST Alignment](running_blast_alignment) describe how to run some of the most common BLAST executables as a single job using the SLURM scheduler on HCC.
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">Useful Information</span>
### Useful Information
In order to test the BLAST (blast/2.2) performance on Tusker, we aligned three nucleotide query datasets, `small.fasta`, `medium.fasta` and `large.fasta`, against the non-redundant nucleotide **nt.fasta** database from NCBI. Some statistics about the query datasets and the time and memory resources used for the alignment are shown on the table below:
{{< readfile file="/static/html/blast.html" >}}
\ No newline at end of file
{{< readfile file="/static/html/blast.html" >}}
......@@ -11,7 +11,7 @@ $ makeblastdb -in input_reads.fasta -dbtype [nucl|prot] -out input_reads_db
{{< /highlight >}}
where **input_reads.fasta** is the input file containing all sequences that need to be made into a database, and **dbtype** can be either `nucl` or `prot` depending on the type of the input file.
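When the type of a FASTA file is not known in advance, a rough guess for **dbtype** can be made from the characters it contains. The sketch below is only a heuristic under stated assumptions: `guess_dbtype` is a hypothetical helper, not part of BLAST, and it treats a file as nucleotide only if every sequence character is one of `A`, `C`, `G`, `T`, `U` or `N`, so nucleotide files that use other IUPAC ambiguity codes would be reported as `prot`.

```bash
# Guess the makeblastdb -dbtype value for a FASTA file (heuristic, see assumptions above).
guess_dbtype() {
    local fasta="$1"
    local leftover
    # Drop header lines, then delete every nucleotide letter and all whitespace.
    # Anything that remains is taken as evidence of protein residues.
    leftover="$(grep -v '^>' "$fasta" | tr -d 'ACGTUNacgtun \t\r\n' | head -c 1)"
    if [ -z "$leftover" ]; then
        echo "nucl"
    else
        echo "prot"
    fi
}

# Usage (file names as in the text above):
#   makeblastdb -in input_reads.fasta -dbtype "$(guess_dbtype input_reads.fasta)" -out input_reads_db
```

It is still worth confirming the result by eye before building a large database.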
\\
A simple example of how **makeblastdb** can be run on Tusker using a SLURM script and a nucleotide database is shown below:
{{% panel header="`blast_db.submit`"%}}
{{< highlight bash >}}
......@@ -30,8 +30,8 @@ makeblastdb -in input_reads.fasta -dbtype nucl -out input_reads_db
{{< /highlight >}}
{{% /panel %}}
\\
More parameters used with **makeblastdb** can be seen by typing:
{{< highlight bash >}}
$ makeblastdb -help
{{< /highlight >}}
\ No newline at end of file
{{< /highlight >}}
......@@ -13,6 +13,7 @@ Basic BLAST has the following commands:
- **tblastn**: search translated nucleotide database using a protein query
- **tblastx**: search translated nucleotide database using a translated nucleotide query
The basic usage of **blastn** is:
{{< highlight bash >}}
$ blastn -query input_reads.fasta -db input_reads_db -out blastn_output.alignments [options]
......@@ -26,7 +27,7 @@ $ blastn -help
These BLAST alignment commands are multi-threaded, and therefore using the BLAST option **-num_threads <number_of_CPUs>** is recommended.
\\
HCC hosts multiple BLAST databases and indices on both Tusker and Crane. In order to use these resources, the ["biodata" module](../../../biodata_module) needs to be loaded first. The **$BLAST** variable contains the following currently available databases:
- **16SMicrobial**
......@@ -51,7 +52,7 @@ HCC hosts multiple BLAST databases and indices on both Tusker and Crane. In orde
If you want to create and use a BLAST database that is not mentioned above, check [Create Local BLAST Database](create_local_blast_database).
\\
A basic SLURM example of a nucleotide BLAST run against the non-redundant **nt** BLAST database with `8 CPUs` is provided below. When running a BLAST alignment, it is recommended to first copy the query and database files to the **/scratch/** directory of the worker node. The BLAST output is also saved in this directory (**/scratch/blastn_output.alignments**). After BLAST finishes, the output file is copied from the worker node to your current work directory.
{{% notice info %}}
**Please note that the worker nodes can not write to the */home/* directories and therefore you need to run your job from your */work/* directory.**
......@@ -81,16 +82,16 @@ cp /scratch/blastn_output.alignments $WORK/<project_folder>
{{< /highlight >}}
{{% /panel %}}
\\
One important BLAST parameter is the **e-value threshold**, which limits the hits returned to those with an e-value lower than the given value. To show only the hits with an **e-value** lower than 1e-10, modify the given script as follows:
{{< highlight bash >}}
$ blastn -query input_reads.fasta -db input_reads_db -out blastn_output.alignments -num_threads $SLURM_NTASKS_PER_NODE -evalue 1e-10
{{< /highlight >}}
\\
The default BLAST output is in pairwise format. However, the BLAST parameter **-outfmt** supports output in [different formats](https://www.ncbi.nlm.nih.gov/books/NBK279684/) that are easier to parse.
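With tabular output (`-outfmt 6`), each hit is one tab-separated line, and in the default column layout the e-value is the 11th field. As a rough sketch, such a report can be post-filtered with `awk`. The function name `filter_hits_by_evalue` and the file names in the usage line are illustrative and not part of BLAST, and the column position is only valid if the default `-outfmt 6` columns were not customized.

```bash
# Print only the hits from a tabular BLAST report whose e-value does not exceed a threshold.
# Assumes the default 12-column "-outfmt 6" layout, where column 11 holds the e-value.
filter_hits_by_evalue() {
    local report="$1"       # tabular BLAST output file
    local max_evalue="$2"   # for example 1e-10
    # Adding 0 forces awk to compare the values as numbers rather than strings.
    awk -F'\t' -v max="$max_evalue" '($11 + 0) <= (max + 0)' "$report"
}

# Usage (hypothetical file names):
#   filter_hits_by_evalue blastn_output.tsv 1e-10 > blastn_output.filtered.tsv
```

Filtering afterwards keeps the full report available, whereas passing **-evalue** to BLAST itself avoids writing the weaker hits in the first place.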
\\
A basic SLURM example of a protein BLAST run against the non-redundant **nr** BLAST database with tabular output format and `8 CPUs` is shown below. As before, the query and database files are copied to the **/scratch/** directory. The BLAST output is also saved in this directory (**/scratch/blastx_output.alignments**). After BLAST finishes, the output file is copied from the worker node to your current work directory.
{{% notice info %}}
**Please note that the worker nodes can not write to the */home/* directories and therefore you need to run your job from your */work/* directory.**
......@@ -118,4 +119,4 @@ blastx -query /scratch/input_reads.fasta -db /scratch/nr -outfmt 6 -out /scratch
cp /scratch/blastx_output.alignments $WORK/<project_folder>
{{< /highlight >}}
{{% /panel %}}
\ No newline at end of file
{{% /panel %}}
......@@ -7,18 +7,20 @@ weight = "10"
BLAT is a pairwise alignment tool similar to BLAST. It is more accurate and about 500 times faster than the existing tools for mRNA/DNA alignments and it is about 50 times faster with protein/protein alignments. BLAT accepts short and long query and database sequences as input files.
The basic usage of BLAT is:
{{< highlight bash >}}
$ blat database query output_alignment.txt [options]
{{< /highlight >}}
where **database** is the name of the database used for the alignment, **query** is the name of the input file of sequence data in `fasta/nib/2bit` format, and **output_alignment.txt** is the output alignment file.
Additional parameters for BLAT alignment can be found in the [manual](http://genome.ucsc.edu/FAQ/FAQblat), or by using:
{{< highlight bash >}}
$ blat
{{< /highlight >}}
\\
Running BLAT on Tusker with query file `input_reads.fasta` and database `db.fa` is shown below:
{{% panel header="`blat_alignment.submit`"%}}
{{< highlight bash >}}
......@@ -39,8 +41,8 @@ blat db.fa input_reads.fasta output_alignment.txt
Although BLAT is a single threaded program (`#SBATCH --nodes=1`, `#SBATCH --ntasks-per-node=1`) it is still much faster than the other alignment tools.
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BLAT Output</span>
### BLAT Output
BLAT output is a list containing the following information:
......@@ -48,4 +50,4 @@ BLAT output is a list containing the following information:
- the region of query sequence that matches the database sequence
- the size of the query sequence
- the level of identity as a percentage of the alignment
- the chromosome and position that the query sequence maps to
\ No newline at end of file
- the chromosome and position that the query sequence maps to
......@@ -4,8 +4,10 @@ description = "How to run Bowtie on HCC resources"
weight = "10"
+++
[Bowtie](http://bowtie-bio.sourceforge.net/index.shtml) is an ultrafast and memory-efficient aligner for large sets of sequencing reads to a reference genome. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small. Bowtie also supports usage of multiple processors to achieve greater alignment speed.
The first and basic step of running Bowtie is to build and format an index from the reference genome. The basic usage of this command, **bowtie-build** is:
{{< highlight bash >}}
$ bowtie-build input_reference.fasta index_prefix
......@@ -19,9 +21,10 @@ $ bowtie [-q|-f|-r|-c] index_prefix [-1 input_reads_pair_1.[fasta|fastq] -2 inpu
where **index_prefix** is the index generated by the **bowtie-build** command, and **options** are optional parameters that can be found in the [Bowtie manual](http://bowtie-bio.sourceforge.net/manual.shtml).
Bowtie supports both single-end (`input_reads.[fasta|fastq]`) and paired-end (`input_reads_pair_1.[fasta|fastq]`, `input_reads_pair_2.[fasta|fastq]`) files in fasta or fastq format. The format of the input files also needs to be specified by using the following flags: **-q** (fastq files), **-f** (fasta files), **-r** (raw one-sequence per line), or **-c** (sequences given on command line).
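Since the format flag has to match the read file, a submit script can derive it from the file name instead of hard-coding it. The following is a minimal sketch under the assumption that read files use the common `.fastq`/`.fq` and `.fasta`/`.fa` extensions; `bowtie_format_flag` is a hypothetical helper, and Bowtie itself does not inspect file extensions.

```bash
# Map a read file name to the Bowtie input-format flag described above.
# The extension-to-flag mapping is an assumption about local naming conventions.
bowtie_format_flag() {
    case "$1" in
        *.fastq|*.fq) echo "-q" ;;   # fastq files
        *.fasta|*.fa) echo "-f" ;;   # fasta files
        *)
            echo "unknown read format: $1" >&2
            return 1
            ;;
    esac
}

# Usage (file names as in the text above):
#   bowtie "$(bowtie_format_flag input_reads.fastq)" index_prefix input_reads.fastq > bowtie_alignments.out
```

Returning a non-zero status for unrecognized names lets a job script stop early rather than start an alignment with the wrong flag.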
\\
An example of how to run Bowtie alignment on Tusker with single-end fastq file and `8 CPUs` is shown below:
{{% panel header="`bowtie_alignment.submit`"%}}
{{< highlight bash >}}
......@@ -40,7 +43,7 @@ bowtie -q index_prefix input_reads.fastq -p $SLURM_NTASKS_PER_NODE > bowtie_alig
{{< /highlight >}}
{{% /panel %}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">Bowtie Output</span>
By default, Bowtie output is a tab-separated alignment file, where one line is one alignment (SAM output is produced only when the **-S** option is given). Each line is a collection of 8 fields separated by tabs. The fields are: name of the aligned read, reference strand aligned to, name of the reference sequence where the alignment occurs, 0-based offset into the forward reference strand where the leftmost character of the alignment occurs, read sequence, read qualities, the number of other instances where the same sequence is aligned against the same reference characters, and a comma-separated list of mismatch descriptors.
\ No newline at end of file
### Bowtie Output
By default, Bowtie output is a tab-separated alignment file, where one line is one alignment (SAM output is produced only when the **-S** option is given). Each line is a collection of 8 fields separated by tabs. The fields are: name of the aligned read, reference strand aligned to, name of the reference sequence where the alignment occurs, 0-based offset into the forward reference strand where the leftmost character of the alignment occurs, read sequence, read qualities, the number of other instances where the same sequence is aligned against the same reference characters, and a comma-separated list of mismatch descriptors.
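Given the 8-column, tab-separated layout described above, the third field names the reference sequence, so a per-reference alignment count takes a single `awk` pass. This is a sketch only: `count_alignments_per_reference` is a made-up helper name, and it assumes the 8-field layout rather than SAM.

```bash
# Count alignments per reference sequence in an 8-field Bowtie alignment file.
# Column 3 is the name of the reference sequence where the alignment occurs.
count_alignments_per_reference() {
    local alignments="$1"
    awk -F'\t' '{ count[$3]++ } END { for (ref in count) print ref "\t" count[ref] }' "$alignments" | sort
}

# Usage (hypothetical file name):
#   count_alignments_per_reference bowtie_alignments.out
```

The final `sort` is there only to make the output order stable, since `awk` does not guarantee an iteration order for arrays.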
......@@ -4,6 +4,7 @@ description = "How to run Bowtie2 on HCC resources"
weight = "10"
+++
[Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. Although Bowtie and Bowtie2 are both fast read aligners, there are a few main differences between them:
- Bowtie2 supports gapped alignment with affine gap penalties, without restrictions on the number of gaps and gap lengths.
......@@ -14,6 +15,7 @@ weight = "10"
- Bowtie2 does not align colorspace reads.
- Bowtie and Bowtie2 indices are not compatible.
As with Bowtie, the first and basic step of running Bowtie2 is to build a Bowtie2 index from a reference genome sequence. The basic usage of the command **bowtie2-build** is:
{{< highlight bash >}}
......@@ -21,13 +23,14 @@ $ bowtie2-build -f input_reference.fasta index_prefix
{{< /highlight >}}
where **input_reference.fasta** is the input file of reference sequences in fasta format, and **index_prefix** is the prefix of the generated index files. Besides the option **-f**, which is used when the reference input file is a fasta file, the option **-c** can be used when the reference sequences are given on the command line.
The command **bowtie2** takes a Bowtie2 index and a set of sequencing read files and outputs a set of alignments in SAM format. The general **bowtie2** usage is:
{{< highlight bash >}}
$ bowtie2 -x index_prefix [-q|--qseq|-f|-r|-c] [-1 input_reads_pair_1.[fasta|fastq] -2 input_reads_pair_2.[fasta|fastq] | -U input_reads.[fasta|fastq]] -S bowtie2_alignments.sam [options]
{{< /highlight >}}
where **index_prefix** is the index generated by the **bowtie2-build** command, and **options** are optional parameters that can be found in the [Bowtie2 manual](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml). Bowtie2 supports both single-end (`input_reads.[fasta|fastq]`) and paired-end (`input_reads_pair_1.[fasta|fastq]`, `input_reads_pair_2.[fasta|fastq]`) files in fasta or fastq format. The format of the input files also needs to be specified by using one of the following flags: **-q** (fastq files), **--qseq** (Illumina's qseq format), **-f** (fasta files), **-r** (raw one sequence per line), or **-c** (sequences given on command line).
\\
An example of how to run Bowtie2 local alignment on Tusker with paired-end fasta files and `8 CPUs` is shown below:
{{% panel header="`bowtie2_alignment.submit`"%}}
{{< highlight bash >}}
......@@ -46,7 +49,7 @@ bowtie2 -x index_prefix -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fas
{{< /highlight >}}
{{% /panel %}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">Bowtie2 Output</span>
Bowtie2 outputs alignments in SAM format that can further be manipulated with different tools, like SAMtools and GATK. Each line from the file describes an alignment and is a collection of at least 12 fields separated by tabs. Detailed information about Bowtie2 output fields can be found in the Bowtie2 manual.
\ No newline at end of file
### Bowtie2 Output
Bowtie2 outputs alignments in SAM format that can further be manipulated with different tools, like SAMtools and GATK. Each line from the file describes an alignment and is a collection of at least 12 fields separated by tabs. Detailed information about Bowtie2 output fields can be found in the Bowtie2 manual.
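Each non-header SAM line carries a bitwise FLAG in its second column, and the bit with value 4 marks an unmapped read, so a quick mapped-read count needs only `awk`. The sketch below is illustrative: `count_mapped_alignments` is a made-up helper name, and SAMtools remains the more complete tool for this kind of summary.

```bash
# Count SAM records whose FLAG does not have the "unmapped" bit (value 4) set.
# Header lines start with "@" and are skipped.
count_mapped_alignments() {
    local sam="$1"
    # int(flag / 4) % 2 extracts the bit with value 4 without needing bitwise operators.
    awk -F'\t' '!/^@/ && int($2 / 4) % 2 == 0 { n++ } END { print n + 0 }' "$sam"
}

# Usage (file name as in the text above):
#   count_mapped_alignments bowtie2_alignments.sam
```

Note that this counts alignment records, so a read with secondary or supplementary alignments is counted more than once.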
......@@ -3,7 +3,7 @@ title = "BWA"
description = "How to use BWA on HCC machines"
weight = "52"
+++
 
BWA (Burrows-Wheeler Aligner) is a software package for mapping relatively short nucleotide sequences against a long reference sequence. BWA is slower than Bowtie, but allows indels in the alignment.
......@@ -11,7 +11,7 @@ The basic usage of BWA is:
{{< highlight bash >}}
$ bwa COMMAND [options]
{{< /highlight >}}
where **COMMAND** is one of the available BWA commands:
where **COMMAND** is one of the available BWA commands:
- **index**: index sequences in the FASTA format
- **mem**: BWA-MEM algorithm
......@@ -35,4 +35,5 @@ $ bwa COMMAND
{{< /highlight >}}
or check the [BWA manual](http://bio-bwa.sourceforge.net/bwa.shtml).
The page [Running BWA Commands](running_bwa_commands) shows how to run BWA on HCC.
\ No newline at end of file
The page [Running BWA Commands](running_bwa_commands) shows how to run BWA on HCC.
......@@ -4,7 +4,7 @@ description = "How to run BWA commands on HCC resources"
weight = "10"
+++
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BWA Index:</span>
## BWA Index
The first step of using BWA is to make an index of the reference genome in fasta format. The basic usage of the **bwa index** is:
{{< highlight bash >}}
......@@ -12,8 +12,8 @@ $ bwa index [-a bwtsw|is] input_reference.fasta index_prefix
{{< /highlight >}}
where **input_reference.fasta** is an input file of the reference genome in fasta format, and **index_prefix** is the prefix of the generated index files. The option **-a** is required and can have two values: **bwtsw** (does not work for short genomes) and **is** (does not work for long genomes). Therefore, this value is chosen according to the length of the genome.
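Because the choice of **-a** depends on genome length, a script can measure the reference before indexing. The sketch below is a hedged illustration: `suggest_bwa_index_algorithm` is a hypothetical helper, and its default cutoff of 2,000,000,000 bases is an assumption chosen for the example, so the limits documented for the installed BWA version should be checked before relying on it.

```bash
# Suggest a value for "bwa index -a" from the total length of a FASTA reference.
# The default cutoff is an illustrative assumption, not a value taken from this page.
suggest_bwa_index_algorithm() {
    local reference="$1"
    local cutoff="${2:-2000000000}"
    local bases
    # Count sequence characters, ignoring FASTA header lines and line breaks.
    bases="$(grep -v '^>' "$reference" | tr -d '\r\n' | wc -c | tr -d ' ')"
    if [ "$bases" -gt "$cutoff" ]; then
        echo "bwtsw"
    else
        echo "is"
    fi
}

# Usage (file name as in the text above):
#   bwa index -a "$(suggest_bwa_index_algorithm input_reference.fasta)" input_reference.fasta
```

The optional second argument makes the cutoff easy to adjust without editing the function.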
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BWA Mem:</span>
## BWA Mem
The **bwa mem** algorithm is one of the three algorithms provided by BWA. It performs local alignment and produces alignments for different parts of the query sequence. The basic usage of **bwa mem** is:
{{< highlight bash >}}
......@@ -21,7 +21,7 @@ $ bwa mem index_prefix [input_reads.fastq|input_reads_pair_1.fastq input_reads_p
{{< /highlight >}}
where **index_prefix** is the index for the reference genome generated from **bwa index**, and **input_reads.fastq**, **input_reads_pair_1.fastq**, **input_reads_pair_2.fastq** are the input files of sequencing data that can be single-end or paired-end respectively. Additional **options** for **bwa mem** can be found in the BWA manual.
\\
A simple SLURM script for running **bwa mem** on Tusker with paired-end fastq input data, `index_prefix` as the reference genome index, a SAM output file and `8 CPUs` is shown below:
{{% panel header="`bwa_mem.submit`"%}}
{{< highlight bash >}}
......@@ -40,8 +40,8 @@ bwa mem index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq -t $SLURM
{{< /highlight >}}
{{% /panel %}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BWA Bwasw:</span>
## BWA Bwasw
The **bwa bwasw** algorithm is another algorithm provided by BWA. For input files with single-end reads it aligns the query sequences. For input files with paired-end reads it performs paired-end alignment that only works for Illumina reads.
......@@ -50,16 +50,16 @@ An example of **bwa bwasw** for single-end input file `input-reads.fasta` in fas
$ bwa bwasw index_prefix input_reads.fasta -t $SLURM_NTASKS_PER_NODE > bwa_bwasw_alignments.sam
{{< /highlight >}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BWA Aln:</span>
## BWA Aln
The third BWA algorithm, **bwa aln**, aligns the input file of sequence data to the reference genome. An example of running **bwa aln** with the single-end `input_reads.fasta` input file and `8 CPUs` is:
{{< highlight bash >}}
$ bwa aln index_prefix input_reads.fasta -0 -t $SLURM_NTASKS_PER_NODE > bwa_aln_alignments.sai
{{< /highlight >}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BWA Samse and BWA Sampe:</span>
## BWA Samse and BWA Sampe
The command **bwa samse** uses the `bwa_aln_alignments.sai` output from **bwa aln** to generate a SAM file from the alignments for single-end reads.
......@@ -77,32 +77,32 @@ $ bwa samse -f bwa_aln_alignments.sam index_prefix bwa_aln_alignments_pair_1.sai
{{< /highlight >}}
{{% /panel %}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BWA Fastmap:</span>
## BWA Fastmap
The command **bwa fastmap** identifies and outputs super-maximal exact matches (SMEMs). The basic usage of **bwa fastmap** is:
{{< highlight bash >}}
$ bwa fastmap index_prefix input_reads.fasta > bwa_fastmap.matches
{{< /highlight >}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BWA Pemerge:</span>
## BWA Pemerge
The command **bwa pemerge** merges overlapping paired ends and can print either only the merged reads or the unmerged ones. An example of **bwa pemerge** of `input_reads_pair_1.fastq` and `input_reads_pair_2.fastq` with `8 CPUs` and output file `output_reads_merged.fastq` that contains only the merged reads is shown below:
{{< highlight bash >}}
$ bwa pemerge -m input_reads_pair_1.fastq input_reads_pair_2.fastq -t $SLURM_NTASKS_PER_NODE > output_reads_merged.fastq
{{< /highlight >}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BWA Fa2pac:</span>
## BWA Fa2pac
The command **bwa fa2pac** converts fasta to pac files. The general usage of **bwa fa2pac** is:
{{< highlight bash >}}
$ bwa fa2pac input_reads.fasta pac_prefix
{{< /highlight >}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BWA Pac2bwt and BWA Pac2bwtgen:</span>
## BWA Pac2bwt and BWA Pac2bwtgen
The commands **bwa pac2bwt** and **bwa pac2bwtgen** convert pac to bwt files.
......@@ -118,24 +118,24 @@ $ bwa pac2bwtgen input_reads.pac output_reads.bwt
{{< /highlight >}}
{{% /panel %}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BWA Bwtupdate:</span>
## BWA Bwtupdate
The command **bwa bwtupdate** updates bwt files to the new format. The general usage of **bwa bwtupdate** is:
{{< highlight bash >}}
$ bwa bwtupdate input_reads.bwt
{{< /highlight >}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BWA Bwt2sa:</span>
## BWA Bwt2sa
The command **bwa bwt2sa** generates sa files from bwt and Occ files. The basic usage of **bwa bwt2sa** is:
{{< highlight bash >}}
$ bwa bwt2sa input_reads.bwt output_reads.sa
{{< /highlight >}}
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">Useful Information</span>
### Useful Information
In order to test the scalability of BWA (bwa/0.7) on Crane, we used two paired-end input fastq files, `large_1.fastq` and `large_2.fastq`, and one single-end input fasta file, `large.fasta`. Some statistics about the input files and the time and memory resources used by **bwa mem** are shown on the table below:
{{< readfile file="/static/html/bwa.html" >}}
\ No newline at end of file
{{< readfile file="/static/html/bwa.html" >}}
......@@ -4,15 +4,17 @@ description = "How to run Clustal Omega on HCC resources"
weight = "10"
+++
[Clustal Omega](http://www.clustal.org/omega/) is a general purpose multiple sequence alignment (MSA) tool used mainly with protein, as well as DNA and RNA sequences. Clustal Omega is a fast and scalable aligner that can align datasets of hundreds of thousands of sequences in reasonable time.
The general usage of Clustal Omega is:
{{< highlight bash >}}
$ clustalo -i input_file.fasta -o output_file.fasta [options]
{{< /highlight >}}
where **input_file.fasta** is the multiple sequence input file in `fasta` format, and **output_file.fasta** is the multiple sequence alignment output file in `fasta` format.
\\
Clustal Omega accepts 3 types of sequence input files:
- sequence file with aligned/unaligned sequences
......@@ -21,13 +23,13 @@ Clustal Omega accepts 3 types of sequence input files:
These input files must contain at least 2 sequences and must be in one of the following MSA file formats: `a2m`, `fa[sta]`, `clu[stal]`, `msf`, `phy[lip]`, `selex`, `st[ockholm]`, `vie[nna]`. Moreover, if not specified, the generated output file is in `fasta` format.
\\
More Clustal Omega options can be found by typing:
{{< highlight bash >}}
$ clustalo -h
{{< /highlight >}}
\\
Running Clustal Omega on Tusker with input file `input_reads.fasta` with `8 threads` and `10GB memory` is shown below:
{{% panel header="`clustal_omega.submit`"%}}
{{< highlight bash >}}
......@@ -54,13 +56,13 @@ $ clustalo -i input_reads.sto --dealign -v
{{< /highlight >}}
Clustal Omega will read the input file in Stockholm format, de-align the sequences, and then re-align them, printing a progress report in the meantime (**-v**). Because it is not specified, the output will be in the default `fasta` format.
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">Clustal Omega Output</span>
### Clustal Omega Output
The basic Clustal Omega output produces one alignment file in the specified output format. More intermediate outputs can be generated using specific Clustal Omega options, such as: **--distmat-out=<file>** (*pairwise distance matrix output file*) and **--guidetree-out=<file>** (*guide tree output file*).
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">Useful Information</span>
### Useful Information
In order to test the Clustal Omega performance on Tusker, we used three DNA and protein input fasta files, `data_1.fasta`, `data_2.fasta`, `data_3.fasta`. Some statistics about the input files and the time and memory resources used by Clustal Omega on Tusker are shown on the table below:
{{< readfile file="/static/html/clustal_omega.html" >}}
\ No newline at end of file
{{< readfile file="/static/html/clustal_omega.html" >}}
......@@ -9,6 +9,7 @@ weight = "10"
Although there is no difference between the available options for both TopHat and TopHat2 and the number of output files, TopHat2 incorporates many significant improvements to TopHat. The TopHat package at HCC supports both **tophat** and **tophat2**.
The basic usage of TopHat2 is:
{{< highlight bash >}}
$ [tophat|tophat2] [options] index_prefix [input_reads_pair_1.[fasta|fastq] input_reads_pair_2.[fasta|fastq] | input_reads.[fasta|fastq]]
......@@ -17,6 +18,7 @@ where **index_prefix** is the basename of the genome index to be searched. This
TopHat2 uses single or comma-separated list of paired-end and single-end reads in fasta or fastq format. The single-end reads need to be provided after the paired-end reads.
More advanced TopHat2 options can be found in [its manual](https://ccb.jhu.edu/software/tophat/manual.shtml), or by typing:
{{< highlight bash >}}
$ tophat2 -h
......@@ -24,7 +26,7 @@ $ tophat2 -h
Prior to running TopHat/TopHat2, an index of the reference genome should be built using Bowtie/Bowtie2. Moreover, TopHat2 requires both the index file and the reference file to be in the same directory. If the reference file is not available, TopHat2 reconstructs it in its initial step using the index file.
\\
An example of how to run TopHat2 on Tusker with paired-end fastq files `input_reads_pair_1.fastq` and `input_reads_pair_2.fastq`, reference index `index_prefix` and `8 CPUs` is shown below:
{{% panel header="`tophat2_alignment.submit`"%}}
{{< highlight bash >}}
......@@ -45,8 +47,8 @@ tophat2 -p $SLURM_NTASKS_PER_NODE index_prefix input_reads_pair_1.fastq input_re
TopHat2 generates its own output directory `tophat_out/` that contains multiple TopHat2 generated files.
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">TopHat2 Output</span>
### TopHat2 Output
TopHat2 produces a number of files in its `tophat_out/` output directory. Some of the generated files are:
......@@ -56,4 +58,4 @@ TopHat2 produces number of files in its `tophat_out/` output directory. Some of
- **insertions.bed**: BED track of insertions reported by TopHat
- **deletions.bed**: BED track of deletions reported by TopHat
- **prep_reads.info**: statistics about the input sequencing data (min/max read length, number of reads)
- **align_summary.txt**: summary of the alignment counts (number of mapped reads, overall read mapping rate)
\ No newline at end of file
- **align_summary.txt**: summary of the alignment counts (number of mapped reads, overall read mapping rate)
......@@ -4,6 +4,7 @@ description = "How to use Biodata Module on HCC machines"
weight = "52"
+++
HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files, short read aligned indices etc. on both Tusker and Crane.
In order to use these resources, the "**biodata**" module needs to be loaded first.
For how to load a module, please check [Module Commands](#module_commands).
......@@ -19,19 +20,20 @@ The major environment variables are:
**$GENOMES** - Directory containing all available genomes (multiple sources, builds possible)
**$INDICES** - Directory containing indices for bowtie, bowtie2, bwa for all available genomes
**$UNIPROT** - Directory containing latest release of full UniProt database
\\
\\
\\
In order to check what genomes are available, you can type:
{{< highlight bash >}}
$ ls $GENOMES
{{< /highlight >}}
\\
In order to check what BLAST databases are available, you can just type:
{{< highlight bash >}}
$ ls $BLAST
{{< /highlight >}}
\\
An example of how to run Bowtie2 local alignment on Crane utilizing the default Horse, *Equus caballus* index (*BOWTIE2\_HORSE*) with paired-end fasta files and 8 CPUs is shown below:
{{% panel header="`bowtie2_alignment.submit`"%}}
{{< highlight bash >}}
......@@ -51,7 +53,8 @@ bowtie2 -x $BOWTIE2_HORSE -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.f
{{< /highlight >}}
{{% /panel %}}
\\
An example of BLAST run against the non-redundant nucleotide database available on Crane is provided below:
{{% panel header="`blastn_alignment.submit`"%}}
{{< highlight bash >}}
......@@ -74,6 +77,7 @@ cp /scratch/blast_nucleotide.results .
{{< /highlight >}}
{{% /panel %}}
The organisms and their appropriate environmental variables for all genomes and chromosome files, as well as for short read aligned indices are shown on the link below:
[Organisms](#organisms)
......@@ -3,15 +3,16 @@ title = "BamTools"
description = "How to use BamTools on HCC machines"
weight = "52"
+++
 
The SAM/BAM format is a standard format for short read alignments. While SAM is the plain-text version of the alignments, BAM is a compressed, binary format that saves space. BamTools is a toolkit for handling BAM files. It provides a powerful suite of command-line programs for manipulating and querying BAM files.
The basic usage of BamTools is:
{{< highlight bash >}}
$ bamtools COMMAND [options]
{{< /highlight >}}
where **COMMAND** is one of the following BamTools commands:
where **COMMAND** is one of the following BamTools commands:
- **convert**: Converts between BAM and a number of other formats
- **count**: Prints number of alignments in BAM file(s)
......@@ -27,10 +28,12 @@ where **COMMAND** is one of the following BamTools commands:
- **split**: Splits a BAM file on user-specified property, creating a new BAM output file for each value found
- **stats**: Prints some basic statistics from input BAM file(s)
For detailed description and more information on a specific command, just type:
{{< highlight bash >}}
$ bamtools help COMMAND
{{< /highlight >}}
or check the BamTools wiki, https://github.com/pezmaster31/bamtools/wiki.
The page [Running BamTools Commands](running_bamtools_commands) shows how to run BamTools on HCC.
\ No newline at end of file
The page [Running BamTools Commands](running_bamtools_commands) shows how to run BamTools on HCC.
......@@ -4,7 +4,8 @@ description = "How to run BamTools commands on HCC resources"
weight = "10"
+++
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BamTools Convert:</span>
## BamTools Convert
One of the most frequently used BamTools commands is **convert**.
......@@ -14,6 +15,7 @@ $ bamtools convert -format [bed|fasta|fastq|json|pileup|sam|yaml] -in input_alig
{{< /highlight >}}
where the option **-format** specifies the type of the output file, **input_alignments.bam** is the input BAM file, and **-out** defines the name and the type of the converted file.
Running BamTools **convert** on Tusker with input file `input_alignments.bam` and output file `output_reads.fastq` is shown below:
{{% panel header="`bamtools_convert.submit`"%}}
{{< highlight bash >}}
......@@ -34,8 +36,8 @@ bamtools convert -format fastq -in input_alignments.bam -out output_reads.fastq
All BamTools commands are single threaded, and therefore both `#SBATCH --nodes` and `#SBATCH --ntasks-per-node` are set to **1**.
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BamTools Count:</span>
## BamTools Count
The basic usage of the BamTools **count** is:
{{< highlight bash >}}
......@@ -43,8 +45,8 @@ $ bamtools count -in input_alignments.bam
{{< /highlight >}}
The command **bamtools count** outputs the total number of alignments in the BAM file.
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BamTools Coverage:</span>
## BamTools Coverage
The basic usage of the BamTools **coverage** is:
{{< highlight bash >}}
......@@ -52,8 +54,8 @@ $ bamtools coverage -in input_alignments.bam -out output_reads_coverage.txt
{{< /highlight >}}
The command **bamtools coverage** prints the coverage data for a single BAM file.
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BamTools Filter:</span>
## BamTools Filter
The basic usage of the BamTools **filter** is:
{{< highlight bash >}}
......@@ -61,8 +63,8 @@ $ bamtools filter -in input_alignments.bam -out output_alignments_filtered.bam -