Commit 4a59f001 authored by npavlovikj

add bio pages part 4

parent ba5c0711
1 merge request !29 Add bio pages part 4
952 additions and 10932 deletions
+++
title = "Bioinformatics Tools"
description = "How to use various bioinformatics tools on HCC machines"
weight = "52"
+++
The following is a categorized list of bioinformatics tools available on HCC. Each page contains a summary of the tool, information about the HCC resources that provide it, links to user documentation, and example SLURM submit scripts.

More detailed information about submitting SLURM jobs and checking job status on HCC can be found [here](../../submitting_jobs).
{{% children %}}
+++
title = "Alignment Tools"
description = "How to use various alignment tools on HCC machines"
weight = "52"
+++
Created by Adam Caprez on Sep 04, 2014
{{% children %}}
+++
title = "BLAT"
description = "How to run BLAT on HCC resources"
weight = "10"
+++
Created by Adam Caprez, last modified by Natasha Pavlovikj on Dec 12, 2016
| Name | Version | Resource |
|------|---------|----------|
| blat | 35x1    | Tusker   |
| blat | 35x1    | Crane    |
BLAT is a pairwise alignment tool similar to BLAST. For mRNA/DNA alignments it is more accurate and about 500 times faster than existing tools, and for protein/protein alignments it is about 50 times faster. BLAT accepts both short and long query and database sequences as input files.
The basic usage of BLAT is:
{{< highlight bash >}}
$ blat database query output_alignment.txt [options]
{{< /highlight >}}
where **database** is the name of the database used for the alignment, **query** is the name of the input file of sequence data in `fasta/nib/2bit` format, and **output_alignment.txt** is the output alignment file.
Additional parameters for BLAT alignment can be found in the [manual](http://genome.ucsc.edu/FAQ/FAQblat), or by running `blat` with no arguments:
{{< highlight bash >}}
$ blat
{{< /highlight >}}
\\
Running BLAT on Tusker with query file `input_reads.fasta` and database `db.fa` is shown below:
{{% panel header="`blat_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Blat
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=50gb
#SBATCH --output=Blat.%J.out
#SBATCH --error=Blat.%J.err
module load blat/35x1
blat db.fa input_reads.fasta output_alignment.txt
{{< /highlight >}}
{{% /panel %}}
Although BLAT is a single-threaded program (`#SBATCH --nodes=1`, `#SBATCH --ntasks-per-node=1`), it is still much faster than other alignment tools.
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">BLAT Output</span>
BLAT output is a list containing the following information:
- the score of the alignment
- the region of query sequence that matches the database sequence
- the size of the query sequence
- the level of identity as a percentage of the alignment
- the chromosome and position that the query sequence maps to
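When run from the command line, BLAT writes its alignments in PSL format by default: a tab-separated table with 21 columns per hit, optionally preceded by a header that `-noHead` suppresses. The fields listed above can be pulled out of that table with a short awk filter. This is a minimal sketch, not an official tool: the function name `summarize_psl` is hypothetical, the score follows the web-BLAT formula for nucleotide alignments as we understand it, and the identity column is a simplified ratio of matching bases to aligned bases, not UCSC's exact percent-identity calculation.

```shell
# summarize_psl (hypothetical helper): condense BLAT PSL output into
# query, score, query region, query size, approximate identity, and target position.
# Only lines whose first column is a number are processed, so it works with or
# without the PSL header.
summarize_psl() {
  awk -F'\t' '
    # PSL data lines have 21 tab-separated columns and start with the match count
    $1 ~ /^[0-9]+$/ && NF >= 21 {
      nmatch = $1; nmis = $2; nrep = $3; qgaps = $5; tgaps = $7
      # Assumed web-BLAT nucleotide score:
      # matches + repMatches/2 - misMatches - qNumInsert - tNumInsert
      score = nmatch + int(nrep / 2) - nmis - qgaps - tgaps
      # Simplified identity: matching bases over all aligned bases
      aligned = nmatch + nmis + nrep
      ident = (aligned > 0) ? 100 * (nmatch + nrep) / aligned : 0
      # Output columns: qName, score, qStart-qEnd, qSize, identity %, tName:tStart-tEnd
      # (PSL coordinates are zero-based)
      printf "%s\t%d\t%d-%d\t%d\t%.1f\t%s:%d-%d\n",
        $10, score, $12, $13, $11, ident, $14, $16, $17
    }' "$@"
}
```

For example, `summarize_psl output_alignment.txt | sort -k2,2nr | head` would list the highest-scoring hits first.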
+++
title = "Clustal Omega"
description = "How to run Clustal Omega on HCC resources"
weight = "10"
+++
Created by Adam Caprez, last modified by Natasha Pavlovikj on Dec 12, 2016
| Name          | Version | Resource |
|---------------|---------|----------|
| clustal-omega | 1.2     | Tusker   |
| clustal-omega | 1.2     | Crane    |
[Clustal Omega](http://www.clustal.org/omega/) is a general-purpose multiple sequence alignment (MSA) tool used mainly with protein sequences, as well as with DNA and RNA sequences. Clustal Omega is a fast and scalable aligner that can align datasets of hundreds of thousands of sequences in reasonable time.
The general usage of Clustal Omega is:
{{< highlight bash >}}
$ clustalo -i input_file.fasta -o output_file.fasta [options]
{{< /highlight >}}
where **input_file.fasta** is the multiple sequence input file in `fasta` format, and **output_file.fasta** is the multiple sequence alignment output file in `fasta` format.
\\
Clustal Omega accepts 3 types of sequence input files:
- sequence file with aligned/unaligned sequences
- multiple alignment in a file/profile of aligned sequences
- Hidden Markov Model (HMM)
These input files must contain at least 2 sequences and must be in one of the following MSA file formats: `a2m`, `fa[sta]`, `clu[stal]`, `msf`, `phy[lip]`, `selex`, `st[ockholm]`, `vie[nna]`. If no output format is specified, the output file is written in `fasta` format.
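Since the input must contain at least 2 sequences, a quick check on a FASTA file before submitting can save a queued job that would fail immediately. This is a minimal sketch assuming FASTA input, where each record's header line begins with `>`; the helper name `check_msa_input` is hypothetical, and it does not validate the other MSA formats listed above.

```shell
# check_msa_input (hypothetical helper): verify that a FASTA file is readable
# and holds at least 2 sequences before handing it to clustalo.
# Returns 0 on success, 1 if there are too few sequences, 2 if the file is unreadable.
check_msa_input() {
  local file="$1" n
  if [ ! -r "$file" ]; then
    echo "error: cannot read '$file'" >&2
    return 2
  fi
  # Count FASTA header lines; grep -c prints 0 when there are none
  n=$(grep -c '^>' "$file")
  if [ "$n" -lt 2 ]; then
    echo "error: '$file' has $n sequence(s); Clustal Omega needs at least 2" >&2
    return 1
  fi
  echo "$file: $n sequences"
}
```

In a submit script, a line such as `check_msa_input input_reads.fasta || exit 1` placed before the `clustalo` call would end the job early on an unusable input.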
\\
More Clustal Omega options can be found by typing:
{{< highlight bash >}}
$ clustalo -h
{{< /highlight >}}
\\
Running Clustal Omega on Tusker with input file `input_reads.fasta`, 8 threads, and 10 GB of memory is shown below:
{{% panel header="`clustal_omega.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Clustal_Omega
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=10:00:00
#SBATCH --mem=10gb
#SBATCH --output=ClustalOmega.%J.out
#SBATCH --error=ClustalOmega.%J.err
module load clustal-omega/1.2
clustalo -i input_reads.fasta -o output_msa.sto --outfmt=st --threads=$SLURM_NTASKS_PER_NODE
{{< /highlight >}}
{{% /panel %}}
The output file `output_msa.sto` contains the resulting multiple sequence alignments in Stockholm format (**--outfmt=st**).
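To confirm that the Stockholm output contains every input sequence, the sequence names can be listed with a short awk filter and compared with the number of input records. This is a minimal sketch assuming a standard Stockholm file, in which markup lines start with `#`, the alignment ends with `//`, and every other non-blank line has the form `<name> <aligned sequence>`; the helper name `stockholm_names` is hypothetical.

```shell
# stockholm_names (hypothetical helper): print each sequence name in a
# Stockholm alignment once, in order of first appearance.
stockholm_names() {
  # Skip "# STOCKHOLM 1.0" and "#=GF/#=GS/#=GR/#=GC" markup, the "//" terminator,
  # and blank lines; names repeat across interleaved blocks, so print each only once.
  awk '/^#/ || /^\/\// || NF < 2 { next } !seen[$1]++ { print $1 }' "$@"
}
```

For example, `stockholm_names output_msa.sto | wc -l` should match `grep -c '^>' input_reads.fasta` when every input sequence made it into the alignment.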
If you instead replace the command above with:
{{< highlight bash >}}
$ clustalo -i input_reads.sto --dealign -v
{{< /highlight >}}
Clustal Omega will read the input file in Stockholm format, de-align the sequences, and then re-align them, printing a progress report along the way (**-v**). Because no output file or format is specified, the alignment is written to standard output in the default `fasta` format.
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">Clustal Omega Output</span>
The basic Clustal Omega output is a single alignment file in the specified output format. Additional intermediate outputs can be generated with specific Clustal Omega options, such as `--distmat-out=<file>` (*pairwise distance matrix output file*) and `--guidetree-out=<file>` (*guide tree output file*).
\\
<span style="color: rgb(0,0,0);font-size: 20.0px;line-height: 1.5;">Useful Information</span>
To test Clustal Omega performance on Tusker, we used three DNA and protein input FASTA files: `data_1.fasta`, `data_2.fasta`, and `data_3.fasta`. Some statistics about the input files, along with the time and memory used by Clustal Omega on Tusker, are shown in the table below:
{{< readfile file="/static/html/clustal_omega.html" >}}
+++
title = "Data Manipulation Tools"
description = "How to use data manipulation tools on HCC machines"
weight = "52"
+++
Created by Adam Caprez on Sep 04, 2014
{{% children %}}
#SBATCH --output=Oases.%J.out
#SBATCH --error=Oases.%J.err
module load oases/0.2
oases output_directory/ -min_trans_lgth 200
{{< /highlight >}}