+++
title = "Trinity"
description = "How to use Trinity on HCC machines"
weight = "52"
+++
Created by Adam Caprez, last modified by Natasha Pavlovikj on Feb 26, 2018.

| Name    | Version       | Resource |
|---------|---------------|----------|
| trinity | r2013-02-25   | Tusker   |
| trinity | r2013-11-10   | Tusker   |
| trinity | r2014-04-13p1 | Tusker   |
| trinity | r2013-11-10   | Crane    |
| trinity | r2014-04-13p1 | Crane    |

[Trinity](https://github.com/trinityrnaseq/trinityrnaseq/wiki) is a method for efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: `Inchworm`, `Chrysalis`, and `Butterfly`. All these modules can be applied sequentially to process large RNA-Seq datasets.
The basic usage of Trinity is:
{{< highlight bash >}}
$ Trinity --seqType [fa|fq] --JM <jellyfish_memory> --left input_reads_pair_1.[fa|fq] --right input_reads_pair_2.[fa|fq] [options]
{{< /highlight >}}
where **input_reads_pair_1.[fa|fq]** and **input_reads_pair_2.[fa|fq]** are the input paired-end files of sequence reads in FASTA/FASTQ format, and **--seqType** specifies the format of these input reads (`fa` or `fq`). The **--JM** option sets the amount of system memory, in GB, to allocate for k-mer counting by Jellyfish.
Additional Trinity **options** can be found on the Trinity website, or by typing:
{{< highlight bash >}}
$ Trinity
{{< /highlight >}}
Running the Trinity pipeline from beginning to end on large datasets may exceed the walltime limit for a single job. Therefore, Trinity provides a mechanism to run the workflow in four separate steps, where each step resumes from the previous one. Each step uses the same Trinity command and options, plus one additional step-specific option; on the last step, the Trinity command is run as normal, without any additional option.
{{% panel theme="info" header="Step 1 Options" %}}
{{< highlight bash >}}
Trinity.pl [options] --no_run_chrysalis
{{< /highlight >}}
{{% /panel %}}
{{% panel theme="info" header="Step 2 Options" %}}
{{< highlight bash >}}
Trinity.pl [options] --no_run_quantifygraph
{{< /highlight >}}
{{% /panel %}}
{{% panel theme="info" header="Step 3 Options" %}}
{{< highlight bash >}}
Trinity.pl [options] --no_run_butterfly
{{< /highlight >}}
{{% /panel %}}
{{% panel theme="info" header="Step 4 Options" %}}
{{< highlight bash >}}
Trinity.pl [options]
{{< /highlight >}}
{{% /panel %}}
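All four steps share one command line and differ only in the extra option appended to it. The minimal bash sketch below illustrates that pattern; `trinity_step_cmd` is a hypothetical helper (not part of Trinity or the HCC modules) that only prints the command for a given step instead of running it.

{{< highlight bash >}}
#!/usr/bin/env bash
# Hypothetical helper (not part of Trinity): print the Trinity.pl command
# for step 1-4, appending the step-specific option to the shared options.
# Usage: trinity_step_cmd <step> [common Trinity options...]
trinity_step_cmd() {
  local step="$1"
  shift
  local -a cmd=(Trinity.pl "$@")          # same options for every step
  case "$step" in
    1) cmd+=(--no_run_chrysalis) ;;      # stop before Chrysalis
    2) cmd+=(--no_run_quantifygraph) ;;  # stop before QuantifyGraph
    3) cmd+=(--no_run_butterfly) ;;      # stop before Butterfly
    4) ;;                                # last step: no extra option
    *) echo "step must be 1-4" >&2; return 1 ;;
  esac
  echo "${cmd[*]}"
}
{{< /highlight >}}

For example, `trinity_step_cmd 2 --seqType fq --left reads_1.fq --right reads_2.fq` prints `Trinity.pl --seqType fq --left reads_1.fq --right reads_2.fq --no_run_quantifygraph`.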
Each step may be run as its own job, providing a workaround for the single job walltime limit. For instructions on running each step of Trinity as a separate job under the SLURM scheduler on HCC, see:
{{% children %}}
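When each step lives in its own SLURM job script, the four jobs can also be chained so that each one starts only after the previous one completes successfully. The following is a minimal sketch, assuming hypothetical job scripts named `trinity_step1.slurm` through `trinity_step4.slurm` (not provided by HCC); it relies on the standard `sbatch --parsable` and `--dependency=afterok:<jobid>` options, and the `SBATCH` variable can point at a stand-in command for a dry run.

{{< highlight bash >}}
#!/usr/bin/env bash
# Sketch: submit the four Trinity steps as a chain of dependent SLURM jobs.
# Assumes hypothetical job scripts trinity_step1.slurm ... trinity_step4.slurm,
# each running the same Trinity command plus its step-specific option.
submit_trinity_chain() {
  local sbatch_cmd="${SBATCH:-sbatch}"   # set SBATCH to a stand-in for a dry run
  local prev_id="" job_id step
  for step in 1 2 3 4; do
    local -a args=(--parsable)
    # Steps 2-4 wait until the previous job has completed successfully.
    if [[ -n "$prev_id" ]]; then
      args+=(--dependency=afterok:"$prev_id")
    fi
    job_id=$("$sbatch_cmd" "${args[@]}" "trinity_step${step}.slurm") || return 1
    job_id="${job_id%%;*}"               # drop a ";cluster" suffix if present
    echo "step ${step}: job ${job_id}"
    prev_id="$job_id"
  done
}
{{< /highlight >}}

Running `submit_trinity_chain` from the directory containing the four scripts prints one `step N: job ID` line per submitted job.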
**Useful Information**
To test Trinity (trinity/r2014-04-13p1) performance on Tusker, we used three paired-end input datasets: `small_1.fastq` and `small_2.fastq`, `medium_1.fastq` and `medium_2.fastq`, and `large_1.fastq` and `large_2.fastq`. Some statistics about the input files, along with the time and memory resources used by Trinity on Tusker, are shown in the table below:
{{< readfile file="/static/html/trinity.html" >}}
{{% notice tip %}}
The Inchworm (step 1) and Chrysalis (step 2) steps can be memory intensive. A basic recommendation is to have **1 GB of RAM per 1M ~76-base Illumina paired-end reads**.
{{% /notice %}}
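The rule of thumb above can be turned into a quick estimate. The following is a minimal bash sketch, assuming that "1M paired-end reads" means one million read pairs (i.e. one million records in the left FASTQ file) and that every FASTQ record spans exactly four lines; both helper names are hypothetical.

{{< highlight bash >}}
#!/usr/bin/env bash
# Hypothetical helpers for the "1 GB of RAM per 1M paired-end reads" guideline.
# Assumption: one read = one pair = one record in the left FASTQ file.

# Count records in a FASTQ file (assumes 4 lines per record, no wrapping).
count_fastq_reads() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Round the read count up to whole GB, with a minimum of 1 GB.
estimate_mem_gb() {
  local reads="$1"
  local gb=$(( (reads + 999999) / 1000000 ))
  if (( gb < 1 )); then
    gb=1
  fi
  echo "$gb"
}
{{< /highlight >}}

For example, `estimate_mem_gb "$(count_fastq_reads input_reads_pair_1.fq)"` prints the estimate for a given left file; a file with 10,174,715 reads gives `11`.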