+++
title = "Biodata Module"
description = "How to use Biodata Module on HCC machines"
weight = "52"
+++
HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files, short-read aligner indices, and other resources on both Tusker and Crane.
To use these resources, you must first load the "**biodata**" module.
For instructions on loading modules, see [Module Commands](#module_commands).
Created by <span class="author"> Adam Caprez</span>, last modified on Feb 22, 2017

| Name    | Version | Resource |
|---------|---------|----------|
| biodata | 1.0     | Tusker   |
| biodata | 1.0     | Crane    |

Loading the "**biodata**" module sets many environment variables, though you will most likely need only a subset of them. To use an environment variable in a command or script, prefix its name with `$`.
The major environment variables are:

- **$DATA** - main directory
- **$BLAST** - directory containing all available BLAST (nucleotide and protein) databases
- **$KEGG** - KEGG database main entry point (requires license)
- **$PANTHER** - PANTHER database main entry point (latest)
- **$IPR** - InterProScan database main entry point (latest)
- **$GENOMES** - directory containing all available genomes (multiple sources and builds possible)
- **$INDICES** - directory containing bowtie, bowtie2, and bwa indices for all available genomes
- **$UNIPROT** - directory containing the latest release of the full UniProt database

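As a quick sanity check after `module load biodata`, the short sketch below prints the value of each variable from the list above, or reports that it is unset. The function name `check_biodata_vars` is hypothetical and is not provided by the module. The sketch uses only POSIX shell features.

```shell
# Hypothetical helper: print NAME=value for each named environment variable,
# or report that it is unset or empty. Run it after "module load biodata".
check_biodata_vars() {
    for name in "$@"; do
        # Indirect lookup: copy the value of the variable whose name is in $name
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is unset or empty\n' "$name"
        fi
    done
}

check_biodata_vars DATA BLAST KEGG PANTHER IPR GENOMES INDICES UNIPROT
```

Any variable reported as unset usually means the module has not been loaded in the current shell or job script.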
To see which genomes are available, type:
{{< highlight bash >}}
$ ls $GENOMES
{{< /highlight >}}
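You can narrow the listing down by filtering it by name. The sketch below assumes that each genome is a separate entry directly under `$GENOMES`, though the actual layout may differ. `find_genome` is a hypothetical helper and is not provided by the module.

```shell
# Hypothetical helper: list entries of a genomes directory whose names match
# a pattern, ignoring case. The directory defaults to $GENOMES.
find_genome() {
    # ls prints one name per line when its output is piped
    ls -- "${2:-$GENOMES}" | grep -i -- "$1"
}
```

For example, `find_genome equus` would print any entries under `$GENOMES` whose names contain "equus" in any letter case.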
To see which BLAST databases are available, type:
{{< highlight bash >}}
$ ls $BLAST
{{< /highlight >}}
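A BLAST database usually consists of several files (for example `.nhr`, `.nin`, and `.nsq` for a nucleotide database), and large databases are sometimes split into numbered volumes such as `nt.00`, so the output of `ls $BLAST` can be long. The sketch below collapses such a listing to one name per database, which is the name you pass to `-db`. The helper `list_blast_dbs` is hypothetical. It assumes the standard three-letter BLAST file extensions, and it prints files that do not follow that naming unchanged.

```shell
# Hypothetical helper: reduce the listing of a BLAST database directory
# (defaults to $BLAST) to unique database names.
list_blast_dbs() {
    # 1) strip a trailing three-letter BLAST extension (.nhr, .nin, .nal, .pin, ...)
    # 2) strip a trailing numeric volume suffix such as ".00"
    # 3) sort in a fixed (C) locale and drop duplicates
    ls -- "${1:-$BLAST}" |
        sed -e 's/\.[np][a-z][a-z]$//' -e 's/\.[0-9][0-9]*$//' |
        LC_ALL=C sort -u
}
```

For example, `list_blast_dbs` with no argument lists the databases under `$BLAST` once `module load biodata` has been run.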
The example below runs a Bowtie2 local alignment on Crane using the default horse (*Equus caballus*) index (`$BOWTIE2_HORSE`) with paired-end FASTA files and 8 CPUs:
{{% panel header="`bowtie2_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err

# Load Bowtie2 and the biodata module, which defines $BOWTIE2_HORSE
module load bowtie/2.2
module load biodata

# Local alignment (--local) of paired-end FASTA reads (-f) against the horse index,
# with one alignment thread (-p) per task requested above
bowtie2 -x "$BOWTIE2_HORSE" -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p "$SLURM_NTASKS_PER_NODE"
{{< /highlight >}}
{{% /panel %}}
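Before you submit the script with `sbatch bowtie2_alignment.submit`, it can help to confirm that the index prefix actually resolves to Bowtie2 index files. The hypothetical helper below checks for the six files Bowtie2 writes for an index (`.1`, `.2`, `.3`, `.4`, `.rev.1`, `.rev.2`, with the extension `.bt2`, or `.bt2l` for large indices). It assumes `$BOWTIE2_HORSE` holds an index prefix, as its use with `-x` implies.

```shell
# Hypothetical helper: succeed only if all six files of a Bowtie2 index with
# the given prefix exist, in either the standard (.bt2) or large (.bt2l) format.
has_bt2_index() {
    prefix=$1
    for ext in bt2 bt2l; do
        missing=0
        for part in 1 2 3 4 rev.1 rev.2; do
            [ -f "$prefix.$part.$ext" ] || missing=1
        done
        if [ "$missing" -eq 0 ]; then
            return 0
        fi
    done
    return 1
}
```

For example, after `module load biodata`, run `has_bt2_index "$BOWTIE2_HORSE" || echo "Bowtie2 index not found"`.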
The example below runs a BLAST (`blastn`) search against the non-redundant nucleotide (`nt`) database available on Crane:
{{% panel header="`blastn_alignment.submit`"%}}
{{< highlight bash >}}
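#!/bin/sh
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err

module load blast/2.7
module load biodata

# Copy the nt database files and the query to local /scratch for faster I/O during the search
cp "$BLAST"/nt.* /scratch
cp input_reads.fasta /scratch

# Search the query against the local copy of nt, using one thread per task requested above
blastn -db /scratch/nt -query /scratch/input_reads.fasta -out /scratch/blast_nucleotide.results -num_threads "$SLURM_NTASKS_PER_NODE"

# Copy the results back to the directory the job was submitted from
cp /scratch/blast_nucleotide.results .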
{{< /highlight >}}
{{% /panel %}}
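Because the job copies the whole `nt` database to `/scratch`, it will fail partway through if there is not enough free space there. The sketch below compares the size of the files to be copied with the free space that `df` reports. `fits_in` is a hypothetical helper, and the check is only approximate, because other jobs may use the same space at the same time.

```shell
# Hypothetical helper: succeed if the files given after the first argument
# would fit into the free space of the filesystem holding the directory given
# as the first argument. Sizes are compared in KiB.
fits_in() {
    dest=$1
    shift
    # Disk usage of the files in KiB; -L follows symlinks, as cp does
    need=$(du -skL "$@" | awk '{ total += $1 } END { print total + 0 }')
    # Available KiB on the destination filesystem (column 4 of POSIX df output)
    avail=$(df -Pk "$dest" | awk 'NR == 2 { print $4 }')
    [ "$need" -lt "$avail" ]
}
```

In the job script, you could place a line such as `fits_in /scratch "$BLAST"/nt.* || exit 1` before the first `cp`. The job would then stop early instead of failing during the copy.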
The organisms and their corresponding environment variables for all genome and chromosome files, as well as for the short-read aligner indices, are listed at the link below:
[Organisms](#organisms)
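To see which of these variables are defined in your current session, you can filter the environment by prefix. The sketch below assumes the variables follow the same pattern as `BOWTIE2_HORSE` (tool name, underscore, organism). The Organisms page remains the authoritative list, and `list_index_vars` is a hypothetical helper.

```shell
# Hypothetical helper: list the names of exported environment variables that
# start with the given prefix followed by an underscore, sorted by name.
list_index_vars() {
    env | grep "^${1}_" | cut -d= -f1 | LC_ALL=C sort
}
```

For example, run `list_index_vars BOWTIE2` after `module load biodata`.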