Natasha Pavlovikj · 03656231
--- a/content/guides/running_applications/bioinformatics_tools/biodata_module/_index.md

+ 66

− 136
+++ b/content/guides/running_applications/bioinformatics_tools/biodata_module/_index.md

-1.  [HCC-DOCS](index.html)
-2.  [HCC-DOCS Home](HCC-DOCS-Home_327685.html)
-3.  [HCC Documentation](HCC-Documentation_332651.html)
-4.  [Running Applications](Running-Applications_7471153.html)
-5.  [Bioinformatics Tools](Bioinformatics-Tools_8193279.html)
+++
+title = "Biodata Module"
+description = "How to use Biodata Module on HCC machines"
+weight = "52"
+++

-<span id="title-text"> HCC-DOCS : Biodata Module </span>
-========================================================
+HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files, short read aligner indices, and other resources on both Tusker and Crane.  
+To use these resources, load the "**biodata**" module first.  
+For how to load a module, see [Module Commands](#module_commands).

-Created by <span class="author"> Adam Caprez</span>, last modified on
-Feb 22, 2017
-
-| Name    | Version | Resource |
-|---------|---------|----------|
-| biodata | 1.0     | Tusker   |
-
-| Name    | Version | Resource |
-|---------|---------|----------|
-| biodata | 1.0     | Crane    |
-
-  
-HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan),
-genome files, short read aligned indices etc. on both Tusker and
-Crane.  
-In order to use these resources, the "**biodata**" module needs to be
-loaded first.  
-For how to load module, please check [Module
-Commands](Module-Commands_332464.html). 
-
-Loading the "**biodata**" module will pre-set many environment
-variables, but most likely you will only need a subset of them.  
-Environment variables can be used in your command or script by prefixing
-a **$** sign to the name.
+Loading the "**biodata**" module will pre-set many environment variables, but most likely you will only need a subset of them. Environment variables can be used in your command or script by prefixing `$` to the name.

 The major environment variables are:  
 **$DATA** - main directory  
-**$BLAST** - Directory containing all available BLAST (nucleotide and
-protein) databases  
+**$BLAST** - Directory containing all available BLAST (nucleotide and protein) databases  
 **$KEGG** - KEGG database main entry point (requires license)  
 **$PANTHER** - PANTHER database main entry point (latest)  
 **$IPR** - InterProScan database main entry point (latest)  
-**$GENOMES** - Directory containing all available genomes (multiple
-sources, builds possible  
-**$INDICES** - Directory containing indices for bowtie, bowtie2, bwa for
-all available genomes  
-**$UNIPROT** - Directory containing latest release of full UniProt
-database
-
-  
-In order to check what genomes are available, you can just type:
-
-**Check available GENOMES**
-
-``` syntaxhighlighter-pre
-ls $GENOMES
-```
-
-  
+**$GENOMES** - Directory containing all available genomes (multiple sources, builds possible)  
+**$INDICES** - Directory containing indices for bowtie, bowtie2, bwa for all available genomes  
+**$UNIPROT** - Directory containing latest release of full UniProt database
+\\
+\\
+\\
+In order to check what genomes are available, you can type:
+{{< highlight bash >}}
+$ ls $GENOMES
+{{< /highlight >}}
+\\
 In order to check what BLAST databases are available, you can just type:
-
-**Check available BLAST databases**
-
-``` syntaxhighlighter-pre
-ls $BLAST
-```
-
-  
-An example of how to run Bowtie2 local alignment on Tusker utilizing the
-default Horse, *Equus caballus* index (*BOWTIE2\_HORSE*) with paired-end
-fasta files and 8 CPUs is shown below:
-
-**bowtie2\_alignment.submit**
-
-\#!/bin/sh  
-\#SBATCH --job-name=Bowtie2  
-\#SBATCH --nodes=1  
-\#SBATCH --ntasks-per-node=8  
-\#SBATCH --time=168:00:00  
-\#SBATCH --mem=50gb  
-\#SBATCH --output=Bowtie2.%J.out  
-\#SBATCH --error=Bowtie2.%J.err
-
- 
-
-module load biodata/1.0
+{{< highlight bash >}}
+$ ls $BLAST
+{{< /highlight >}}
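Both listings above only work once the module is loaded, since `$GENOMES` and `$BLAST` are otherwise empty. A minimal sketch of a sanity check is shown here; the function name `check_biodata_var` is hypothetical and not part of the biodata module, and it assumes each variable is expected to point to a directory:

```shell
# check_biodata_var NAME  (hypothetical helper, not provided by HCC)
# Prints NAME=value when the variable named NAME is set to an existing
# directory; otherwise prints a diagnostic and returns 1.
check_biodata_var() {
    name="$1"
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set (is the biodata module loaded?)"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name=$value is not a directory"
        return 1
    fi
    echo "$name=$value"
}
```

After `module load biodata`, it could be called as `for v in DATA BLAST GENOMES INDICES UNIPROT; do check_biodata_var "$v"; done`.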
+\\
+An example of how to run a Bowtie2 local alignment on Crane using the default horse (*Equus caballus*) index (*BOWTIE2\_HORSE*) with paired-end FASTA files and 8 CPUs is shown below:
+{{% panel header="`bowtie2_alignment.submit`"%}}
+{{< highlight bash >}}
+#!/bin/sh
+#SBATCH --job-name=Bowtie2
+#SBATCH --nodes=1
+#SBATCH --ntasks-per-node=8
+#SBATCH --time=168:00:00
+#SBATCH --mem=10gb
+#SBATCH --output=Bowtie2.%J.out
+#SBATCH --error=Bowtie2.%J.err

 module load bowtie/2.2
-
-bowtie2 -x $BOWTIE2\_HORSE -f -1 input\_reads\_pair\_1.fasta -2
-input\_reads\_pair\_2.fasta -S bowtie2\_alignments.sam --local -p
-$SLURM\_NTASKS\_PER\_NODE 
-
+module load biodata
+bowtie2 -x $BOWTIE2_HORSE -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE
+
+{{< /highlight >}}
+{{% /panel %}}
+\\
+An example of a BLAST run against the non-redundant nucleotide database available on Crane is provided below:
+{{% panel header="`blastn_alignment.submit`"%}}
+{{< highlight bash >}}
+#!/bin/sh
+#SBATCH --job-name=BlastN
+#SBATCH --nodes=1
+#SBATCH --ntasks-per-node=8
+#SBATCH --time=168:00:00
+#SBATCH --mem=10gb
+#SBATCH --output=BlastN.%J.out
+#SBATCH --error=BlastN.%J.err
+
+module load blast/2.7
+module load biodata
+cp $BLAST/nt.* /scratch
+cp input_reads.fasta /scratch
+
+blastn -db /scratch/nt -query /scratch/input_reads.fasta -out /scratch/blast_nucleotide.results -num_threads $SLURM_NTASKS_PER_NODE
+cp /scratch/blast_nucleotide.results .
+
+{{< /highlight >}}
+{{% /panel %}}
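The BLAST script above copies the database and query to `/scratch` before running, but a failed `cp` would go unnoticed and `blastn` would start with missing files. A minimal sketch of a staging step that stops at the first failed copy is shown here; the function name `stage_to_scratch` is hypothetical and is an illustration only, not something HCC provides:

```shell
# stage_to_scratch DEST FILE...  (hypothetical helper)
# Copies each FILE into the directory DEST, creating DEST if needed.
# Returns 1 as soon as one copy fails.
stage_to_scratch() {
    dest="$1"
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        cp "$f" "$dest"/ || { echo "failed to copy $f" >&2; return 1; }
    done
}
```

In the submit script it could replace the two `cp` lines, for example `stage_to_scratch /scratch $BLAST/nt.* input_reads.fasta || exit 1`.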
  
-An example of BLAST run against the yeast nucleotide database available
-on Tusker is provided below:
-
-**blastn\_alignment.submit**
-
-\#!/bin/sh  
-\#SBATCH --job-name=BlastN  
-\#SBATCH --nodes=1  
-\#SBATCH --ntasks-per-node=8  
-\#SBATCH --time=168:00:00  
-\#SBATCH --mem=50gb  
-\#SBATCH --output=BlastN.%J.out  
-\#SBATCH --error=BlastN.%J.err
-
- 
-
-module load biodata/1.0
-
-module load blast/2.2
-
-cp $BLAST/yeast.nt.\* /tmp  
-cp yeast.query /tmp
-
-blastn -db /tmp/yeast.nt -query /tmp/yeast.query -out
-/tmp/blast\_nucleotide.results
-
-cp /tmp/blast\_nucleotide.results .
-
-  
-The organisms and their appropriate environmental variables for all
-genomes and chromosome files, as well as for short read aligned indices
-are shown on the link below:  
-
-Attachments:
------------
-
-<img src="assets/images/icons/bullet_blue.gif" width="8" height="8" />
-[cb\_blastn\_biodata.xsl](attachments/15171887/15171888.xsl)
-(application/octet-stream)  
-<img src="assets/images/icons/bullet_blue.gif" width="8" height="8" />
-[cb\_bowtie2\_biodata.xsl](attachments/15171887/15171889.xsl)
-(application/octet-stream)  
-<img src="assets/images/icons/bullet_blue.gif" width="8" height="8" />
-[crane\_biodata\_version.xsl](attachments/15171887/15171890.xsl)
-(application/octet-stream)  
-<img src="assets/images/icons/bullet_blue.gif" width="8" height="8" />
-[crane\_modules.xml](attachments/15171887/15171891.xml)
-(application/octet-stream)  
-<img src="assets/images/icons/bullet_blue.gif" width="8" height="8" />
-[tusker\_biodata\_version.xsl](attachments/15171887/15171892.xsl)
-(application/octet-stream)  
-<img src="assets/images/icons/bullet_blue.gif" width="8" height="8" />
-[tusker\_modules.xml](attachments/15171887/15171893.xml)
-(application/octet-stream)  
-
+The organisms and their corresponding environment variables for all genomes and chromosome files, as well as for short read aligner indices, are listed at the link below:  
+[Organisms](#organisms)