+++
title = "Biodata Module"
description = "How to use Biodata Module on HCC machines"
scripts = ["https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/jquery.tablesorter.min.js", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-pager.min.js","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-filter.min.js","/js/sort-table.js"]
css = ["http://mottie.github.io/tablesorter/css/theme.default.css","https://mottie.github.io/tablesorter/css/theme.dropbox.css", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/jquery.tablesorter.pager.min.css","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/filter.formatter.min.css"]
weight = "52"
+++


HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files, indices for short-read aligners, and similar resources on Crane and Rhino.  
To use these resources, load the "**biodata**" module first.  
For instructions on loading modules, see [Module Commands]({{< relref "/applications/modules/_index.md" >}}).

Loading the "**biodata**" module will pre-set many environment variables, but most likely you will only need a subset of them. Environment variables can be used in your command or script by prefixing `$` to the name.

The major environment variables are:  
**$DATA** - main directory  
**$BLAST** - Directory containing all available BLAST (nucleotide and protein) databases  
**$KEGG** - KEGG database main entry point (requires license)  
**$PANTHER** - PANTHER database main entry point (latest)  
**$IPR** - InterProScan database main entry point (latest)  
**$GENOMES** - Directory containing all available genomes (multiple sources and builds possible)  
**$INDICES** - Directory containing indices for bowtie, bowtie2, bwa for all available genomes  
**$UNIPROT** - Directory containing latest release of full UniProt database
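
As a quick sanity check after `module load biodata`, the sketch below reports which of the variables listed above are set in the current shell. The function name `biodata_var_status` is hypothetical, and the directory test assumes each variable holds a directory path, which may not be true for every entry point.

```shell
# Hypothetical helper: print the value of each named variable,
# or note that it is not set.
biodata_var_status() {
    for name in "$@"; do
        # Indirect lookup that also works in plain sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name: not set"
        elif [ -d "$value" ]; then
            echo "$name: $value"
        else
            # Assumption: the variables normally point at directories.
            echo "$name: $value (not a directory)"
        fi
    done
}

biodata_var_status DATA BLAST KEGG PANTHER IPR GENOMES INDICES UNIPROT
```

If every line reads `not set`, the module has most likely not been loaded in that shell.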

{{% notice info %}}
**To access the older format of BLAST databases that work with BLAST+ 2.9 and lower, please use the variable BLAST_V4.**
**The variable BLAST points to the directory with the new version 5 of the nucleotide and protein databases required for BLAST+ 2.10 and higher.**
{{% /notice %}}
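
The rule in the notice above can be written as a small version comparison. This is only a sketch: the function name `blast_db_var` is hypothetical, and it expects a BLAST+ version string such as `2.9.0` or `2.10.1+` as its argument.

```shell
# Hypothetical helper: print the name of the variable to use
# for a given BLAST+ version (2.10 and higher -> BLAST, else BLAST_V4).
blast_db_var() (
    version=${1%%+*}       # drop a trailing "+", e.g. 2.10.1+ -> 2.10.1
    major=${version%%.*}   # text before the first dot
    rest=${version#*.}     # text after the first dot
    minor=${rest%%.*}      # second version component
    if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 10 ]; }; then
        echo BLAST
    else
        echo BLAST_V4
    fi
)
```

For instance, `blast_db_var 2.9.0` prints `BLAST_V4`, while `blast_db_var 2.10.1+` prints `BLAST`.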

In order to check what genomes are available, you can type:
{{< highlight bash >}}
$ ls $GENOMES
{{< /highlight >}}


In order to check what BLAST databases are available, you can just type:
{{< highlight bash >}}
$ ls $BLAST
{{< /highlight >}}
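
Because a BLAST database is usually split across many files, the raw `ls` output can be long. The sketch below collapses it into one line per database name. The function name `list_blast_dbs` is hypothetical, and the pattern assumes files are named like `nt.00.nhr` or `nt.nal` (an optional numeric volume followed by a three-letter extension); other files in the directory may show up under unexpected names.

```shell
# Hypothetical helper: list unique database base names in a directory.
list_blast_dbs() {
    # Strip an optional ".NN" volume number and the 3-letter extension,
    # then collapse duplicates.
    ls "$1" | sed -E 's/(\.[0-9]+)?\.[a-z]{3}$//' | sort -u
}

# Only run against $BLAST when the biodata module has set it.
if [ -n "${BLAST:-}" ]; then
    list_blast_dbs "$BLAST"
fi
```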


An example of how to run Bowtie2 local alignment on Crane utilizing the default Horse, *Equus caballus* index (*BOWTIE2\_HORSE*) with paired-end fasta files and 8 CPUs is shown below:
{{% panel header="`bowtie2_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err

module load bowtie/2.2
module load biodata

bowtie2 -x $BOWTIE2_HORSE -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE

{{< /highlight >}}
{{% /panel %}}
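
Before submitting a long alignment job, it can be worth confirming that the index variable resolves to real files. The sketch below assumes `$BOWTIE2_HORSE` is an index prefix (as its use with `-x` above suggests) and relies on Bowtie2 index files ending in `.1.bt2`, or `.1.bt2l` for large indices. The function name `has_bowtie2_index` is hypothetical.

```shell
# Hypothetical helper: succeed if a Bowtie2 index exists for the prefix.
has_bowtie2_index() {
    [ -e "$1.1.bt2" ] || [ -e "$1.1.bt2l" ]
}

# Only check when the biodata module has set the variable.
if [ -n "${BOWTIE2_HORSE:-}" ]; then
    if has_bowtie2_index "$BOWTIE2_HORSE"; then
        echo "Bowtie2 index found at $BOWTIE2_HORSE"
    else
        echo "No Bowtie2 index files found for $BOWTIE2_HORSE" >&2
    fi
fi
```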


An example of a BLAST run against the non-redundant nucleotide database available on Crane is provided below:
{{% panel header="`blastn_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err

module load blast/2.10
module load biodata
cp $BLAST/nt.* /scratch
cp input_reads.fasta /scratch

blastn -db /scratch/nt -query /scratch/input_reads.fasta -out /scratch/blast_nucleotide.results -num_threads $SLURM_NTASKS_PER_NODE
cp /scratch/blast_nucleotide.results .

{{< /highlight >}}
{{% /panel %}}
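
Since the script above copies the whole database to `/scratch`, a rough size check beforehand may help avoid a job that fails partway through the copy. The following is a minimal sketch, not a required step; the function name `blast_db_size_kb` is hypothetical, and it assumes all files of a database share the prefix `<name>.` as in the `cp $BLAST/nt.*` line above.

```shell
# Hypothetical helper: total size in KB of all files named <name>.* in <dir>.
blast_db_size_kb() (
    set -- "$1/$2".*
    # If the glob matched nothing, "$1" is still the literal pattern.
    [ -e "$1" ] || { echo "no database files found" >&2; exit 1; }
    # du -c appends a grand total line; keep only its size column.
    du -ck "$@" | tail -n 1 | cut -f1
)

# Compare against free space on /scratch when both are available.
if [ -n "${BLAST:-}" ] && [ -d /scratch ]; then
    need_kb=$(blast_db_size_kb "$BLAST" nt)
    free_kb=$(df -Pk /scratch | awk 'NR==2 {print $4}')
    echo "nt needs ${need_kb} KB; /scratch has ${free_kb} KB free"
fi
```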


### Available Organisms

The organisms and their corresponding environment variables for all genome and chromosome files, as well as indices, are shown in the table below.

{{< table url="http://rhino-head.unl.edu:8192/bio/data/json" >}}