title = "Biodata Module"
description = "How to use Biodata Module on HCC machines"
scripts = ["https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/jquery.tablesorter.min.js", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-pager.min.js","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-filter.min.js","/js/sort-table.js"]
css = ["http://mottie.github.io/tablesorter/css/theme.default.css","https://mottie.github.io/tablesorter/css/theme.dropbox.css", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/jquery.tablesorter.pager.min.css","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/filter.formatter.min.css"]
weight = "52"
HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files, indices for short-read aligners, and other reference data on Crane and Rhino.
To use these resources, the "biodata" module needs to be loaded first.
For how to load a module, please see Module Commands.
Loading the "biodata" module pre-sets many environment variables, but you will most likely need only a subset of them. An environment variable can be used in a command or script by prefixing its name with $.
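Because these variables exist only after the module has been loaded, a script can check that a variable is set before relying on it. The following is a minimal sketch for a POSIX shell; the helper name `require_var` is hypothetical and is not provided by the biodata module.

```shell
# Hypothetical helper: print the value of the named variable,
# or fail with a hint if it is unset or empty.
require_var() {
    # eval performs the indirect lookup in a way any POSIX shell accepts
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "Variable $1 is not set; run 'module load biodata' first" >&2
        return 1
    fi
    printf '%s\n' "$value"
}

# Usage: print the directory behind $GENOMES, or a hint if the module is not loaded.
require_var GENOMES || true
```

Calling the helper at the top of a submit script makes a missing `module load biodata` fail early instead of passing an empty path to a later command.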
The major environment variables are:
{{% notice info %}} To access the older format of BLAST databases that work with BLAST+ 2.9 and lower, please use the variable BLAST_V4. The variable BLAST points to the directory with the new version 5 of the nucleotide and protein databases required for BLAST+ 2.10 and higher. {{% /notice %}}
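The version rule in the notice above can be expressed as a small shell function. This is only a sketch: the function name `blast_db_var` is hypothetical, and it assumes the BLAST+ version is passed as a dotted string such as 2.9.0.

```shell
# Hypothetical helper: print the name of the biodata variable that matches a BLAST+ version.
# BLAST+ 2.10 and higher use the version 5 databases (BLAST); older releases use BLAST_V4.
blast_db_var() {
    major=${1%%.*}      # text before the first dot, e.g. "2"
    rest=${1#*.}        # text after the first dot, e.g. "10.1"
    minor=${rest%%.*}   # text before the next dot, e.g. "10"
    if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 10 ]; }; then
        echo BLAST
    else
        echo BLAST_V4
    fi
}

blast_db_var 2.9.0    # prints BLAST_V4
blast_db_var 2.10.1   # prints BLAST
```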
In order to check what genomes are available, you can type: {{< highlight bash >}} $ ls $GENOMES {{< /highlight >}}
In order to check what BLAST databases are available, you can just type: {{< highlight bash >}} $ ls $BLAST {{< /highlight >}}
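A plain `ls` prints every volume and index file, so the listing can be long. The sketch below collapses it to one line per database. It assumes the files are named with a base name followed by dot-separated suffixes (for example `nt.00.nhr`); the helper name `list_blast_dbs` is hypothetical.

```shell
# Hypothetical helper: print the distinct database base names found in a directory,
# assuming files are named <base>.<suffixes>, e.g. nt.00.nhr or nt.nal.
list_blast_dbs() {
    ls "$1" | sed 's/\..*$//' | sort -u
}

# Usage (after "module load biodata"): list_blast_dbs "$BLAST"
```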
An example of how to run a Bowtie2 local alignment on Crane using the default horse (Equus caballus) index (BOWTIE2_HORSE) with paired-end FASTA files and 8 CPUs is shown below:
{{% panel header="bowtie2_alignment.submit"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err
module load bowtie/2.2
module load biodata
bowtie2 -x $BOWTIE2_HORSE -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE
{{< /highlight >}} {{% /panel %}}
An example of a BLAST run against the non-redundant nucleotide database available on Crane is provided below:
{{% panel header="blastn_alignment.submit"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err
module load blast/2.10
module load biodata
# Copy the nt database files and the query to /scratch
cp $BLAST/nt.* /scratch
cp input_reads.fasta /scratch
blastn -db /scratch/nt -query /scratch/input_reads.fasta -out /scratch/blast_nucleotide.results
# Copy the results back to the submission directory
cp /scratch/blast_nucleotide.results .
{{< /highlight >}} {{% /panel %}}
Available Organisms
The organisms and their corresponding environment variables for all genome and chromosome files, as well as indices, are shown in the table below.
{{< table url="http://rhino-head.unl.edu:8192/bio/data/json" >}}
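The variables listed in the table can also be inspected from the command line once the module is loaded. The sketch below prints the names of exported variables that share a prefix. The helper name `vars_with_prefix` is hypothetical, and the `BOWTIE2_` prefix in the usage line assumes that other Bowtie2 indices follow the same naming pattern as BOWTIE2_HORSE.

```shell
# Hypothetical helper: print the names of exported variables that start with a given prefix.
vars_with_prefix() {
    env | grep "^$1" | cut -d= -f1 | sort
}

# Usage (after "module load biodata"): vars_with_prefix BOWTIE2_
```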