+++
title = "Biodata Module"
description = "How to use Biodata Module on HCC machines"
scripts = ["https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/jquery.tablesorter.min.js", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-pager.min.js","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-filter.min.js","/js/sort-table.js"]
css = ["http://mottie.github.io/tablesorter/css/theme.default.css","https://mottie.github.io/tablesorter/css/theme.dropbox.css", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/jquery.tablesorter.pager.min.css","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/filter.formatter.min.css"]
weight = "52"
+++

HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files, short-read aligner indices, etc. on Crane and Rhino.
To use these resources, the "**biodata**" module needs to be loaded first.
For how to load a module, please check [Module Commands]({{< relref "/applications/modules/_index.md" >}}).

Loading the "**biodata**" module pre-sets many environment variables, but you will most likely need only a subset of them. Environment variables can be used in your command or script by prefixing the name with `$`.

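Because these variables exist only after the module is loaded, a script can check a variable before relying on it. The following is a minimal sketch, not part of the biodata module: `require_dir` is a hypothetical helper name, and it assumes the variable is exported into the environment (as variables set by a module are).

```shell
# Hypothetical helper (not provided by the biodata module):
# print the value of the environment variable named in $1, or fail
# if it is unset or does not point to an existing directory.
require_dir() {
    name="$1"
    # printenv exits non-zero when the variable is not in the environment
    value="$(printenv "$name")" || {
        echo "ERROR: \$$name is not set; was the module loaded?" >&2
        return 1
    }
    if [ ! -d "$value" ]; then
        echo "ERROR: \$$name is '$value', which is not a directory" >&2
        return 1
    fi
    echo "$value"
}
```

In a job script this could be used as `db_dir="$(require_dir BLAST)" || exit 1`, placed after the `module load biodata` line, so that the job stops early instead of failing partway through.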
The major environment variables are:

- **$DATA** - Main directory
- **$BLAST** - Directory containing all available BLAST (nucleotide and protein) databases
- **$KEGG** - KEGG database main entry point (requires license)
- **$PANTHER** - PANTHER database main entry point (latest)
- **$IPR** - InterProScan database main entry point (latest)
- **$GENOMES** - Directory containing all available genomes (multiple sources and builds possible)
- **$INDICES** - Directory containing indices for bowtie, bowtie2, and bwa for all available genomes
- **$UNIPROT** - Directory containing the latest release of the full UniProt database

{{% notice info %}}
**To access the older format of BLAST databases that work with BLAST+ 2.9 and lower, please use the variable BLAST_V4.**
**The variable BLAST points to the directory with the new version 5 of the nucleotide and protein databases required for BLAST+ 2.10 and higher.**
{{% /notice %}}

To check which genomes are available, type:
{{< highlight bash >}}
$ ls $GENOMES
{{< /highlight >}}

To check which BLAST databases are available, type:
{{< highlight bash >}}
$ ls $BLAST
{{< /highlight >}}

An example of how to run a Bowtie2 local alignment on Crane using the default horse (*Equus caballus*) index (*BOWTIE2\_HORSE*) with paired-end FASTA files and 8 CPUs is shown below:
{{% panel header="`bowtie2_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err

module load bowtie/2.2
module load biodata

# -x: index prefix, -f: FASTA input, -p: one thread per requested task
bowtie2 -x $BOWTIE2_HORSE -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE
{{< /highlight >}}
{{% /panel %}}

An example of a BLAST run against the non-redundant nucleotide database available on Crane is provided below:
{{% panel header="`blastn_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err

module load blast/2.10
module load biodata

# Stage the nt database files and the query in /scratch
cp $BLAST/nt.* /scratch
cp input_reads.fasta /scratch

blastn -db /scratch/nt -query /scratch/input_reads.fasta -out /scratch/blast_nucleotide.results

# Copy the results back to the submission directory
cp /scratch/blast_nucleotide.results .
{{< /highlight >}}
{{% /panel %}}

### Available Organisms

The organisms and their corresponding environment variables for all genome and chromosome files, as well as indices, are shown in the table below.

{{< table url="http://rhino-head.unl.edu:8192/bio/data/json" >}}
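
The same information can be looked up from the command line once the module is loaded. The sketch below is an illustration only: `list_bio_vars` is a hypothetical helper name, and it assumes that the organism-specific variable names contain the organism's common name, as *BOWTIE2\_HORSE* does.

```shell
# Hypothetical helper (not provided by the biodata module):
# list environment variables whose NAME contains the keyword in $1,
# matched case-insensitively and sorted by name.
list_bio_vars() {
    env | grep -i -- "^[A-Za-z0-9_]*$1[A-Za-z0-9_]*=" | sort
}
```

After `module load biodata`, running `list_bio_vars horse` would print every matching `NAME=value` pair; the keyword is matched against variable names only, not against their values.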