+++
title = "Biodata Module"
description = "How to use the biodata module on HCC machines"
weight = "52"
+++

HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files, short-read aligner indices, and more on both Tusker and Crane.  
To use these resources, you must first load the "**biodata**" module.  
For instructions on loading modules, see [Module Commands](#module_commands).

Loading the "**biodata**" module presets many environment variables, though you will most likely need only a subset of them. To use an environment variable in a command or script, prefix its name with `$` (for example, `$BLAST`).

The major environment variables are:  
**$DATA** - main directory  
**$BLAST** - Directory containing all available BLAST (nucleotide and protein) databases  
**$KEGG** - KEGG database main entry point (requires a license)  
**$PANTHER** - PANTHER database main entry point (latest)  
**$IPR** - InterProScan database main entry point (latest)  
**$GENOMES** - Directory containing all available genomes (multiple sources; multiple builds may be available)  
**$INDICES** - Directory containing Bowtie, Bowtie2, and BWA indices for all available genomes  
**$UNIPROT** - Directory containing the latest release of the full UniProt database
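
Once the module is loaded, you can confirm which of these variables are set in your current shell. The bash function below is a minimal sketch of a hypothetical helper, `check_biodata_vars` (not part of the biodata module); its variable list is copied from the list above, so extend it if you rely on others.

```bash
# check_biodata_vars (hypothetical helper, not part of the biodata module):
# prints NAME=VALUE for each biodata variable that is set, warns on stderr
# for each one that is not, and returns the number of missing variables.
check_biodata_vars() {
  local var missing=0
  for var in DATA BLAST KEGG PANTHER IPR GENOMES INDICES UNIPROT; do
    if [ -n "${!var:-}" ]; then
      # ${!var} is bash indirect expansion: the value of the variable named by $var
      printf '%s=%s\n' "$var" "${!var}"
    else
      printf '%s is not set (was "module load biodata" run?)\n' "$var" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Run `module load biodata` and then `check_biodata_vars`; a nonzero return value is the number of variables that are missing.
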
\\
\\
\\
To see which genomes are available, type:
{{< highlight bash >}}
$ ls $GENOMES
{{< /highlight >}}
\\
To see which BLAST databases are available, type:
{{< highlight bash >}}
$ ls $BLAST
{{< /highlight >}}
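
Large databases such as `nt` are split into numbered volumes, so `ls $BLAST` prints many files per database, while `-db` expects only the database name. The bash function below is a minimal sketch of a hypothetical helper, `list_blast_dbs` (not part of the biodata module), that reduces such a listing to database names. It assumes the standard BLAST file suffixes (`.nal`/`.pal` alias files and `.nin`/`.pin` index files) and treats a trailing `.NN` as a volume number, so a database whose own name ends in a dot and two digits would be shortened incorrectly.

```bash
# list_blast_dbs (hypothetical helper): print each database name found in a
# BLAST database directory, followed by a tab and its type.
# Assumes the usual BLAST suffixes: .nal/.pal alias files for multi-volume
# databases and .nin/.pin index files for each database or volume.
list_blast_dbs() {
  local dir=${1:?usage: list_blast_dbs DIR}
  local f name type
  for f in "$dir"/*.nal "$dir"/*.pal "$dir"/*.nin "$dir"/*.pin; do
    [ -e "$f" ] || continue          # skip globs that matched nothing
    case $f in
      *.nal|*.nin) type=nucleotide ;;
      *)           type=protein ;;
    esac
    name=${f##*/}                    # strip the directory
    name=${name%.*}                  # strip the .nal/.pal/.nin/.pin suffix
    name=${name%.[0-9][0-9]}         # strip a volume number such as .00
    printf '%s\t%s\n' "$name" "$type"
  done | sort -u                     # one line per database
}
```

After `module load biodata`, run `list_blast_dbs "$BLAST"`; each output line gives a name you can pass to `-db` and whether it is a nucleotide or protein database.
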
\\
The example below runs a Bowtie2 local alignment on Crane using the default horse (*Equus caballus*) index (`$BOWTIE2_HORSE`) with paired-end FASTA files and 8 CPUs:
{{% panel header="`bowtie2_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err

module load bowtie/2.2
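module load biodata

# -x: index prefix provided by biodata; -f: reads are FASTA; -1/-2: paired-end mates;
# -S: SAM output file; --local: local alignment mode; -p: threads (8, from --ntasks-per-node)
bowtie2 -x $BOWTIE2_HORSE -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE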

{{< /highlight >}}
{{% /panel %}}
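
If an index variable is misspelled or empty, the job above fails only after it has waited in the queue. The sketch below is a hypothetical bash helper, `check_bt2_index`, that checks whether all six Bowtie2 index files exist for a given prefix. It assumes Bowtie2's standard naming (`PREFIX.1.bt2` through `PREFIX.4.bt2` plus `PREFIX.rev.1.bt2` and `PREFIX.rev.2.bt2`, or `.bt2l` for large indexes).

```bash
# check_bt2_index (hypothetical helper): verify that all six Bowtie2 index
# files exist for a given prefix, e.g. "$BOWTIE2_HORSE".
# Bowtie2 writes PREFIX.{1,2,3,4,rev.1,rev.2}.bt2 (or .bt2l for large indexes).
check_bt2_index() {
  local prefix=${1:?usage: check_bt2_index INDEX_PREFIX}
  local ext part complete
  for ext in bt2 bt2l; do
    complete=1
    for part in 1 2 3 4 rev.1 rev.2; do
      [ -e "$prefix.$part.$ext" ] || complete=0
    done
    if [ "$complete" -eq 1 ]; then
      echo "complete .$ext index found for $prefix"
      return 0
    fi
  done
  echo "no complete Bowtie2 index found for $prefix" >&2
  return 1
}
```

For example, add `check_bt2_index "$BOWTIE2_HORSE" || exit 1` after `module load biodata` in the submit script, or run it interactively after loading the module.
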
\\
The example below runs a BLAST search (`blastn`) against the non-redundant nucleotide (`nt`) database available on Crane:
{{% panel header="`blastn_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err

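module load blast/2.7
module load biodata

# Copy the nt database files and the query to /scratch for faster I/O during the search
cp $BLAST/nt.* /scratch
cp input_reads.fasta /scratch

# Use one thread per requested core, then copy the results back to the submit directory
blastn -db /scratch/nt -query /scratch/input_reads.fasta -out /scratch/blast_nucleotide.results -num_threads $SLURM_NTASKS_PER_NODE
cp /scratch/blast_nucleotide.results .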

{{< /highlight >}}
{{% /panel %}}
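
The copy-to-`/scratch` step above can be wrapped in a function that fails early when the database name matches no files. The sketch below, `stage_blast_db`, is a hypothetical bash helper (not provided by HCC or BLAST); it copies every file matching `DB.*` from a source directory and prints the path to use with `-db`. It assumes all files of a database share that prefix, as with the `nt.*` files copied above.

```bash
# stage_blast_db (hypothetical helper): copy every file of one BLAST database
# (DB.*, including volume and alias files) from SRC_DIR to DEST_DIR and print
# the path to pass to -db. Fails if no files match DB.* in SRC_DIR.
stage_blast_db() {
  local db=${1:?usage: stage_blast_db DB SRC_DIR DEST_DIR}
  local src=${2:?missing SRC_DIR} dest=${3:?missing DEST_DIR}
  local files=("$src/$db".*)          # glob expands to the database's files
  if [ ! -e "${files[0]}" ]; then     # an unmatched glob stays literal
    echo "no files matching $src/$db.* found" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  cp "${files[@]}" "$dest"/ || return 1
  printf '%s\n' "$dest/$db"
}
```

In the submit script, `DB=$(stage_blast_db nt "$BLAST" /scratch) || exit 1` could replace the first `cp` line, with `-db "$DB"` then passed to `blastn`.
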
  
The organisms and their corresponding environment variables for all genome and chromosome files, as well as for short-read aligner indices, are listed at the link below:  
[Organisms](#organisms)
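
Since the per-organism variables follow a naming pattern (the Bowtie2 example uses `$BOWTIE2_HORSE`), you can also list them from your shell once the module is loaded. The bash sketch below, `list_vars_with_prefix`, is a hypothetical helper; it assumes the variables for one index type share a common prefix such as `BOWTIE2_`, so check the Organisms page for the exact names.

```bash
# list_vars_with_prefix (hypothetical helper): print NAME=VALUE for every
# exported environment variable whose name starts with PREFIX, sorted by name.
list_vars_with_prefix() {
  local prefix=${1:?usage: list_vars_with_prefix PREFIX}
  env | grep -E "^${prefix}[A-Za-z0-9_]*=" | sort
}
```

For example, `list_vars_with_prefix BOWTIE2_` after `module load biodata` should print the available Bowtie2 index variables, provided they follow that pattern.
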