+++
title = "Biodata Module"
description = "How to use Biodata Module on HCC machines"
scripts = ["https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/jquery.tablesorter.min.js", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-pager.min.js","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-filter.min.js","/js/sort-table.js"]
css = ["http://mottie.github.io/tablesorter/css/theme.default.css","https://mottie.github.io/tablesorter/css/theme.dropbox.css", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/jquery.tablesorter.pager.min.css","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/filter.formatter.min.css"]
weight = "52"
+++

HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files, short-read aligner indices, etc. on Crane.  
To use these resources, first load the "**biodata**" module.  
For instructions on loading modules, see [Module Commands]({{< relref "module_commands" >}}).

Loading the "**biodata**" module pre-sets many environment variables, but you will most likely need only a subset of them. To use an environment variable in a command or script, prefix its name with `$`.

The major environment variables are:  
**$DATA** - Main directory  
**$BLAST** - Directory containing all available BLAST (nucleotide and protein) databases  
**$KEGG** - KEGG database main entry point (requires license)  
**$PANTHER** - PANTHER database main entry point (latest)  
**$IPR** - InterProScan database main entry point (latest)  
**$GENOMES** - Directory containing all available genomes (multiple sources and builds possible)  
**$INDICES** - Directory containing indices for bowtie, bowtie2, and bwa for all available genomes  
**$UNIPROT** - Directory containing the latest release of the full UniProt database
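
To confirm which of these variables are actually set in your session, a small check can help. The sketch below is only illustrative: `biodata_report` is a hypothetical helper name (not something the module provides), and it assumes you run it after `module load biodata`.

```bash
#!/bin/sh
# Report which of the major biodata variables are set in the current shell.
# Intended to be run after `module load biodata`.
# biodata_report is a hypothetical helper name, not part of the module.
biodata_report() {
    for name in DATA BLAST KEGG PANTHER IPR GENOMES INDICES UNIPROT; do
        # Indirect lookup via eval keeps this portable to plain sh.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set\n' "$name"
        fi
    done
}

biodata_report
```

Any variable reported as not set usually means the module has not been loaded in that shell.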


To see which genomes are available, run:
{{< highlight bash >}}
$ ls $GENOMES
{{< /highlight >}}


To see which BLAST databases are available, run:
{{< highlight bash >}}
$ ls $BLAST
{{< /highlight >}}
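
If the module has not been loaded, `$GENOMES` and `$BLAST` are empty and a bare `ls` silently lists the current directory instead. A minimal sketch of a guard against that is shown here; `list_resource` is a hypothetical helper name used only for illustration.

```bash
#!/bin/sh
# list_resource is a hypothetical helper: list a biodata directory,
# but fail with a hint when the variable is empty or the path is missing.
list_resource() {
    dir=$1
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "Directory not found. Has 'module load biodata' been run?" >&2
        return 1
    fi
    ls "$dir"
}

# Example: list genomes, or print a short note if the variable is unset.
list_resource "${GENOMES:-}" || echo "GENOMES is unavailable in this shell"
```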


The example below runs a Bowtie2 local alignment on Crane using the default horse (*Equus caballus*) index (*BOWTIE2\_HORSE*) with paired-end FASTA files and 8 CPUs:
{{% panel header="`bowtie2_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err

module load bowtie/2.2
module load biodata

bowtie2 -x $BOWTIE2_HORSE -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE

{{< /highlight >}}
{{% /panel %}}
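
Because the job may wait in the queue for some time, it can be worth checking that the index is reachable before calling `bowtie2`. The sketch below assumes `$BOWTIE2_HORSE` holds an index prefix (which is how `-x` uses it above) and relies on Bowtie2 naming the first index file `<prefix>.1.bt2`, or `<prefix>.1.bt2l` for large indices. `bowtie2_index_exists` is a hypothetical helper name.

```bash
#!/bin/sh
# bowtie2_index_exists is a hypothetical helper.
# A Bowtie2 index is a set of files sharing one prefix; the first file is
# <prefix>.1.bt2 (small index) or <prefix>.1.bt2l (large index).
bowtie2_index_exists() {
    prefix=$1
    [ -n "$prefix" ] && { [ -f "$prefix.1.bt2" ] || [ -f "$prefix.1.bt2l" ]; }
}

# Possible use inside a submit script, after `module load biodata`:
if bowtie2_index_exists "${BOWTIE2_HORSE:-}"; then
    echo "Bowtie2 index found"
else
    echo "Bowtie2 index not found; check that biodata is loaded" >&2
fi
```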


An example of a BLAST run against the non-redundant nucleotide database available on Crane is provided below:
{{% panel header="`blastn_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err

module load blast/2.7
module load biodata
cp $BLAST/nt.* /scratch
cp input_reads.fasta /scratch

blastn -db /scratch/nt -query /scratch/input_reads.fasta -out /scratch/blast_nucleotide.results -num_threads $SLURM_NTASKS_PER_NODE
cp /scratch/blast_nucleotide.results .

{{< /highlight >}}
{{% /panel %}}
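
In the script above, if `$BLAST` is unset or the `nt.*` pattern matches nothing, `cp` reports an error but the script keeps going and `blastn` fails later. One way to stop early is sketched below; `stage_db` is a hypothetical helper name, and the prefix and destination in the usage comment simply mirror the example above.

```bash
#!/bin/sh
# stage_db is a hypothetical helper: copy every file that starts with
# "<prefix>." into a destination directory, failing if nothing matches.
stage_db() {
    src_prefix=$1   # e.g. "$BLAST/nt"
    dest=$2         # e.g. /scratch
    # When nothing matches, the pattern stays literal, so test the first entry.
    set -- "$src_prefix".*
    if [ ! -e "$1" ]; then
        echo "No files match $src_prefix.*" >&2
        return 1
    fi
    cp "$@" "$dest"
}

# Possible use in the submit script, in place of the plain cp:
#   stage_db "$BLAST/nt" /scratch || exit 1
```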


### Available Organisms

The table below lists the available organisms and the corresponding environment variables for their genome files, chromosome files, and indices.

{{< table url="http://rhino-head.unl.edu:8192/bio/data/json" >}}