<span id="title-text"> HCC-DOCS : Biodata Module </span>
========================================================

Created by <span class="author"> Adam Caprez</span>, last modified on
Feb 22, 2017

| Name    | Version | Resource |
|---------|---------|----------|
| biodata | 1.0     | Tusker   |
| biodata | 1.0     | Crane    |

  
HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan),
genome files, short-read alignment indices, and other resources on both
Tusker and Crane.  
To use these resources, the "**biodata**" module must be loaded
first.  
For details on loading modules, see [Module
Commands](Module-Commands_332464.html).

Loading the "**biodata**" module sets many environment variables, but
you will most likely need only a subset of them.  
An environment variable can be used in a command or script by prefixing
its name with a **$** sign.

The major environment variables are:  
**$DATA** - Main directory  
**$BLAST** - Directory containing all available BLAST (nucleotide and
protein) databases  
**$KEGG** - KEGG database main entry point (requires license)  
**$PANTHER** - PANTHER database main entry point (latest)  
**$IPR** - InterProScan database main entry point (latest)  
**$GENOMES** - Directory containing all available genomes (multiple
sources and builds possible)  
**$INDICES** - Directory containing bowtie, bowtie2, and bwa indices for
all available genomes  
**$UNIPROT** - Directory containing the latest release of the full
UniProt database
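
To see which of these variables are actually defined in the current shell, a short loop can print each one. The following is only a minimal sketch: the function name `show_biodata_vars` is hypothetical and is not provided by the module, and the variable list is simply the one given above.

```shell
#!/bin/sh
# Hypothetical helper (not provided by the biodata module):
# print each major biodata variable and its value, or "(not set)".
show_biodata_vars() {
    for name in DATA BLAST KEGG PANTHER IPR GENOMES INDICES UNIPROT; do
        # Indirect lookup in POSIX sh: read the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%-8s %s\n' "$name" "$value"
        else
            printf '%-8s %s\n' "$name" "(not set)"
        fi
    done
}

show_biodata_vars
```

If every line reads "(not set)", the "**biodata**" module has probably not been loaded in that shell.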

  
To check which genomes are available, run:

**Check available GENOMES**

``` syntaxhighlighter-pre
ls $GENOMES
```

  
To check which BLAST databases are available, run:

**Check available BLAST databases**

``` syntaxhighlighter-pre
ls $BLAST
```
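
When a directory holds many entries, it can help to filter the listing by name. The sketch below assumes nothing about how the databases are named; `list_matching` is a hypothetical helper, not part of the module.

```shell
#!/bin/sh
# Hypothetical helper (not part of the biodata module): list the entries of a
# directory whose names contain a pattern, ignoring case.
# Usage: list_matching DIRECTORY PATTERN
list_matching() {
    dir="$1"
    pattern="$2"
    if [ ! -d "$dir" ]; then
        echo "error: '$dir' is not a directory" >&2
        return 1
    fi
    # grep exits with status 1 when nothing matches, so the function does too.
    ls "$dir" | grep -i -- "$pattern"
}
```

With the module loaded, a call such as `list_matching "$BLAST" yeast` would print only the entries whose names contain "yeast".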

  
An example of how to run a Bowtie2 local alignment on Tusker using the
default horse (*Equus caballus*) index (*BOWTIE2\_HORSE*) with paired-end
FASTA files and 8 CPUs is shown below:

**bowtie2\_alignment.submit**

``` syntaxhighlighter-pre
#!/bin/sh
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=50gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err

module load biodata/1.0
module load bowtie/2.2

# Local alignment of paired-end FASTA reads (-f) against the horse index,
# using one thread per task requested above.
bowtie2 -x $BOWTIE2_HORSE -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta \
    -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE
```
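
A job like this can wait in the queue for a long time, so it may be worth failing early when the index variable is empty or an input file is missing. The following is a sketch of such a check, not part of the original submit script: the function name `preflight` and the `threads` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the original submit script).
# Usage: preflight INDEX_PREFIX READS_1 READS_2
preflight() {
    index="$1"
    reads1="$2"
    reads2="$3"
    if [ -z "$index" ]; then
        echo "error: index prefix is empty; is the biodata module loaded?" >&2
        return 1
    fi
    for f in "$reads1" "$reads2"; do
        if [ ! -r "$f" ]; then
            echo "error: cannot read input file '$f'" >&2
            return 1
        fi
    done
    return 0
}

# Fall back to a single thread when the script runs outside of SLURM,
# where SLURM_NTASKS_PER_NODE is not defined.
threads="${SLURM_NTASKS_PER_NODE:-1}"
```

In the submit script, one could call `preflight "$BOWTIE2_HORSE" input_reads_pair_1.fasta input_reads_pair_2.fasta || exit 1` after the `module load` lines and pass `-p "$threads"` to `bowtie2`.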

  
An example of a BLAST run against the yeast nucleotide database available
on Tusker is provided below:

**blastn\_alignment.submit**

``` syntaxhighlighter-pre
#!/bin/sh
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=50gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err

module load biodata/1.0
module load blast/2.2

# Copy the database files and the query to /tmp on the compute node.
cp $BLAST/yeast.nt.* /tmp
cp yeast.query /tmp

blastn -db /tmp/yeast.nt -query /tmp/yeast.query -out /tmp/blast_nucleotide.results

# Copy the results back to the submission directory.
cp /tmp/blast_nucleotide.results .
```
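
The script above writes directly into `/tmp` and leaves its copies behind. One possible variation, sketched here, stages the files in a private scratch directory that is removed when the script exits. The names `stage_files` and `scratch` are hypothetical, and the use of `TMPDIR` as an override is an assumption rather than something the cluster documentation specifies.

```shell
#!/bin/sh
# Sketch only: stage files in a private scratch directory instead of /tmp itself.
# Usage: stage_files DEST FILE...
# Copies each FILE into DEST and stops at the first failure.
stage_files() {
    dest="$1"
    shift
    for f in "$@"; do
        cp "$f" "$dest"/ || return 1
    done
}

# Create a unique scratch directory (under TMPDIR if set, else /tmp)
# and remove it automatically when the script exits.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/blast.XXXXXX") || exit 1
trap 'rm -rf "$scratch"' EXIT
```

In the job script, `stage_files "$scratch" $BLAST/yeast.nt.* yeast.query` would replace the two `cp` lines, and `blastn` would then read from and write to `"$scratch"`. The results must be copied back to the submission directory before the script ends, because the `trap` deletes the scratch directory on exit.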

  
The organisms and their corresponding environment variables for all
genomes and chromosome files, as well as for short-read alignment
indices, are shown at the link below:  
