+++
title = "Allinea Profiling & Debugging Tools"
description = "How to use the Allinea suite of tools for profiling and debugging."
+++
HCC provides both the Allinea Forge suite and Performance Reports to
assist with debugging and profiling C/C++/Fortran code. These tools
support single-threaded, multi-threaded (pthreads/OpenMP), MPI, and CUDA
code. The Allinea Forge suite consists of two programs: DDT for
debugging and MAP for profiling. The Performance Reports software
provides a convenient way to profile HPC applications. It generates an
easy-to-read single-page HTML report.
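As a quick illustration, Performance Reports is typically used by prefixing an existing launch command; the module name and executable below are placeholders and assumptions, not a tested HCC recipe:
{{< highlight bash >}}
# Load the Allinea tools (module name is an assumption; check `module avail`)
module load allinea
# Wrap the usual launch line; a single-page report is written alongside the job output
perf-report mpirun ./my_application.x
{{< /highlight >}}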
For information on using each tool, see the following pages.
- [Using Allinea Forge via Reverse Connect]({{< relref "using_allinea_forge_via_reverse_connect" >}})
- [Allinea Performance Reports]({{< relref "allinea_performance_reports" >}})
+++
title = "Available Software for Crane"
description = "List of available software for crane.unl.edu."
scripts = ["https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/jquery.tablesorter.min.js", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-pager.min.js","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-filter.min.js","/js/sort-table.js"]
css = ["http://mottie.github.io/tablesorter/css/theme.default.css","https://mottie.github.io/tablesorter/css/theme.dropbox.css", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/jquery.tablesorter.pager.min.css","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/filter.formatter.min.css"]
+++
{{% notice tip %}}
HCC provides some software packages via the Singularity container
software. If you do not see a desired package in the module list below,
please check the [Using Singularity]({{< relref "using_singularity" >}})
page for the software list there.
{{% /notice %}}
{{% panel theme="warning" header="Module prerequisites" %}}
If a module lists one or more prerequisites, the prerequisite module(s)
must be loaded before, or along with, that module.
For example, the `cdo/2.1` module requires `compiler/pgi/13`. To load
the cdo module, either
`module load compiler/pgi/13`
`module load cdo/2.1`
or
`module load compiler/pgi/13 cdo/2.1`
is acceptable. (Note that the prerequisite module **must** come first.)
{{% /panel %}}
{{% panel theme="info" header="Multiple versions" %}}
Some packages list multiple compilers as prerequisites. This means that
the package has been built with each version of the compilers listed.
{{% /panel %}}
{{% panel theme="warning" header="Custom GPU Anaconda Environment" %}}
If you are using a custom GPU Anaconda environment, the only module you need to load is `anaconda`:
`module load anaconda`
{{% /panel %}}
{{< table url="http://crane-head.unl.edu:8192/lmod/spider/json" >}}
+++
title = "Available Software for Rhino"
description = "List of available software for rhino.unl.edu."
scripts = ["https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/jquery.tablesorter.min.js", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-pager.min.js","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-filter.min.js","/js/sort-table.js"]
css = ["http://mottie.github.io/tablesorter/css/theme.default.css","https://mottie.github.io/tablesorter/css/theme.dropbox.css", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/jquery.tablesorter.pager.min.css","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/filter.formatter.min.css"]
+++
{{% notice tip %}}
HCC provides some software packages via the Singularity container
software. If you do not see a desired package in the module list below,
please check the [Using Singularity]({{< relref "using_singularity" >}})
page for the software list there.
{{% /notice %}}
{{% panel theme="warning" header="Module prerequisites" %}}
If a module lists one or more prerequisites, the prerequisite module(s)
must be loaded before, or along with, that module.
For example, the `cdo/2.1` module requires `compiler/pgi/13`. To load
the cdo module, either
`module load compiler/pgi/13`
`module load cdo/2.1`
or
`module load compiler/pgi/13 cdo/2.1`
is acceptable. (Note that the prerequisite module **must** come first.)
{{% /panel %}}
{{% panel theme="info" header="Multiple versions" %}}
Some packages list multiple compilers as prerequisites. This means that
the package has been built with each version of the compilers listed.
{{% /panel %}}
{{% panel theme="warning" header="Custom GPU Anaconda Environment" %}}
If you are using a custom GPU Anaconda environment, the only module you need to load is `anaconda`:
`module load anaconda`
{{% /panel %}}
{{< table url="http://rhino-head.unl.edu:8192/lmod/spider/json" >}}
+++
title = "Alignment Tools"
description = "How to use various alignment tools on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "BLAST"
description = "How to use BLAST on HCC machines"
weight = "52"
+++
[BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a local alignment tool that finds similarity between sequences. This tool compares nucleotide or protein sequences to sequence databases and calculates the significance of matches. Sometimes the input sequences are large, and using command-line BLAST is required.
The following pages, [Create Local BLAST Database]({{<relref "create_local_blast_database" >}}) and [Running BLAST Alignment]({{<relref "running_blast_alignment" >}}), describe how to run some of the most common BLAST executables as a single job using the SLURM scheduler on HCC.
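As a rough sketch of what such a job looks like (the database, query file, and resource values below are placeholders; the linked pages contain complete, tested examples):
{{% panel header="`blastn_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=BLAST
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=24:00:00
#SBATCH --mem=20gb
#SBATCH --output=BLAST.%J.out
#SBATCH --error=BLAST.%J.err

module load blast/2.2
# Query file and database name are placeholders; adjust them to your data
blastn -query input_reads.fasta -db nt -out blastn_output.txt -num_threads $SLURM_NTASKS_PER_NODE
{{< /highlight >}}
{{% /panel %}}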
### Useful Information
In order to test the BLAST (blast/2.2) performance on Crane, we aligned three nucleotide query datasets, `small.fasta`, `medium.fasta` and `large.fasta`, against the non-redundant nucleotide **nt.fasta** database from NCBI. Some statistics about the query datasets and the time and memory resources used for the alignment are shown in the table below:
{{< readfile file="/static/html/blast.html" >}}
+++
title = "Data Manipulation Tools"
description = "How to use data manipulation tools on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "De Novo Assembly Tools"
description = "How to use de novo assembly tools on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "Running Trinity in Multiple Steps"
description = "How to run Trinity in multiple steps on HCC resources"
weight = "10"
+++
## Running Trinity with Paired-End fastq data with 8 CPUs and 100GB of RAM
The first step of running Trinity is to run Trinity with the option **--no_run_chrysalis**:
{{% panel header="`trinity_step1.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Trinity_Step1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step1.%J.out
#SBATCH --error=Trinity_Step1.%J.err
module load trinity/2.6
Trinity --seqType fq --JM 100G --left input_reads_pair_1.fastq --right input_reads_pair_2.fastq --SS_lib_type FR --output trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_run_chrysalis
{{< /highlight >}}
{{% /panel %}}
The second step of running Trinity is to run Trinity with the option **--no_run_quantifygraph**:
{{% panel header="`trinity_step2.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Trinity_Step2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step2.%J.out
#SBATCH --error=Trinity_Step2.%J.err
module load trinity/2.6
Trinity --seqType fq --JM 100G --left input_reads_pair_1.fastq --right input_reads_pair_2.fastq --SS_lib_type FR --output trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_run_quantifygraph
{{< /highlight >}}
{{% /panel %}}
The third step of running Trinity is to run Trinity with the option **--no_run_butterfly**:
{{% panel header="`trinity_step3.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Trinity_Step3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step3.%J.out
#SBATCH --error=Trinity_Step3.%J.err
module load trinity/2.6
Trinity --seqType fq --JM 100G --left input_reads_pair_1.fastq --right input_reads_pair_2.fastq --SS_lib_type FR --output trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_run_butterfly
{{< /highlight >}}
{{% /panel %}}
The fourth step of running Trinity is to run Trinity without any additional option:
{{% panel header="`trinity_step4.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Trinity_Step4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step4.%J.out
#SBATCH --error=Trinity_Step4.%J.err
module load trinity/2.6
Trinity --seqType fq --JM 100G --left input_reads_pair_1.fastq --right input_reads_pair_2.fastq --SS_lib_type FR --output trinity_out/ --CPU $SLURM_NTASKS_PER_NODE
{{< /highlight >}}
{{% /panel %}}
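Because each step relies on the files produced by the previous one, the four jobs must run in order. One way to enforce this (a sketch using the submit scripts above) is to chain them with SLURM job dependencies:
{{< highlight bash >}}
# Submit step 1 and capture its job ID
JOB1=$(sbatch --parsable trinity_step1.submit)
# Each later step starts only after the previous one finishes successfully
JOB2=$(sbatch --parsable --dependency=afterok:$JOB1 trinity_step2.submit)
JOB3=$(sbatch --parsable --dependency=afterok:$JOB2 trinity_step3.submit)
sbatch --dependency=afterok:$JOB3 trinity_step4.submit
{{< /highlight >}}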
### Trinity Output
Trinity outputs a number of files in its `trinity_out/` output directory after each executed step. The output file `Trinity.fasta` is the final Trinity output that contains the assembled transcripts.
{{% notice tip %}}
The Inchworm (step 1) and Chrysalis (step 2) steps can be memory intensive. A basic recommendation is to have **1GB of RAM per 1M ~76 base Illumina paired-end reads**.
{{% /notice %}}
+++
title = "Velvet"
description = "How to use Velvet on HCC machines"
weight = "52"
+++
[Velvet](https://www.ebi.ac.uk/~zerbino/velvet/) is a general sequence assembler designed to produce assemblies from short as well as long reads. Running Velvet consists of a sequence of two commands, **velveth** and **velvetg**: **velveth** produces a hash table of k-mers, while **velvetg** constructs the genome assembly. The k-mer length, also known as the hash length, corresponds to the length, in base pairs, of the words of the reads being hashed.
Velvet has many parameters, which are described in its [manual](https://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf). However, the k-mer value is crucial for obtaining optimal assemblies: higher k-mer values increase the specificity, while lower k-mer values increase the sensitivity.
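Because the optimal k-mer value is data dependent, it is common to assemble with several k-mer values and compare the results. A minimal sketch of such a sweep is shown below; the file names and k-mer range are placeholders, and on HCC each velveth/velvetg pair would normally be submitted as its own job, as described on the following pages:
{{< highlight bash >}}
# Try several odd k-mer values, keeping each assembly in its own directory
for K in 31 41 51; do
    velveth assembly_k${K}/ $K -fastq -shortPaired -separate input_reads_pair_1.fastq input_reads_pair_2.fastq
    velvetg assembly_k${K}/ -min_contig_lgth 200
done
# Compare the assembly statistics (e.g. the N50 reported in each Log file) to choose the best k-mer
{{< /highlight >}}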
Velvet supports multiple file formats: `fasta`, `fastq`, `fasta.gz`, `fastq.gz`, `sam`, `bam`, `eland`, `gerald`. Velvet also supports different read categories for different sequencing technologies and libraries, e.g. `short`, `shortPaired`, `short2`, `shortPaired2`, `long`, `longPaired`.
Each step of Velvet (**velveth** and **velvetg**) may be run as its own job. The following pages describe how to run Velvet in this manner on HCC and provide example submit scripts:
{{% children %}}
### Useful Information
In order to test the Velvet (velvet/1.2) performance on Tusker, we used three paired-end input fastq datasets: `small_1.fastq` and `small_2.fastq`, `medium_1.fastq` and `medium_2.fastq`, and `large_1.fastq` and `large_2.fastq`. Some statistics about the input files and the time and memory resources used by Velvet on Tusker are shown in the table below:
{{< readfile file="/static/html/velvet.html" >}}
+++
title = "Running Velvet with Paired-End Data"
description = "How to run velvet with paired-end data on HCC resources"
weight = "10"
+++
## Running Velvet with Paired-End long fastq data with k-mer=43, 8 CPUs and 100GB of RAM
The first step of running Velvet is to run **velveth**:
{{% panel header="`velveth.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velveth
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Velveth.%J.out
#SBATCH --error=Velveth.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velveth output_directory/ 43 -fastq -longPaired -separate input_reads_pair_1.fastq input_reads_pair_2.fastq
{{< /highlight >}}
{{% /panel %}}
After running **velveth**, the next step is to run **velvetg** on the `output_directory/` and files generated from **velveth**:
{{% panel header="`velvetg.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velvetg
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Velvetg.%J.out
#SBATCH --error=Velvetg.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velvetg output_directory/ -min_contig_lgth 200
{{< /highlight >}}
{{% /panel %}}
Both **velveth** and **velvetg** are multi-threaded.
### Velvet Output
{{% panel header="`Output directory after velveth`"%}}
{{< highlight bash >}}
$ ls output_directory/
Log Roadmaps Sequences
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`Output directory after velvetg`"%}}
{{< highlight bash >}}
$ ls output_directory/
contigs.fa Graph LastGraph Log PreGraph Roadmaps Sequences stats.txt
{{< /highlight >}}
{{% /panel %}}
The output fasta file `contigs.fa` is the final Velvet output that contains the assembled contigs. More information about the output files is provided in the Velvet manual.
+++
title = "Running Velvet with Single-End and Paired-End Data"
description = "How to run velvet with single-end and paired-end data on HCC resources"
weight = "10"
+++
## Running Velvet with Single-End and Paired-End short fasta data with k-mer=51, 8 CPUs and 100GB of RAM
The first step of running Velvet is to run **velveth**:
{{% panel header="`velveth.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velveth
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Velveth.%J.out
#SBATCH --error=Velveth.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velveth output_directory/ 51 -fasta -short input_reads.fasta -fasta -shortPaired2 -separate input_reads_pair_1.fasta input_reads_pair_2.fasta
{{< /highlight >}}
{{% /panel %}}
After running **velveth**, the next step is to run **velvetg** on the `output_directory/` and files generated from **velveth**:
{{% panel header="`velvetg.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velvetg
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Velvetg.%J.out
#SBATCH --error=Velvetg.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velvetg output_directory/ -min_contig_lgth 200
{{< /highlight >}}
{{% /panel %}}
Both **velveth** and **velvetg** are multi-threaded.
### Velvet Output
{{% panel header="`Output directory after velveth`"%}}
{{< highlight bash >}}
$ ls output_directory/
Log Roadmaps Sequences
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`Output directory after velvetg`"%}}
{{< highlight bash >}}
$ ls output_directory/
contigs.fa Graph LastGraph Log PreGraph Roadmaps Sequences stats.txt
{{< /highlight >}}
{{% /panel %}}
The output fasta file `contigs.fa` is the final Velvet output that contains the assembled contigs. More information about the output files is provided in the Velvet manual.
+++
title = "Running Velvet with Single-End Data"
description = "How to run velvet with single-end data on HCC resources"
weight = "10"
+++
## Running Velvet with Single-End short fasta data with k-mer=31, 8 CPUs and 100GB of RAM
The first step of running Velvet is to run **velveth**:
{{% panel header="`velveth.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velveth
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Velveth.%J.out
#SBATCH --error=Velveth.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velveth output_directory/ 31 -fasta -short input_reads.fasta
{{< /highlight >}}
{{% /panel %}}
After running **velveth**, the next step is to run **velvetg** on the `output_directory/` and files generated from **velveth**:
{{% panel header="`velvetg.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velvetg
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Velvetg.%J.out
#SBATCH --error=Velvetg.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velvetg output_directory/ -min_contig_lgth 200
{{< /highlight >}}
{{% /panel %}}
Both **velveth** and **velvetg** are multi-threaded.
### Velvet Output
{{% panel header="`Output directory after velveth`"%}}
{{< highlight bash >}}
$ ls output_directory/
Log Roadmaps Sequences
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`Output directory after velvetg`"%}}
{{< highlight bash >}}
$ ls output_directory/
contigs.fa Graph LastGraph Log PreGraph Roadmaps Sequences stats.txt
{{< /highlight >}}
{{% /panel %}}
The output fasta file `contigs.fa` is the final Velvet output that contains the assembled contigs. More information about the output files is provided in the Velvet manual.
+++
title = "Downloading SRA data from NCBI"
description = "How to download data from NCBI"
weight = "52"
+++
One way to download high-volume data from NCBI is to use command-line
utilities such as **wget**, **ftp**, or the Aspera Connect **ascp**
plugin. The Aspera Connect plugin is a commonly used high-performance
transfer plugin that typically provides the best transfer speed.
This plugin is available on our clusters as a module. In order to use it, load the appropriate module first:
{{< highlight bash >}}
$ module load aspera-cli
{{< /highlight >}}
The basic usage of the Aspera plugin is
{{< highlight bash >}}
$ ascp -i $ASPERA_PUBLIC_KEY -k 1 -T -l <max_download_rate_in_Mbps>m anonftp@ftp.ncbi.nlm.nih.gov:/<files_to_transfer> <local_work_output_directory>
{{< /highlight >}}
where **-k 1** enables resume of partial transfers, **-T** disables encryption for maximum throughput, and **-l** sets the transfer rate.
The **\<files_to_transfer\>** argument shown in the basic usage above
follows a specific, predefined pattern:
{{< highlight bash >}}
<files_to_transfer> = /sra/sra-instant/reads/ByRun/sra/SRR|ERR|DRR/<first_6_characters_of_accession>/<accession>/<accession>.sra
{{< /highlight >}}
where **SRR\|ERR\|DRR** should be either **SRR**, **ERR**, or **DRR**, and should match the prefix of the target **.sra** file.
More **ascp** options can be seen by using:
{{< highlight bash >}}
$ ascp --help
{{< /highlight >}}
For example, to download the **SRR304976** file from NCBI into the **data/** directory under $WORK with a download speed of **1000 Mbps**, use the following command:
{{< highlight bash >}}
$ ascp -i $ASPERA_PUBLIC_KEY -k 1 -T -l 1000m anonftp@ftp.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra /work/[groupname]/[username]/data/
{{< /highlight >}}
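If Aspera is not available, the same file can usually be retrieved over FTP with **wget** (a slower alternative; the FTP path mirrors the pattern described above):
{{< highlight bash >}}
$ wget ftp://ftp.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra -P /work/[groupname]/[username]/data/
{{< /highlight >}}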
+++
title = "Pre-processing Tools"
description = "How to use pre-processing tools on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "Reference-Based Assembly Tools"
description = "How to use reference based assembly tools on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "Tools for Removing/Detecting Redundant Sequences"
description = "How to use tools for removing/detecting redundant sequences on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "Module Commands"
description = "How to use the module utility on HCC resources."
+++
The `module` command gives users of an HPC system the ability to compile
and run their code against any of the libraries available on the
cluster. It does so by modifying the user's environmental `PATH` and
`LD_LIBRARY_PATH` variables.
{{% notice info %}}
Please note that if you compile your application using a particular
module, you must include the appropriate module load statement in your
submit script.
{{% /notice %}}
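For example, if an application was compiled with the `compiler/pgi/11` module loaded, its submit script should load the same module before invoking the binary. A sketch is shown below; the module, resource values, and file names are illustrative only:
{{% panel theme="info" header="Example Usage: module load in a submit script" %}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1024
#SBATCH --time=00:10:00
#SBATCH --job-name=module_example
#SBATCH --error=module_example.%J.err
#SBATCH --output=module_example.%J.out

# Load the same module(s) that were loaded when the application was compiled
module load compiler/pgi/11
./my_application.x
{{< /highlight >}}
{{% /panel %}}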
### List Modules Loaded
{{% panel theme="info" header="Example Usage: module list" %}}
{{< highlight bash >}}
module list
No Modulefiles Currently Loaded.
echo $PATH
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
{{< /highlight >}}
{{% /panel %}}
### List Modules Available
{{% panel theme="info" header="Example Usage: Listing Available Modules" %}}
{{< highlight bash >}}
module avail
---------------------------------------------- /util/opt/Modules/modulefiles ----------------------------------------------
NCL/6.0 bowtie/2.0.0-beta6 compiler/pgi/12 hdfeos5/1.14 mplus/7.0 szip/2.1
NCL/6.0dist compiler/gcc/4.6 cufflinks/2.0.2 hugeseq/1.0 netcdf/4.1 tophat/2.0.5
NCO/4.1 compiler/gcc/4.7 deprecated intel-mkl/11 netcdf/4.2 udunits/2.1
R/2.15 compiler/intel/11 hdf4/4.2 intel-mkl/12 openmpi/1.5 zlib/1.2
WRF/WRF compiler/intel/12 hdf5/1.8 lsdyna/5.1.1 openmpi/1.6
acml/5.1 compiler/open64/4.5 hdf5/1.8.6 lsdyna/6.0.0 samtools/0.1
bowtie/0.12.8 compiler/pgi/11 hdfeos2/2.18 mplus/6.12 sas/9.3
{{< /highlight >}}
{{% /panel %}}
#### module load \<module-name\>
Places the binaries and libraries for \<module-name\> into your `PATH` and `LD_LIBRARY_PATH`.
{{% panel theme="info" header="Example Usage: Loading Desired Module" %}}
{{< highlight bash >}}
module load compiler/pgi/11
module list
Currently Loaded Modulefiles:
1) compiler/pgi/11
echo $PATH
/util/comp/pgi/linux86-64/11/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
{{< /highlight >}}
{{% /panel %}}
#### module unload \<module-name\>
Removes the binaries and libraries associated with \<module-name\> from your `PATH` and `LD_LIBRARY_PATH`.
{{% panel theme="info" header="Example Usage: module unload" %}}
{{< highlight bash >}}
module unload compiler/pgi/11
module list
No Modulefiles Currently Loaded.
echo $PATH
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
{{< /highlight >}}
{{% /panel %}}
#### module purge
**Purges** all previously **loaded** module libraries and binaries from
your `PATH` and `LD_LIBRARY_PATH`.
{{% panel theme="info" header="Example Usage: module purge" %}}
{{< highlight bash >}}
module load compiler/open64
module load zlib/1.2
module list
Currently Loaded Modulefiles:
1) zlib/1.2 2) compiler/open64/4.5
module purge
module list
No Modulefiles Currently Loaded.
{{< /highlight >}}
{{% /panel %}}
#### module help
Use `module help` to see a complete list of module commands and options.
{{% panel theme="info" header="Example Usage: module help" %}}
{{< highlight bash >}}
Usage: module [options] sub-command [args ...]
Options:
-h -? -H --help This help message
-s availStyle --style=availStyle Site controlled avail style: system en_grouped (default: en_grouped)
--regression_testing Lmod regression testing
-D Program tracing written to stderr
--debug=dbglvl Program tracing written to stderr
--pin_versions=pinVersions When doing a restore use specified version, do not follow defaults
-d --default List default modules only when used with avail
-q --quiet Do not print out warnings
--expert Expert mode
-t --terse Write out in machine readable format for commands: list, avail, spider, savelist
--initial_load loading Lmod for first time in a user shell
--latest Load latest (ignore default)
--ignore_cache Treat the cache file(s) as out-of-date
--novice Turn off expert and quiet flag
--raw Print modulefile in raw output when used with show
-w twidth --width=twidth Use this as max term width
-v --version Print version info and quit
-r --regexp use regular expression match
--gitversion Dump git version in a machine readable way and quit
--dumpversion Dump version in a machine readable way and quit
--check_syntax --checkSyntax Checking module command syntax: do not load
--config Report Lmod Configuration
--config_json Report Lmod Configuration in json format
--mt Report Module Table State
--timer report run times
--force force removal of a sticky module or save an empty collection
--redirect Send the output of list, avail, spider to stdout (not stderr)
--no_redirect Force output of list, avail and spider to stderr
--show_hidden Avail and spider will report hidden modules
--spider_timeout=timeout a timeout for spider
-T --trace
module [options] sub-command [args ...]
Help sub-commands:
------------------
help prints this message
help module [...] print help message from module(s)
Loading/Unloading sub-commands:
-------------------------------
load | add module [...] load module(s)
try-load | try-add module [...] Add module(s), do not complain if not found
del | unload module [...] Remove module(s), do not complain if not found
swap | sw | switch m1 m2 unload m1 and load m2
purge unload all modules
refresh reload aliases from current list of modules.
update reload all currently loaded modules.
Listing / Searching sub-commands:
---------------------------------
list List loaded modules
list s1 s2 ... List loaded modules that match the pattern
avail | av List available modules
avail | av string List available modules that contain "string".
spider List all possible modules
spider module List all possible version of that module file
spider string List all module that contain the "string".
spider name/version Detailed information about that version of the module.
whatis module Print whatis information about module
keyword | key string Search all name and whatis that contain "string".
Searching with Lmod:
--------------------
All searching (spider, list, avail, keyword) support regular expressions:
spider -r '^p' Finds all the modules that start with `p' or `P'
spider -r mpi Finds all modules that have "mpi" in their name.
spider -r 'mpi$ Finds all modules that end with "mpi" in their name.
Handling a collection of modules:
--------------------------------
save | s Save the current list of modules to a user defined "default" collection.
save | s name Save the current list of modules to "name" collection.
reset The same as "restore system"
restore | r Restore modules from the user's "default" or system default.
restore | r name Restore modules from "name" collection.
restore system Restore module state to system defaults.
savelist List of saved collections.
describe | mcc name Describe the contents of a module collection.
Deprecated commands:
--------------------
getdefault [name] load name collection of modules or user's "default" if no name given.
===> Use "restore" instead <====
setdefault [name] Save current list of modules to name if given, otherwise save as the default list for you the user.
===> Use "save" instead. <====
Miscellaneous sub-commands:
---------------------------
show modulefile show the commands in the module file.
use [-a] path Prepend or Append path to MODULEPATH.
unuse path remove path from MODULEPATH.
tablelist output list of active modules as a lua table.
Important Environment Variables:
--------------------------------
LMOD_COLORIZE If defined to be "YES" then Lmod prints properties and warning in color.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Lmod Web Sites
Documentation: http://lmod.readthedocs.org
Github: https://github.com/TACC/Lmod
Sourceforge: https://lmod.sf.net
TACC Homepage: https://www.tacc.utexas.edu/research-development/tacc-projects/lmod
To report a bug please read http://lmod.readthedocs.io/en/latest/075_bug_reporting.html
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Modules based on Lua: Version 7.4.16 2017-05-23 11:10 -05:00
by Robert McLay mclay@tacc.utexas.edu
{{< /highlight >}}
{{% /panel %}}
+++
title = "MPI Jobs on HCC"
description = "How to compile and run MPI programs on HCC machines"
weight = "52"
+++
This quick start demonstrates how to implement a parallel (MPI)
Fortran/C program on HCC supercomputers. The sample codes and submit
scripts can be downloaded from [mpi_dir.zip](/attachments/mpi_dir.zip).
#### Login to an HCC Cluster
Log in to an HCC cluster through PuTTY ([For Windows Users]({{< relref "/quickstarts/connecting/for_windows_users">}})) or Terminal ([For Mac/Linux
Users]({{< relref "/quickstarts/connecting/for_maclinux_users">}})) and make a subdirectory called `mpi_dir` under the `$WORK` directory.
{{< highlight bash >}}
$ cd $WORK
$ mkdir mpi_dir
{{< /highlight >}}
In the subdirectory `mpi_dir`, save all the relevant code. Here we
include two demo programs, `demo_f_mpi.f90` and `demo_c_mpi.c`, that
compute the sum from 1 to 20 through parallel processes. A
straightforward parallelization scheme is used for demonstration
purposes. First, the master core (i.e. `myid=0`) distributes an equal
computation workload to a certain number of cores (as specified by
`--ntasks` in the submit script). Then, each worker core computes a
partial summation as output. Finally, the master core collects the
outputs from all worker cores and performs an overall summation. For easy
comparison with the serial code ([Fortran/C on HCC]({{< relref "fortran_c_on_hcc">}})), the
added lines in the parallel code (MPI) are marked with "!=" or "//=".
{{%expand "demo_f_mpi.f90" %}}
{{< highlight fortran >}}
Program demo_f_mpi
!====== MPI =====
  use mpi
!================
  implicit none
  integer, parameter :: N = 20
  real*8 w
  integer i
  common/sol/ x
  real*8 x
  real*8, dimension(N) :: y
!============================== MPI =================================
  integer ind
  real*8, dimension(:), allocatable :: y_local
  integer numnodes,myid,rc,ierr,start_local,end_local,N_local
  real*8 allsum
!====================================================================
!============================== MPI =================================
  call mpi_init( ierr )
  call mpi_comm_rank ( mpi_comm_world, myid, ierr )
  call mpi_comm_size ( mpi_comm_world, numnodes, ierr )
  !
  N_local = N/numnodes
  allocate ( y_local(N_local) )
  start_local = N_local*myid + 1
  end_local = N_local*myid + N_local
!====================================================================
  do i = start_local, end_local
    w = i*1d0
    call proc(w)
    ind = i - N_local*myid
    y_local(ind) = x
!   y(i) = x
!   write(6,*) 'i, y(i)', i, y(i)
  enddo
! write(6,*) 'sum(y) =',sum(y)
!============================================== MPI =====================================================
  call mpi_reduce( sum(y_local), allsum, 1, mpi_real8, mpi_sum, 0, mpi_comm_world, ierr )
  call mpi_gather ( y_local, N_local, mpi_real8, y, N_local, mpi_real8, 0, mpi_comm_world, ierr )
  if (myid == 0) then
    write(6,*) '-----------------------------------------'
    write(6,*) '*Final output from... myid=', myid
    write(6,*) 'numnodes =', numnodes
    write(6,*) 'mpi_sum =', allsum
    write(6,*) 'y=...'
    do i = 1, N
      write(6,*) y(i)
    enddo
    write(6,*) 'sum(y)=', sum(y)
  endif
  deallocate( y_local )
  call mpi_finalize(rc)
!========================================================================================================
  Stop
End Program

Subroutine proc(w)
  real*8, intent(in) :: w
  common/sol/ x
  real*8 x
  x = w
  Return
End Subroutine
{{< /highlight >}}
{{% /expand %}}
{{%expand "demo_c_mpi.c" %}}
{{< highlight c >}}
//demo_c_mpi
#include <stdio.h>
//======= MPI ========
#include "mpi.h"
#include <stdlib.h>
//====================
double proc(double w){
    double x;
    x = w;
    return x;
}

int main(int argc, char* argv[]){
    int N=20;
    double w;
    int i;
    double x;
    double y[N];
    double sum;
//=============================== MPI ============================
    int ind;
    double *y_local;
    int numnodes,myid,rc,ierr,start_local,end_local,N_local;
    double allsum;
//================================================================
//=============================== MPI ============================
    MPI_Init(&argc, &argv);
    MPI_Comm_rank( MPI_COMM_WORLD, &myid );
    MPI_Comm_size ( MPI_COMM_WORLD, &numnodes );
    N_local = N/numnodes;
    y_local=(double *) malloc(N_local*sizeof(double));
    start_local = N_local*myid + 1;
    end_local = N_local*myid + N_local;
//================================================================
    for (i = start_local; i <= end_local; i++){
        w = i*1e0;
        x = proc(w);
        ind = i - N_local*myid;
        y_local[ind-1] = x;
        // y[i-1] = x;
        // printf("i,x= %d %lf\n", i, y[i-1]) ;
    }
    sum = 0e0;
    for (i = 1; i<= N_local; i++){
        sum = sum + y_local[i-1];
    }
    // printf("sum(y)= %lf\n", sum);
//====================================== MPI ===========================================
    MPI_Reduce( &sum, &allsum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD );
    MPI_Gather( &y_local[0], N_local, MPI_DOUBLE, &y[0], N_local, MPI_DOUBLE, 0, MPI_COMM_WORLD );
    if (myid == 0){
        printf("-----------------------------------\n");
        printf("*Final output from... myid= %d\n", myid);
        printf("numnodes = %d\n", numnodes);
        printf("mpi_sum = %lf\n", allsum);
        printf("y=...\n");
        for (i = 1; i <= N; i++){
            printf("%lf\n", y[i-1]);
        }
        sum = 0e0;
        for (i = 1; i<= N; i++){
            sum = sum + y[i-1];
        }
        printf("sum(y) = %lf\n", sum);
    }
    free( y_local );
    MPI_Finalize ();
//======================================================================================
    return 0;
}
{{< /highlight >}}
{{% /expand %}}
---
#### Compiling the Code
The compiling of a MPI code requires first loading a compiler "engine"
such as `gcc`, `intel`, or `pgi` and then loading a MPI wrapper
`openmpi`. Here we will use the GNU Complier Collection, `gcc`, for
demonstration.
{{< highlight bash >}}
$ module load compiler/gcc/6.1 openmpi/2.1
$ mpif90 demo_f_mpi.f90 -o demo_f_mpi.x
$ mpicc demo_c_mpi.c -o demo_c_mpi.x
{{< /highlight >}}
The above commands load the `gcc` compiler with the `openmpi` wrapper.
The compile commands `mpif90` and `mpicc` are then used to compile the codes
into `.x` files (executables).
### Creating a Submit Script
Create a submit script to request 5 cores (with `--ntasks`). The parallel
execution command `mpirun` must be placed in front of the executable name
on the last line of the script.
{{% panel header="`submit_f.mpi`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --ntasks=5
#SBATCH --mem-per-cpu=1024
#SBATCH --time=00:01:00
#SBATCH --job-name=Fortran
#SBATCH --error=Fortran.%J.err
#SBATCH --output=Fortran.%J.out
mpirun ./demo_f_mpi.x
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`submit_c.mpi`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --ntasks=5
#SBATCH --mem-per-cpu=1024
#SBATCH --time=00:01:00
#SBATCH --job-name=C
#SBATCH --error=C.%J.err
#SBATCH --output=C.%J.out
mpirun ./demo_c_mpi.x
{{< /highlight >}}
{{% /panel %}}
#### Submit the Job
The job can be submitted through the command `sbatch`. The job status
can be monitored by entering `squeue` with the `-u` option.
{{< highlight bash >}}
$ sbatch submit_f.mpi
$ sbatch submit_c.mpi
$ squeue -u <username>
{{< /highlight >}}
Replace `<username>` with your HCC username.
#### Sample Output
The sum from 1 to 20 is computed and printed to the `.out` file (see
below). The outputs from the 5 cores are collected and processed by the
master core (i.e. `myid=0`).
{{%expand "Fortran.out" %}}
{{< highlight batchfile>}}
-----------------------------------------
*Final output from... myid= 0
numnodes = 5
mpi_sum = 210.00000000000000
y=...
1.0000000000000000
2.0000000000000000
3.0000000000000000
4.0000000000000000
5.0000000000000000
6.0000000000000000
7.0000000000000000
8.0000000000000000
9.0000000000000000
10.000000000000000
11.000000000000000
12.000000000000000
13.000000000000000
14.000000000000000
15.000000000000000
16.000000000000000
17.000000000000000
18.000000000000000
19.000000000000000
20.000000000000000
sum(y)= 210.00000000000000
{{< /highlight >}}
{{% /expand %}}
{{%expand "C.out" %}}
{{< highlight batchfile>}}
-----------------------------------
*Final output from... myid= 0
numnodes = 5
mpi_sum = 210.000000
y=...
1.000000
2.000000
3.000000
4.000000
5.000000
6.000000
7.000000
8.000000
9.000000
10.000000
11.000000
12.000000
13.000000
14.000000
15.000000
16.000000
17.000000
18.000000
19.000000
20.000000
sum(y) = 210.000000
{{< /highlight >}}
{{% /expand %}}
+++
title = "Running OLAM at HCC"
description = "How to run the OLAM (Ocean Land Atmosphere Model) on HCC resources."
+++
### OLAM compilation on Tusker
##### pgi/11 compilation with MPI and OpenMP enabled
1. Load modules:
{{< highlight bash >}}
module load compiler/pgi/11 openmpi/1.6 szip/2.1 zlib/1.2 NCL/6.1dist
{{< /highlight >}}
2. Edit the `include.mk` file.
{{% panel theme="info" header="include.mk" %}}
{{< highlight batch >}}
#----------------- LINUX Intel Fortran ifort/gcc ---------------
F_COMP=mpif90
# If the compiler supports (and the user wants to use)
# the module IEEE_ARITHMETIC, uncomment below
IEEE_ARITHMETIC=yes
# If using MPI libraries:
OLAM_MPI=yes
# If parallel hdf5 is supported, uncomment the next line
OLAM_PARALLEL_HDF5=yes
# If you use the ED2 model, uncomment the next line
#USE_ED2=yes
MPI_PATH=/util/opt/openmpi/1.6/pgi/11
PAR_INCS=-I$(MPI_PATH)/include:$(MPI_PATH)/lib
PAR_LIBS=-L$(MPI_PATH)/lib -lmpi
# OPTIMIZED:
F_OPTS=-O3 -traceback -mp
#F_OPTS=-xHost -O3 -fno-alias -ip -openmp -traceback
#F_OPTS=-g -O3 -xHost -traceback
# DEBUG:
#F_OPTS=-g -fp-model precise -check bounds -traceback \
# -debug extended -check uninit -ftrapuv
# FORTRAN FLAGS FOR BIG FILES WHICH WOULD HAVE EXCESSIVE COMPILATION TIME
#SLOW_FFLAGS=-O1 -g -no-ip -traceback
C_COMP=mpicc
#C_COMP=mpicc
C_OPTS=-DUNDERSCORE -DLITTLE
NCARG_DIR=/util/src/ncl_ncarg/ncl_ncarg-6.1.2/lib
LIBNCARG=-L$(NCARG_DIR) -lncarg -lncarg_gks -lncarg_c \
-L/usr/lib64 -lX11 -ldl -lpthread -lgfortran -lcairo
HDF5_LIBS=-L/util/opt/hdf5/1.8.13/openmpi/1.6/pgi/11/lib -lhdf5_fortran -lhdf5 -lz -lm
HDF5_INCS=-I/util/opt/hdf5/1.8.13/openmpi/1.6/pgi/11/include
NETCDF_LIBS=-L/util/opt/netcdf/4.2/pgi/11/lib -lnetcdf
NETCDF_INCS=-I/util/opt/netcdf/4.2/pgi/11/include
LOADER=$(F_COMP)
LOADER_OPTS=-mp
#LOADER_OPTS=-static-intel $(F_OPTS)
# For Apple OSX: the stack size needs to be increased at link time
# LOADER_OPTS=-static-intel $(F_OPTS) -Wl,-stack_size -Wl,0x10000000
# to allow ifort compiler to link with pg-compiled ncar graphics:
# LIBS=-z muldefs -L/opt/pgi/linux86-64/5.2/lib -lpgftnrtl -lpgc
## IMPORTANT: Need to specify this flag in ED2
#USE_HDF5=1
{{< /highlight >}}
{{% /panel %}}
3. Command: `make clean`
4. Command: `make -j 8`
##### intel/12 compilation with MPI and OpenMP enabled
1. Load modules:
{{< highlight bash >}}
module load compiler/intel/12 openmpi/1.6 szip/2.1 zlib/1.2
{{< /highlight >}}
2. Edit the `include.mk` file.
{{% panel theme="info" header="include.mk" %}}
{{< highlight batch >}}
#----------------- LINUX Intel Fortran ifort/gcc ---------------
F_COMP=mpif90
# If the compiler supports (and the user wants to use)
# the module IEEE_ARITHMETIC, uncomment below
IEEE_ARITHMETIC=yes
# If using MPI libraries:
OLAM_MPI=yes
# If parallel hdf5 is supported, uncomment the next line
OLAM_PARALLEL_HDF5=yes
# If you use the ED2 model, uncomment the next line
#USE_ED2=yes
MPI_PATH=/util/opt/openmpi/1.6/intel/12
PAR_INCS=-I$(MPI_PATH)/include:$(MPI_PATH)/lib
PAR_LIBS=-L$(MPI_PATH)/lib -lmpi
# OPTIMIZED:
F_OPTS=-O3 -traceback -openmp
#F_OPTS=-xHost -O3 -fno-alias -ip -openmp -traceback
#F_OPTS=-g -O3 -xHost -traceback
# DEBUG:
#F_OPTS=-g -fp-model precise -check bounds -traceback \
# -debug extended -check uninit -ftrapuv
# FORTRAN FLAGS FOR BIG FILES WHICH WOULD HAVE EXCESSIVE COMPILATION TIME
#SLOW_FFLAGS=-O1 -g -no-ip -traceback
C_COMP=mpicc
#C_COMP=mpicc
C_OPTS=-DUNDERSCORE -DLITTLE
NCARG_DIR=/util/src/ncl_ncarg/ncl_ncarg-6.1.2/lib
LIBNCARG=-L$(NCARG_DIR) -lncarg -lncarg_gks -lncarg_c \
-L/usr/lib64 -lX11 -ldl -lpthread -lgfortran -lcairo
HDF5_LIBS=-L/util/opt/hdf5/1.8.13/openmpi/1.6/intel/12/lib -lhdf5_fortran -lhdf5 -lz -lm
HDF5_INCS=-I/util/opt/hdf5/1.8.13/openmpi/1.6/intel/12/include
NETCDF_LIBS=-L/util/opt/netcdf/4.2/intel/12/lib -lnetcdf
NETCDF_INCS=-I/util/opt/netcdf/4.2/intel/12/include
LOADER=$(F_COMP)
LOADER_OPTS=-openmp
#LOADER_OPTS=-static-intel $(F_OPTS)
# For Apple OSX: the stack size needs to be increased at link time
# LOADER_OPTS=-static-intel $(F_OPTS) -Wl,-stack_size -Wl,0x10000000
# to allow ifort compiler to link with pg-compiled ncar graphics:
# LIBS=-z muldefs -L/opt/pgi/linux86-64/5.2/lib -lpgftnrtl -lpgc
## IMPORTANT: Need to specify this flag in ED2
#USE_HDF5=1
{{< /highlight >}}
{{% /panel %}}
3. Command: `make clean`
4. Command: `make -j 8`
### OLAM compilation on Crane
##### Intel/15 compiler with OpenMPI/1.10
1. Load modules:
{{< highlight bash >}}
module load compiler/intel/15 openmpi/1.10 NCL/6.1 netcdf/4.4 phdf5/1.8 szip/2.1 zlib/1.2
{{< /highlight >}}
2. Edit the `include.mk` file:
{{% panel theme="info" header="include.mk" %}}
{{< highlight batch >}}
#----------------- LINUX Intel Fortran ifort/gcc ---------------
F_COMP=/util/opt/hdf5/1.8/openmpi/1.10/intel/15/bin/h5pfc
# If the compiler supports (and the user wants to use)
# the module IEEE_ARITHMETIC, uncomment below
IEEE_ARITHMETIC=yes
# If using MPI libraries:
OLAM_MPI=yes
# If parallel hdf5 is supported, uncomment the next line
OLAM_PARALLEL_HDF5=yes
# If you use the ED2 model, uncomment the next line
#USE_ED2=yes
#MPI_PATH=/usr/local/mpich
PAR_INCS=-I/util/opt/openmpi/1.10/intel/15/include
PAR_LIBS=-L/util/opt/openmpi/1.10/intel/15/lib
# OPTIMIZED:
F_OPTS=-xHost -O3 -fno-alias -ip -openmp -traceback
#F_OPTS=-g -O3 -xHost -traceback
# DEBUG:
#F_OPTS=-g -fp-model precise -check bounds -traceback \
# -debug extended -check uninit -ftrapuv
# EXTRA OPTIONS FOR FIXED-SOURCE CODE
FIXED_SRC_FLAGS=-fixed -132
# FORTRAN FLAGS FOR BIG FILES WHICH WOULD HAVE EXCESSIVE COMPILATION TIME
SLOW_FFLAGS=-O1 -g -no-ip -traceback
#C_COMP=icc
C_COMP=mpicc
C_OPTS=-O3 -DUNDERSCORE -DLITTLE
NCARG_DIR=/util/opt/NCL/6.1/lib
LIBNCARG=-L$(NCARG_DIR) -lncarg -lncarg_gks -lncarg_c \
-L/usr/lib64 -lX11 -ldl -lpng -lpthread -lgfortran -lcairo
HDF5_LIBS=-L/util/opt/hdf5/1.8/openmpi/1.10/intel/15/lib
HDF5_INCS=-I/util/opt/hdf5/1.8/openmpi/1.10/intel/15/include
NETCDF_LIBS=-L/util/opt/netcdf/4.4/intel/15/lib -lnetcdf
NETCDF_INCS=-I/util/opt/netcdf/4.4/intel/15/include
LOADER=$(F_COMP)
LOADER_OPTS=-static-intel $(F_OPTS)
# For Apple OSX: the stack size needs to be increased at link time
# LOADER_OPTS=-static-intel $(F_OPTS) -Wl,-stack_size -Wl,0x10000000
# to allow ifort compiler to link with pg-compiled ncar graphics:
# LIBS=-z muldefs -L/opt/pgi/linux86-64/5.2/lib -lpgftnrtl -lpgc
## IMPORTANT: Need to specify this flag in ED2
USE_HDF5=1
{{< /highlight >}}
{{% /panel %}}
3. Command: `make clean`
4. Command: `make -j 8`
### Sample SLURM submit scripts
##### PGI compiler:
{{% panel theme="info" header="Sample submit script for PGI compiler" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --ntasks=8 # 8 cores
#SBATCH --mem-per-cpu=1024 # Minimum memory required per CPU (in megabytes)
#SBATCH --time=03:15:00 # Run time in hh:mm:ss
#SBATCH --error=/work/[groupname]/[username]/job.%J.err
#SBATCH --output=/work/[groupname]/[username]/job.%J.out
module load compiler/pgi/11 openmpi/1.6 szip/2.1 zlib/1.2
mpirun /path/to/olam-4.2c-mpi
{{< /highlight >}}
{{% /panel %}}
##### Intel compiler:
{{% panel theme="info" header="Sample submit script for Intel compiler" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --ntasks=8 # 8 cores
#SBATCH --mem-per-cpu=1024 # Minimum memory required per CPU (in megabytes)
#SBATCH --time=03:15:00 # Run time in hh:mm:ss
#SBATCH --error=/work/[groupname]/[username]/job.%J.err
#SBATCH --output=/work/[groupname]/[username]/job.%J.out
module load compiler/intel/12 openmpi/1.6 szip/2.1 zlib/1.2
mpirun /path/to/olam-4.2c-mpi
{{< /highlight >}}
{{% /panel %}}
+++
title = "Running Theano"
description = "How to run the Theano on HCC resources."
+++
Theano is available on HCC resources via the modules system. Both CPU and GPU
versions are available on Crane. Additionally, installs for Python
2.7, 3.5, and 3.6 are provided.
### Initial Setup
Theano attempts to write to a `~/.theano` directory in some
circumstances, which can cause errors as the `/home` filesystem is
read-only on HCC machines. As a workaround, create the directory on
`/work` and make a symlink from `/home`:
{{% panel theme="info" header="Create & symlink .theano directory" %}}
{{< highlight bash >}}
mkdir -p $WORK/.theano
ln -s $WORK/.theano $HOME/.theano
{{< /highlight >}}
{{% /panel %}}
This only needs to be done once on each HCC machine.
### Running the CPU version
To use the CPU version, simply load the module and run your Python code.
You can choose among the Python 2.7, 3.5, or 3.6 environments:
{{% panel theme="info" header="Python 2.7 version" %}}
{{< highlight bash >}}
module load theano/py27/1.0
python my_python2_script.py
{{< /highlight >}}
{{% /panel %}}
or
{{% panel theme="info" header="Python 3.5 version" %}}
{{< highlight bash >}}
module load theano/py35/1.0
python my_python3_script.py
{{< /highlight >}}
{{% /panel %}}
or
{{% panel theme="info" header="Python 3.6 version" %}}
{{< highlight bash >}}
module load theano/py36/1.0
python my_python3_script.py
{{< /highlight >}}
{{% /panel %}}
### Running the GPU version
To use the GPU version, first create a `~/.theanorc` file with the
following contents (or append to an existing file as needed):
{{% panel theme="info" header="~/.theanorc" %}}
{{< highlight batch >}}
[global]
device = cuda
{{< /highlight >}}
{{% /panel %}}
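The GPU version must be run on a node with a GPU. One way to request an interactive session on such a node is sketched below; the partition and `--gres` values are assumptions and may need to be adjusted for your cluster:
{{< highlight bash >}}
# Request an interactive shell on a GPU node (partition/gres names are assumptions)
srun --partition=gpu --gres=gpu --mem=8gb --time=01:00:00 --pty $SHELL
{{< /highlight >}}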
Next, load the theano module:
{{% panel theme="info" header="Load the theano module" %}}
{{< highlight bash >}}
module load theano/py27/0.9
{{< /highlight >}}
{{% /panel %}}
To test the GPU support, start an interactive job on a GPU node and
import the theano module within the Python interpreter. You should see
output similar to the following:
{{% panel theme="info" header="GPU support test" %}}
{{< highlight python >}}
Python 2.7.15 | packaged by conda-forge | (default, May 8 2018, 14:46:53)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano
Using cuDNN version 7005 on context None
Mapped name None to device cuda: Tesla K20m (0000:03:00.0)
{{< /highlight >}}
{{% /panel %}}