+++
title = "Available Software for Crane"
description = "List of available software for crane.unl.edu."
scripts = ["https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/jquery.tablesorter.min.js", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-pager.min.js","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-filter.min.js","/js/sort-table.js"]
css = ["http://mottie.github.io/tablesorter/css/theme.default.css","https://mottie.github.io/tablesorter/css/theme.dropbox.css", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/jquery.tablesorter.pager.min.css","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/filter.formatter.min.css"]
+++
{{% notice tip %}}
HCC provides some software packages via the Singularity container
software. If you do not see a desired package in the module list below,
please check the [Using Singularity]({{< relref "using_singularity" >}})
page for the software list there.
{{% /notice %}}
{{% panel theme="warning" header="Module prerequisites" %}}
If a module lists one or more prerequisites, the prerequisite module(s)
must be loaded before, or along with, that module.
For example, the `cdo/2.1` module requires `compiler/pgi/13`. To load
the cdo module, doing either
`module load compiler/pgi/13`
`module load cdo/2.1`
or
`module load compiler/pgi/13 cdo/2.1` (Note the prerequisite module
**must** be first.)
is acceptable.
{{% /panel %}}
{{% panel theme="info" header="Multiple versions" %}}
Some packages list multiple compilers for prerequisites. This means that
the package has been built with each version of the compilers listed.
{{% /panel %}}
{{% panel theme="warning" header="Custom GPU Anaconda Environment" %}}
If you are using a custom GPU Anaconda environment, the only module you need to load is `anaconda`:
`module load anaconda`
{{% /panel %}}
{{< table url="http://crane-head.unl.edu:8192/lmod/spider/json" >}}
+++
title = "Available Software for Rhino"
description = "List of available software for rhino.unl.edu."
scripts = ["https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/jquery.tablesorter.min.js", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-pager.min.js","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-filter.min.js","/js/sort-table.js"]
css = ["http://mottie.github.io/tablesorter/css/theme.default.css","https://mottie.github.io/tablesorter/css/theme.dropbox.css", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/jquery.tablesorter.pager.min.css","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/filter.formatter.min.css"]
+++
{{% notice tip %}}
HCC provides some software packages via the Singularity container
software. If you do not see a desired package in the module list below,
please check the [Using Singularity]({{< relref "using_singularity" >}})
page for the software list there.
{{% /notice %}}
{{% panel theme="warning" header="Module prerequisites" %}}
If a module lists one or more prerequisites, the prerequisite module(s)
must be loaded before, or along with, that module.
For example, the `cdo/2.1` module requires `compiler/pgi/13`. To load
the cdo module, doing either
`module load compiler/pgi/13`
`module load cdo/2.1`
or
`module load compiler/pgi/13 cdo/2.1` (Note the prerequisite module
**must** be first.)
is acceptable.
{{% /panel %}}
{{% panel theme="info" header="Multiple versions" %}}
Some packages list multiple compilers for prerequisites. This means that
the package has been built with each version of the compilers listed.
{{% /panel %}}
{{% panel theme="warning" header="Custom GPU Anaconda Environment" %}}
If you are using a custom GPU Anaconda environment, the only module you need to load is `anaconda`:
`module load anaconda`
{{% /panel %}}
{{< table url="http://rhino-head.unl.edu:8192/lmod/spider/json" >}}
+++
title = "Alignment Tools"
description = "How to use various alignment tools on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "BLAST"
description = "How to use BLAST on HCC machines"
weight = "52"
+++
[BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a local alignment tool that finds similarity between sequences. It compares nucleotide or protein sequences to sequence databases and calculates the significance of the matches. When the input sequences are large, the command-line version of BLAST is required.
The following pages, [Create Local BLAST Database]({{<relref "create_local_blast_database" >}}) and [Running BLAST Alignment]({{<relref "running_blast_alignment" >}}) describe how to run some of the most common BLAST executables as a single job using the SLURM scheduler on HCC.
### Useful Information
To test the performance of BLAST (blast/2.2) on Crane, we aligned three nucleotide query datasets, `small.fasta`, `medium.fasta` and `large.fasta`, against the non-redundant nucleotide **nt.fasta** database from NCBI. Some statistics about the query datasets and the time and memory resources used for the alignment are shown in the table below:
{{< readfile file="/static/html/blast.html" >}}
+++
title = "Data Manipulation Tools"
description = "How to use data manipulation tools on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "De Novo Assembly Tools"
description = "How to use de novo assembly tools on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "Running Trinity in Multiple Steps"
description = "How to run Trinity in multiple steps on HCC resources"
weight = "10"
+++
## Running Trinity with Paired-End fastq data with 8 CPUs and 100GB of RAM
The first step of running Trinity is to run Trinity with the option **--no_run_chrysalis**:
{{% panel header="`trinity_step1.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Trinity_Step1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step1.%J.out
#SBATCH --error=Trinity_Step1.%J.err
module load trinity/2.6
Trinity --seqType fq --JM 100G --left input_reads_pair_1.fastq --right input_reads_pair_2.fastq --SS_lib_type FR --output trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_run_chrysalis
{{< /highlight >}}
{{% /panel %}}
The second step of running Trinity is to run Trinity with the option **--no_run_quantifygraph**:
{{% panel header="`trinity_step2.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Trinity_Step2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step2.%J.out
#SBATCH --error=Trinity_Step2.%J.err
module load trinity/2.6
Trinity --seqType fq --JM 100G --left input_reads_pair_1.fastq --right input_reads_pair_2.fastq --SS_lib_type FR --output trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_run_quantifygraph
{{< /highlight >}}
{{% /panel %}}
The third step of running Trinity is to run Trinity with the option **--no_run_butterfly**:
{{% panel header="`trinity_step3.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Trinity_Step3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step3.%J.out
#SBATCH --error=Trinity_Step3.%J.err
module load trinity/2.6
Trinity --seqType fq --JM 100G --left input_reads_pair_1.fastq --right input_reads_pair_2.fastq --SS_lib_type FR --output trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_run_butterfly
{{< /highlight >}}
{{% /panel %}}
The fourth step of running Trinity is to run Trinity without any additional option:
{{% panel header="`trinity_step4.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Trinity_Step4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step4.%J.out
#SBATCH --error=Trinity_Step4.%J.err
module load trinity/2.6
Trinity --seqType fq --JM 100G --left input_reads_pair_1.fastq --right input_reads_pair_2.fastq --SS_lib_type FR --output trinity_out/ --CPU $SLURM_NTASKS_PER_NODE
{{< /highlight >}}
{{% /panel %}}
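Because each step depends on the output of the previous one, the four submit files can be chained with SLURM job dependencies instead of being submitted by hand. The following is a minimal sketch, assuming the submit files are named as in the panels above and sit in the current directory. The helper name `submit_chain` is hypothetical; `--parsable` and `--dependency=afterok:` are standard `sbatch` options.

```shell
#!/bin/sh
# Sketch only: submit_chain is a hypothetical helper, not an HCC-provided tool.
# It submits each file in order so that a step starts only after the
# previous step has finished successfully (afterok).

submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # --parsable makes sbatch print the job ID instead of a sentence
            jobid=$(sbatch --parsable "$script") || return 1
        else
            jobid=$(sbatch --parsable --dependency=afterok:"$prev" "$script") || return 1
        fi
        echo "submitted $script as job $jobid"
        # With multiple clusters --parsable prints "jobid;cluster"; keep the ID
        prev=${jobid%%;*}
    done
}

# Usage on a login node:
#   submit_chain trinity_step1.submit trinity_step2.submit \
#                trinity_step3.submit trinity_step4.submit
```

If any step fails, the jobs that depend on it will not start, so the failure can be inspected in the corresponding `.err` file before resubmitting.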
### Trinity Output
Trinity writes a number of files to its `trinity_out/` output directory after each executed step. The output file `Trinity.fasta` is the final Trinity output and contains the assembled transcripts.
{{% notice tip %}}
The Inchworm (step 1) and Chrysalis (step 2) steps can be memory intensive. A basic recommendation is to have **1GB of RAM per 1M ~76 base Illumina paired-end reads**.
{{% /notice %}}
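The recommendation above can be turned into a rough estimate for `--mem`. This sketch assumes an uncompressed FASTQ file with exactly four lines per record, and it interprets "1M reads" as one million read pairs (that is, the number of records in the left file); both are assumptions, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for a back-of-the-envelope estimate.

# count_fastq_reads FILE: number of records, assuming 4 lines per record
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# estimate_trinity_mem_gb READS: 1 GB per million reads, rounded up
estimate_trinity_mem_gb() {
    echo $(( ($1 + 999999) / 1000000 ))
}

# Usage:
#   reads=$(count_fastq_reads input_reads_pair_1.fastq)
#   echo "suggested --mem: $(estimate_trinity_mem_gb "$reads")gb"
```

Treat the result as a starting point only; actual memory use depends on the data set.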
+++
title = "Velvet"
description = "How to use Velvet on HCC machines"
weight = "52"
+++
[Velvet](https://www.ebi.ac.uk/~zerbino/velvet/) is a general sequence assembler designed to produce assemblies from short as well as long reads. Running Velvet consists of a sequence of two commands, **velveth** and **velvetg**. **velveth** produces a hash table of k-mers, while **velvetg** constructs the genome assembly. The k-mer length, also known as the hash length, corresponds to the length, in base pairs, of the words of the reads being hashed.
Velvet has many parameters, which are described in its [manual](https://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf). Among them, the k-mer value is crucial for obtaining optimal assemblies. Higher k-mer values increase the specificity, and lower k-mer values increase the sensitivity.
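Since the best k-mer value is usually found by trial, one common approach is to run **velveth** for several values and compare the resulting assemblies. The sketch below only generates the candidate values; Velvet expects an odd hash length, so even inputs are bumped to the next odd number. The helper name `odd_kmers` and the output directory naming are hypothetical, and the largest usable value depends on how the installed Velvet was compiled.

```shell
#!/bin/sh
# Sketch only: odd_kmers is a hypothetical helper.

# odd_kmers START END STEP: print odd k-mer values from START up to END
odd_kmers() {
    k=$1
    step=$3
    # Velvet works with odd hash lengths, so start on an odd value
    if [ $(( k % 2 )) -eq 0 ]; then k=$(( k + 1 )); fi
    # An even step keeps every following value odd
    if [ $(( step % 2 )) -ne 0 ]; then step=$(( step + 1 )); fi
    while [ "$k" -le "$2" ]; do
        echo "$k"
        k=$(( k + step ))
    done
}

# Usage inside a submit script:
#   for k in $(odd_kmers 21 31 4); do
#       velveth "output_k$k/" "$k" -fasta -short input_reads.fasta
#   done
```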
Velvet supports multiple file formats: `fasta`, `fastq`, `fasta.gz`, `fastq.gz`, `sam`, `bam`, `eland`, `gerald`. Velvet also supports different read categories for different sequencing technologies and libraries, e.g. `short`, `shortPaired`, `short2`, `shortPaired2`, `long`, `longPaired`.
Each step of Velvet (**velveth** and **velvetg**) may be run as its own job. The following pages describe how to run Velvet in this manner on HCC and provide example submit scripts:
{{% children %}}
### Useful Information
In order to test the Velvet (velvet/1.2) performance on Tusker, we used three paired-end input fastq files, `small_1.fastq` and `small_2.fastq`, `medium_1.fastq` and `medium_2.fastq`, and `large_1.fastq` and `large_2.fastq`. Some statistics about the input files and the time and memory resources used by Velvet on Tusker are shown in the table below:
{{< readfile file="/static/html/velvet.html" >}}
+++
title = "Running Velvet with Paired-End Data"
description = "How to run velvet with paired-end data on HCC resources"
weight = "10"
+++
## Running Velvet with Paired-End long fastq data with k-mer=43, 8 CPUs and 100GB of RAM
The first step of running Velvet is to run **velveth**:
{{% panel header="`velveth.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velveth
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Velveth.%J.out
#SBATCH --error=Velveth.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velveth output_directory/ 43 -fastq -longPaired -separate input_reads_pair_1.fastq input_reads_pair_2.fastq
{{< /highlight >}}
{{% /panel %}}
After running **velveth**, the next step is to run **velvetg** on the `output_directory/` and files generated from **velveth**:
{{% panel header="`velvetg.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velvetg
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Velvetg.%J.out
#SBATCH --error=Velvetg.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velvetg output_directory/ -min_contig_lgth 200
{{< /highlight >}}
{{% /panel %}}
Both **velveth** and **velvetg** are multi-threaded.
### Velvet Output
{{% panel header="`Output directory after velveth`"%}}
{{< highlight bash >}}
$ ls output_directory/
Log Roadmaps Sequences
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`Output directory after velvetg`"%}}
{{< highlight bash >}}
$ ls output_directory/
contigs.fa Graph LastGraph Log PreGraph Roadmaps Sequences stats.txt
{{< /highlight >}}
{{% /panel %}}
The output fasta file `contigs.fa` is the final Velvet output that contains the assembled contigs. More information about the output files is provided in the Velvet manual.
+++
title = "Running Velvet with Single-End and Paired-End Data"
description = "How to run velvet with single-end and paired-end data on HCC resources"
weight = "10"
+++
## Running Velvet with Single-End and Paired-End short fasta data with k-mer=51, 8 CPUs and 100GB of RAM
The first step of running Velvet is to run **velveth**:
{{% panel header="`velveth.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velveth
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Velveth.%J.out
#SBATCH --error=Velveth.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velveth output_directory/ 51 -fasta -short input_reads.fasta -fasta -shortPaired2 -separate input_reads_pair_1.fasta input_reads_pair_2.fasta
{{< /highlight >}}
{{% /panel %}}
After running **velveth**, the next step is to run **velvetg** on the `output_directory/` and files generated from **velveth**:
{{% panel header="`velvetg.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velvetg
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Velvetg.%J.out
#SBATCH --error=Velvetg.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velvetg output_directory/ -min_contig_lgth 200
{{< /highlight >}}
{{% /panel %}}
Both **velveth** and **velvetg** are multi-threaded.
### Velvet Output
{{% panel header="`Output directory after velveth`"%}}
{{< highlight bash >}}
$ ls output_directory/
Log Roadmaps Sequences
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`Output directory after velvetg`"%}}
{{< highlight bash >}}
$ ls output_directory/
contigs.fa Graph LastGraph Log PreGraph Roadmaps Sequences stats.txt
{{< /highlight >}}
{{% /panel %}}
The output fasta file `contigs.fa` is the final Velvet output that contains the assembled contigs. More information about the output files is provided in the Velvet manual.
+++
title = "Running Velvet with Single-End Data"
description = "How to run velvet with single-end data on HCC resources"
weight = "10"
+++
## Running Velvet with Single-End short fasta data with k-mer=31, 8 CPUs and 100GB of RAM
The first step of running Velvet is to run **velveth**:
{{% panel header="`velveth.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velveth
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Velveth.%J.out
#SBATCH --error=Velveth.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velveth output_directory/ 31 -fasta -short input_reads.fasta
{{< /highlight >}}
{{% /panel %}}
After running **velveth**, the next step is to run **velvetg** on the `output_directory/` and files generated from **velveth**:
{{% panel header="`velvetg.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Velvet_Velvetg
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Velvetg.%J.out
#SBATCH --error=Velvetg.%J.err
module load velvet/1.2
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
velvetg output_directory/ -min_contig_lgth 200
{{< /highlight >}}
{{% /panel %}}
Both **velveth** and **velvetg** are multi-threaded.
### Velvet Output
{{% panel header="`Output directory after velveth`"%}}
{{< highlight bash >}}
$ ls output_directory/
Log Roadmaps Sequences
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`Output directory after velvetg`"%}}
{{< highlight bash >}}
$ ls output_directory/
contigs.fa Graph LastGraph Log PreGraph Roadmaps Sequences stats.txt
{{< /highlight >}}
{{% /panel %}}
The output fasta file `contigs.fa` is the final Velvet output that contains the assembled contigs. More information about the output files is provided in the Velvet manual.
+++
title = "Downloading SRA data from NCBI"
description = "How to download data from NCBI"
weight = "52"
+++
One way to download high-volume data from NCBI is to use command line
utilities, such as **wget**, **ftp** or the Aspera Connect **ascp**
plugin. The Aspera Connect plugin is a commonly used high-performance transfer
plugin that provides the best transfer speed.
This plugin is available on our clusters as a module. In order to use it, load the appropriate module first:
{{< highlight bash >}}
$ module load aspera-cli
{{< /highlight >}}
The basic usage of the Aspera plugin is
{{< highlight bash >}}
$ ascp -i $ASPERA_PUBLIC_KEY -k 1 -T -l <max_download_rate_in_Mbps>m anonftp@ftp.ncbi.nlm.nih.gov:/<files_to_transfer> <local_work_output_directory>
{{< /highlight >}}
where **-k 1** enables resume of partial transfers, **-T** disables encryption for maximum throughput, and **-l** sets the transfer rate.
**\<files_to_transfer\>** in the basic usage of the Aspera
plugin must follow a specific pattern:
{{< highlight bash >}}
<files_to_transfer> = /sra/sra-instant/reads/ByRun/sra/SRR|ERR|DRR/<first_6_characters_of_accession>/<accession>/<accession>.sra
{{< /highlight >}}
where **SRR\|ERR\|DRR** should be either **SRR**, **ERR** or **DRR** and should match the prefix of the target **.sra** file.
More **ascp** options can be seen by using:
{{< highlight bash >}}
$ ascp --help
{{< /highlight >}}
For example, to download the **SRR304976** file from NCBI into the **data/** directory under your `$WORK` at a download speed of up to **1000 Mbps**, use the following command:
{{< highlight bash >}}
$ ascp -i $ASPERA_PUBLIC_KEY -k 1 -T -l 1000m anonftp@ftp.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra /work/[groupname]/[username]/data/
{{< /highlight >}}
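The remote path follows mechanically from the accession, so it can be built by a small helper rather than typed by hand. This is a minimal sketch that encodes only the pattern described above; the function name `sra_remote_path` is hypothetical.

```shell
#!/bin/sh
# Sketch only: sra_remote_path is a hypothetical helper.

# sra_remote_path ACCESSION: print the remote path for one SRA run
sra_remote_path() {
    acc=$1
    prefix=$(printf '%s' "$acc" | cut -c1-3)   # SRR, ERR or DRR
    case "$prefix" in
        SRR|ERR|DRR) ;;
        *) echo "unsupported accession prefix: $prefix" >&2; return 1 ;;
    esac
    first6=$(printf '%s' "$acc" | cut -c1-6)   # e.g. SRR304
    echo "/sra/sra-instant/reads/ByRun/sra/$prefix/$first6/$acc/$acc.sra"
}

# Usage:
#   ascp -i $ASPERA_PUBLIC_KEY -k 1 -T -l 1000m \
#       anonftp@ftp.ncbi.nlm.nih.gov:"$(sra_remote_path SRR304976)" \
#       /work/[groupname]/[username]/data/
```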
+++
title = "Pre-processing Tools"
description = "How to use pre-processing tools on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "Reference-Based Assembly Tools"
description = "How to use reference based assembly tools on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "Tools for Removing/Detecting Redundant Sequences"
description = "How to use tools for removing/detecting redundant sequences on HCC machines"
weight = "52"
+++
{{% children %}}
+++
title = "Module Commands"
description = "How to use the module utility on HCC resources."
+++
The `module` command lets an HPC system user compile and run their
code against any compiler or library that is available on the server.
It does so by giving each user the ability to modify their `PATH` and
`LD_LIBRARY_PATH` environment variables.
{{% notice info %}}
Please note that if you compile your application using a particular
module, you must include the appropriate module load statement in your
submit script.
{{% /notice %}}
### List Modules Loaded
{{% panel theme="info" header="Example Usage: module list" %}}
{{< highlight bash >}}
module list
No Modulefiles Currently Loaded.
echo $PATH
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
{{< /highlight >}}
{{% /panel %}}
### List Modules Available
{{% panel theme="info" header="Example Usage: Listing Available Modules" %}}
{{< highlight bash >}}
module avail
---------------------------------------------- /util/opt/Modules/modulefiles ----------------------------------------------
NCL/6.0 bowtie/2.0.0-beta6 compiler/pgi/12 hdfeos5/1.14 mplus/7.0 szip/2.1
NCL/6.0dist compiler/gcc/4.6 cufflinks/2.0.2 hugeseq/1.0 netcdf/4.1 tophat/2.0.5
NCO/4.1 compiler/gcc/4.7 deprecated intel-mkl/11 netcdf/4.2 udunits/2.1
R/2.15 compiler/intel/11 hdf4/4.2 intel-mkl/12 openmpi/1.5 zlib/1.2
WRF/WRF compiler/intel/12 hdf5/1.8 lsdyna/5.1.1 openmpi/1.6
acml/5.1 compiler/open64/4.5 hdf5/1.8.6 lsdyna/6.0.0 samtools/0.1
bowtie/0.12.8 compiler/pgi/11 hdfeos2/2.18 mplus/6.12 sas/9.3
{{< /highlight >}}
{{% /panel %}}
#### module load \<module-name\>
Places the binaries and libraries for \<module-name\> into your `PATH` and `LD_LIBRARY_PATH`.
{{% panel theme="info" header="Example Usage: Loading Desired Module" %}}
{{< highlight bash >}}
module load compiler/pgi/11
module list
Currently Loaded Modulefiles:
1) compiler/pgi/11
echo $PATH
/util/comp/pgi/linux86-64/11/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
{{< /highlight >}}
{{% /panel %}}
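Instead of reading the `echo $PATH` output by eye, a script can check whether a directory was added. This is a minimal sketch; the function name `path_contains` is hypothetical, and the directory in the usage line is the one from the example output above.

```shell
#!/bin/sh
# Sketch only: path_contains is a hypothetical helper.

# path_contains DIR: succeed if DIR is one of the entries in $PATH
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after "module load compiler/pgi/11":
#   path_contains /util/comp/pgi/linux86-64/11/bin && echo "pgi is on PATH"
```

Wrapping `$PATH` in colons lets the pattern match whole entries only, including the first and last one.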
#### module unload \<module-name\>
Removes the binaries and libraries associated with \<module-name\> from your `PATH` and `LD_LIBRARY_PATH`.
{{% panel theme="info" header="Example Usage: module unload" %}}
{{< highlight bash >}}
module unload compiler/pgi/11
module list
No Modulefiles Currently Loaded.
echo $PATH
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
{{< /highlight >}}
{{% /panel %}}
#### module purge
**Purges** all previously **loaded** module libraries and binaries from
your `PATH` and `LD_LIBRARY_PATH`.
{{% panel theme="info" header="Example Usage: module purge" %}}
{{< highlight bash >}}
module load compiler/open64
module load zlib/1.2
module list
Currently Loaded Modulefiles:
1) zlib/1.2 2) compiler/open64/4.5
module purge
module list
No Modulefiles Currently Loaded.
{{< /highlight >}}
{{% /panel %}}
#### module help
Displays a complete list of module commands and options.
{{% panel theme="info" header="Example Usage: module help" %}}
{{< highlight bash >}}
Usage: module [options] sub-command [args ...]
Options:
-h -? -H --help This help message
-s availStyle --style=availStyle Site controlled avail style: system en_grouped (default: en_grouped)
--regression_testing Lmod regression testing
-D Program tracing written to stderr
--debug=dbglvl Program tracing written to stderr
--pin_versions=pinVersions When doing a restore use specified version, do not follow defaults
-d --default List default modules only when used with avail
-q --quiet Do not print out warnings
--expert Expert mode
-t --terse Write out in machine readable format for commands: list, avail, spider, savelist
--initial_load loading Lmod for first time in a user shell
--latest Load latest (ignore default)
--ignore_cache Treat the cache file(s) as out-of-date
--novice Turn off expert and quiet flag
--raw Print modulefile in raw output when used with show
-w twidth --width=twidth Use this as max term width
-v --version Print version info and quit
-r --regexp use regular expression match
--gitversion Dump git version in a machine readable way and quit
--dumpversion Dump version in a machine readable way and quit
--check_syntax --checkSyntax Checking module command syntax: do not load
--config Report Lmod Configuration
--config_json Report Lmod Configuration in json format
--mt Report Module Table State
--timer report run times
--force force removal of a sticky module or save an empty collection
--redirect Send the output of list, avail, spider to stdout (not stderr)
--no_redirect Force output of list, avail and spider to stderr
--show_hidden Avail and spider will report hidden modules
--spider_timeout=timeout a timeout for spider
-T --trace
module [options] sub-command [args ...]
Help sub-commands:
------------------
help prints this message
help module [...] print help message from module(s)
Loading/Unloading sub-commands:
-------------------------------
load | add module [...] load module(s)
try-load | try-add module [...] Add module(s), do not complain if not found
del | unload module [...] Remove module(s), do not complain if not found
swap | sw | switch m1 m2 unload m1 and load m2
purge unload all modules
refresh reload aliases from current list of modules.
update reload all currently loaded modules.
Listing / Searching sub-commands:
---------------------------------
list List loaded modules
list s1 s2 ... List loaded modules that match the pattern
avail | av List available modules
avail | av string List available modules that contain "string".
spider List all possible modules
spider module List all possible version of that module file
spider string List all module that contain the "string".
spider name/version Detailed information about that version of the module.
whatis module Print whatis information about module
keyword | key string Search all name and whatis that contain "string".
Searching with Lmod:
--------------------
All searching (spider, list, avail, keyword) support regular expressions:
spider -r '^p' Finds all the modules that start with `p' or `P'
spider -r mpi Finds all modules that have "mpi" in their name.
spider -r 'mpi$ Finds all modules that end with "mpi" in their name.
Handling a collection of modules:
--------------------------------
save | s Save the current list of modules to a user defined "default" collection.
save | s name Save the current list of modules to "name" collection.
reset The same as "restore system"
restore | r Restore modules from the user's "default" or system default.
restore | r name Restore modules from "name" collection.
restore system Restore module state to system defaults.
savelist List of saved collections.
describe | mcc name Describe the contents of a module collection.
Deprecated commands:
--------------------
getdefault [name] load name collection of modules or user's "default" if no name given.
===> Use "restore" instead <====
setdefault [name] Save current list of modules to name if given, otherwise save as the default list for you the user.
===> Use "save" instead. <====
Miscellaneous sub-commands:
---------------------------
show modulefile show the commands in the module file.
use [-a] path Prepend or Append path to MODULEPATH.
unuse path remove path from MODULEPATH.
tablelist output list of active modules as a lua table.
Important Environment Variables:
--------------------------------
LMOD_COLORIZE If defined to be "YES" then Lmod prints properties and warning in color.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Lmod Web Sites
Documentation: http://lmod.readthedocs.org
Github: https://github.com/TACC/Lmod
Sourceforge: https://lmod.sf.net
TACC Homepage: https://www.tacc.utexas.edu/research-development/tacc-projects/lmod
To report a bug please read http://lmod.readthedocs.io/en/latest/075_bug_reporting.html
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Modules based on Lua: Version 7.4.16 2017-05-23 11:10 -05:00
by Robert McLay mclay@tacc.utexas.edu
{{< /highlight >}}
{{% /panel %}}
+++
title = "MPI Jobs on HCC"
description = "How to compile and run MPI programs on HCC machines"
weight = "52"
+++
This quick start demonstrates how to implement a parallel (MPI)
Fortran/C program on HCC supercomputers. The sample codes and submit
scripts can be downloaded from [mpi_dir.zip](/attachments/mpi_dir.zip).
#### Log in to an HCC Cluster
Log in to an HCC cluster through PuTTY ([For Windows Users]({{< relref "/quickstarts/connecting/for_windows_users">}})) or Terminal ([For Mac/Linux
Users]({{< relref "/quickstarts/connecting/for_maclinux_users">}})) and make a subdirectory called `mpi_dir` under the `$WORK` directory.
{{< highlight bash >}}
$ cd $WORK
$ mkdir mpi_dir
{{< /highlight >}}
In the subdirectory `mpi_dir`, save all the relevant codes. Here we
include two demo programs, `demo_f_mpi.f90` and `demo_c_mpi.c`, that
compute the sum from 1 to 20 through parallel processes. A
straightforward parallelization scheme is used for demonstration
purposes. First, the master core (i.e. `myid=0`) distributes an equal
computation workload to a certain number of cores (as specified by
`--ntasks` in the submit script). Then, each worker core computes a
partial summation as output. Finally, the master core collects the
outputs from all worker cores and performs an overall summation. For easy
comparison with the serial code ([Fortran/C on HCC]({{< relref "fortran_c_on_hcc">}})), the
added lines in the parallel code (MPI) are marked with "!=" or "//=".
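The way the work is split can be checked with plain integer arithmetic before running anything. The sketch below mirrors the `N_local`, `start_local` and `end_local` formulas used in both demo programs; the function name `local_range` is hypothetical.

```shell
#!/bin/sh
# Sketch only: local_range is a hypothetical helper that reproduces the
# index arithmetic of the demo programs.

# local_range N NUMNODES MYID: print "start end" (1-based, inclusive)
local_range() {
    n_local=$(( $1 / $2 ))              # integer division, as in the demos
    start=$(( n_local * $3 + 1 ))
    end=$(( n_local * $3 + n_local ))
    echo "$start $end"
}

# Usage: show the range handled by each of 4 processes for N=20
#   for id in 0 1 2 3; do echo "myid=$id: $(local_range 20 4 "$id")"; done
```

Because the division is truncated, a task count that does not divide 20 leaves part of the range unassigned. With 8 tasks, for instance, each process handles 2 values and only 1 to 16 are covered, so a `--ntasks` value such as 2, 4, 5, 10 or 20 is the safe choice for these demos.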
{{%expand "demo_f_mpi.f90" %}}
{{< highlight fortran >}}
Program demo_f_mpi
!====== MPI =====
use mpi
!================
implicit none
integer, parameter :: N = 20
real*8 w
integer i
common/sol/ x
real*8 x
real*8, dimension(N) :: y
!============================== MPI =================================
integer ind
real*8, dimension(:), allocatable :: y_local
integer numnodes,myid,rc,ierr,start_local,end_local,N_local
real*8 allsum
!====================================================================
!============================== MPI =================================
call mpi_init( ierr )
call mpi_comm_rank ( mpi_comm_world, myid, ierr )
call mpi_comm_size ( mpi_comm_world, numnodes, ierr )
!
N_local = N/numnodes
allocate ( y_local(N_local) )
start_local = N_local*myid + 1
end_local = N_local*myid + N_local
!====================================================================
do i = start_local, end_local
w = i*1d0
call proc(w)
ind = i - N_local*myid
y_local(ind) = x
! y(i) = x
! write(6,*) 'i, y(i)', i, y(i)
enddo
! write(6,*) 'sum(y) =',sum(y)
!============================================== MPI =====================================================
call mpi_reduce( sum(y_local), allsum, 1, mpi_real8, mpi_sum, 0, mpi_comm_world, ierr )
call mpi_gather ( y_local, N_local, mpi_real8, y, N_local, mpi_real8, 0, mpi_comm_world, ierr )
if (myid == 0) then
write(6,*) '-----------------------------------------'
write(6,*) '*Final output from... myid=', myid
write(6,*) 'numnodes =', numnodes
write(6,*) 'mpi_sum =', allsum
write(6,*) 'y=...'
do i = 1, N
write(6,*) y(i)
enddo
write(6,*) 'sum(y)=', sum(y)
endif
deallocate( y_local )
call mpi_finalize(rc)
!========================================================================================================
Stop
End Program
Subroutine proc(w)
real*8, intent(in) :: w
common/sol/ x
real*8 x
x = w
Return
End Subroutine
{{< /highlight >}}
{{% /expand %}}
{{%expand "demo_c_mpi.c" %}}
{{< highlight c >}}
//demo_c_mpi
#include <stdio.h>
//======= MPI ========
#include "mpi.h"
#include <stdlib.h>
//====================
double proc(double w){
    double x;
    x = w;
    return x;
}
int main(int argc, char* argv[]){
    int N=20;
    double w;
    int i;
    double x;
    double y[N];
    double sum;
    //=============================== MPI ============================
    int ind;
    double *y_local;
    int numnodes,myid,rc,ierr,start_local,end_local,N_local;
    double allsum;
    //================================================================
    //=============================== MPI ============================
    MPI_Init(&argc, &argv);
    MPI_Comm_rank( MPI_COMM_WORLD, &myid );
    MPI_Comm_size ( MPI_COMM_WORLD, &numnodes );
    N_local = N/numnodes;
    y_local=(double *) malloc(N_local*sizeof(double));
    start_local = N_local*myid + 1;
    end_local = N_local*myid + N_local;
    //================================================================
    for (i = start_local; i <= end_local; i++){
        w = i*1e0;
        x = proc(w);
        ind = i - N_local*myid;
        y_local[ind-1] = x;
        // y[i-1] = x;
        // printf("i,x= %d %lf\n", i, y[i-1]) ;
    }
    sum = 0e0;
    for (i = 1; i<= N_local; i++){
        sum = sum + y_local[i-1];
    }
    // printf("sum(y)= %lf\n", sum);
    //====================================== MPI ===========================================
    MPI_Reduce( &sum, &allsum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD );
    MPI_Gather( &y_local[0], N_local, MPI_DOUBLE, &y[0], N_local, MPI_DOUBLE, 0, MPI_COMM_WORLD );
    if (myid == 0){
        printf("-----------------------------------\n");
        printf("*Final output from... myid= %d\n", myid);
        printf("numnodes = %d\n", numnodes);
        printf("mpi_sum = %lf\n", allsum);
        printf("y=...\n");
        for (i = 1; i <= N; i++){
            printf("%lf\n", y[i-1]);
        }
        sum = 0e0;
        for (i = 1; i<= N; i++){
            sum = sum + y[i-1];
        }
        printf("sum(y) = %lf\n", sum);
    }
    free( y_local );
    MPI_Finalize ();
    //======================================================================================
    return 0;
}
{{< /highlight >}}
{{% /expand %}}
---
#### Compiling the Code
Compiling MPI code requires first loading a compiler "engine"
such as `gcc`, `intel`, or `pgi`, and then loading an MPI wrapper,
`openmpi`. Here we use the GNU Compiler Collection, `gcc`, for
demonstration.
{{< highlight bash >}}
$ module load compiler/gcc/6.1 openmpi/2.1
$ mpif90 demo_f_mpi.f90 -o demo_f_mpi.x
$ mpicc demo_c_mpi.c -o demo_c_mpi.x
{{< /highlight >}}
The above commands load the `gcc` compiler with the `openmpi` wrapper.
The compiling commands `mpif90` and `mpicc` are used to compile the codes
to `.x` files (executables).
### Creating a Submit Script
Create a submit script to request 5 cores (with `--ntasks`). On the last
line of the script, the parallel execution command `mpirun` must precede
the program name (for example, `mpirun ./demo_f_mpi.x`).
{{% panel header="`submit_f.mpi`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --ntasks=5
#SBATCH --mem-per-cpu=1024
#SBATCH --time=00:01:00
#SBATCH --job-name=Fortran
#SBATCH --error=Fortran.%J.err
#SBATCH --output=Fortran.%J.out
mpirun ./demo_f_mpi.x
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`submit_c.mpi`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --ntasks=5
#SBATCH --mem-per-cpu=1024
#SBATCH --time=00:01:00
#SBATCH --job-name=C
#SBATCH --error=C.%J.err
#SBATCH --output=C.%J.out
mpirun ./demo_c_mpi.x
{{< /highlight >}}
{{% /panel %}}
#### Submit the Job
The job can be submitted through the command `sbatch`. The job status
can be monitored by entering `squeue` with the `-u` option.
{{< highlight bash >}}
$ sbatch submit_f.mpi
$ sbatch submit_c.mpi
$ squeue -u <username>
{{< /highlight >}}
Replace `<username>` with your HCC username.
Sample Output
-------------
The sum from 1 to 20 is computed and printed to the `.out` file (see
below). The outputs from the 5 cores are collected and processed by the
master core (i.e. `myid=0`).
{{%expand "Fortran.out" %}}
{{< highlight batchfile>}}
-----------------------------------------
*Final output from... myid= 0
numnodes = 5
mpi_sum = 210.00000000000000
y=...
1.0000000000000000
2.0000000000000000
3.0000000000000000
4.0000000000000000
5.0000000000000000
6.0000000000000000
7.0000000000000000
8.0000000000000000
9.0000000000000000
10.000000000000000
11.000000000000000
12.000000000000000
13.000000000000000
14.000000000000000
15.000000000000000
16.000000000000000
17.000000000000000
18.000000000000000
19.000000000000000
20.000000000000000
sum(y)= 210.00000000000000
{{< /highlight >}}
{{% /expand %}}
{{%expand "C.out" %}}
{{< highlight batchfile>}}
-----------------------------------
*Final output from... myid= 0
numnodes = 5
mpi_sum = 210.000000
y=...
1.000000
2.000000
3.000000
4.000000
5.000000
6.000000
7.000000
8.000000
9.000000
10.000000
11.000000
12.000000
13.000000
14.000000
15.000000
16.000000
17.000000
18.000000
19.000000
20.000000
sum(y) = 210.000000
{{< /highlight >}}
{{% /expand %}}
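The index arithmetic behind the sample output can be checked without MPI. The following is a minimal serial sketch in Python, not part of the demo programs: it mirrors the `N_local`, `start_local`, and `end_local` calculations from the code above and imitates the reduce and gather steps with ordinary lists. The function names `block_bounds` and `simulate` are made up for this illustration, and the sketch assumes, as the demo programs do, that the number of tasks divides `N` evenly.

```python
def block_bounds(n, numnodes, myid):
    """Return the 1-based (start, end) index range owned by rank `myid`."""
    n_local = n // numnodes              # assumes numnodes divides n evenly
    start_local = n_local * myid + 1
    end_local = n_local * myid + n_local
    return start_local, end_local


def simulate(n=20, numnodes=5):
    """Serially imitate the reduce and gather steps of the demo programs."""
    if n % numnodes != 0:
        raise ValueError("n must be divisible by numnodes")
    partial_sums = []
    gathered = []
    for myid in range(numnodes):
        start, end = block_bounds(n, numnodes, myid)
        y_local = [float(i) for i in range(start, end + 1)]
        partial_sums.append(sum(y_local))    # value each rank contributes to the reduce
        gathered.extend(y_local)             # gather concatenates blocks in rank order
    return sum(partial_sums), gathered


if __name__ == "__main__":
    allsum, y = simulate()
    print("mpi_sum =", allsum)
    print("sum(y)  =", sum(y))
```

With the defaults (`N = 20`, 5 tasks) both sums come out to 210, matching the `.out` files above. If `N` is not a multiple of the task count, the integer division in `N_local` would leave the last elements unassigned, which is why the sketch raises an error in that case.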
+++
title = "Running OLAM at HCC"
description = "How to run the OLAM (Ocean Land Atmosphere Model) on HCC resources."
+++
### OLAM compilation on Tusker
##### pgi/11 compilation with mpi and openmp enabled
1. Load modules:
{{< highlight bash >}}
module load compiler/pgi/11 openmpi/1.6 szip/2.1 zlib/1.2 NCL/6.1dist
{{< /highlight >}}
2. Edit the `include.mk` file.
{{% panel theme="info" header="include.mk" %}}
{{< highlight batch >}}
#----------------- LINUX Intel Fortran ifort/gcc ---------------
F_COMP=mpif90
# If the compiler supports (and the user wants to use)
# the module IEEE_ARITHMETIC, uncomment below
IEEE_ARITHMETIC=yes
# If using MPI libraries:
OLAM_MPI=yes
# If parallel hdf5 is supported, uncomment the next line
OLAM_PARALLEL_HDF5=yes
# If you use the ED2 model, uncomment the next line
#USE_ED2=yes
MPI_PATH=/util/opt/openmpi/1.6/pgi/11
PAR_INCS=-I$(MPI_PATH)/include:$(MPI_PATH)/lib
PAR_LIBS=-L$(MPI_PATH)/lib -lmpi
# OPTIMIZED:
F_OPTS=-O3 -traceback -mp
#F_OPTS=-xHost -O3 -fno-alias -ip -openmp -traceback
#F_OPTS=-g -O3 -xHost -traceback
# DEBUG:
#F_OPTS=-g -fp-model precise -check bounds -traceback \
# -debug extended -check uninit -ftrapuv
# FORTRAN FLAGS FOR BIG FILES WHICH WOULD HAVE EXCESSIVE COMPILATION TIME
#SLOW_FFLAGS=-O1 -g -no-ip -traceback
C_COMP=mpicc
#C_COMP=mpicc
C_OPTS=-DUNDERSCORE -DLITTLE
NCARG_DIR=/util/src/ncl_ncarg/ncl_ncarg-6.1.2/lib
LIBNCARG=-L$(NCARG_DIR) -lncarg -lncarg_gks -lncarg_c \
-L/usr/lib64 -lX11 -ldl -lpthread -lgfortran -lcairo
HDF5_LIBS=-L/util/opt/hdf5/1.8.13/openmpi/1.6/pgi/11/lib -lhdf5_fortran -lhdf5 -lz -lm
HDF5_INCS=-I/util/opt/hdf5/1.8.13/openmpi/1.6/pgi/11/include
NETCDF_LIBS=-L/util/opt/netcdf/4.2/pgi/11/lib -lnetcdf
NETCDF_INCS=-I/util/opt/netcdf/4.2/pgi/11/include
LOADER=$(F_COMP)
LOADER_OPTS=-mp
#LOADER_OPTS=-static-intel $(F_OPTS)
# For Apple OSX: the stack size needs to be increased at link time
# LOADER_OPTS=-static-intel $(F_OPTS) -Wl,-stack_size -Wl,0x10000000
# to allow ifort compiler to link with pg-compiled ncar graphics:
# LIBS=-z muldefs -L/opt/pgi/linux86-64/5.2/lib -lpgftnrtl -lpgc
## IMPORTANT: Need to specify this flag in ED2
#USE_HDF5=1
{{< /highlight >}}
{{% /panel %}}
3. Command: `make clean`
4. Command: `make -j 8`
##### intel/12 compilation with mpi and openmp enabled
1. Load modules:
{{< highlight bash >}}
module load compiler/intel/12 openmpi/1.6 szip/2.1 zlib/1.2
{{< /highlight >}}
2. Edit the `include.mk` file.
{{% panel theme="info" header="include.mk" %}}
{{< highlight batch >}}
#----------------- LINUX Intel Fortran ifort/gcc ---------------
F_COMP=mpif90
# If the compiler supports (and the user wants to use)
# the module IEEE_ARITHMETIC, uncomment below
IEEE_ARITHMETIC=yes
# If using MPI libraries:
OLAM_MPI=yes
# If parallel hdf5 is supported, uncomment the next line
OLAM_PARALLEL_HDF5=yes
# If you use the ED2 model, uncomment the next line
#USE_ED2=yes
MPI_PATH=/util/opt/openmpi/1.6/intel/12
PAR_INCS=-I$(MPI_PATH)/include:$(MPI_PATH)/lib
PAR_LIBS=-L$(MPI_PATH)/lib -lmpi
# OPTIMIZED:
F_OPTS=-O3 -traceback -openmp
#F_OPTS=-xHost -O3 -fno-alias -ip -openmp -traceback
#F_OPTS=-g -O3 -xHost -traceback
# DEBUG:
#F_OPTS=-g -fp-model precise -check bounds -traceback \
# -debug extended -check uninit -ftrapuv
# FORTRAN FLAGS FOR BIG FILES WHICH WOULD HAVE EXCESSIVE COMPILATION TIME
#SLOW_FFLAGS=-O1 -g -no-ip -traceback
C_COMP=mpicc
#C_COMP=mpicc
C_OPTS=-DUNDERSCORE -DLITTLE
NCARG_DIR=/util/src/ncl_ncarg/ncl_ncarg-6.1.2/lib
LIBNCARG=-L$(NCARG_DIR) -lncarg -lncarg_gks -lncarg_c \
-L/usr/lib64 -lX11 -ldl -lpthread -lgfortran -lcairo
HDF5_LIBS=-L/util/opt/hdf5/1.8.13/openmpi/1.6/intel/12/lib -lhdf5_fortran -lhdf5 -lz -lm
HDF5_INCS=-I/util/opt/hdf5/1.8.13/openmpi/1.6/intel/12/include
NETCDF_LIBS=-L/util/opt/netcdf/4.2/intel/12/lib -lnetcdf
NETCDF_INCS=-I/util/opt/netcdf/4.2/intel/12/include
LOADER=$(F_COMP)
LOADER_OPTS=-openmp
#LOADER_OPTS=-static-intel $(F_OPTS)
# For Apple OSX: the stack size needs to be increased at link time
# LOADER_OPTS=-static-intel $(F_OPTS) -Wl,-stack_size -Wl,0x10000000
# to allow ifort compiler to link with pg-compiled ncar graphics:
# LIBS=-z muldefs -L/opt/pgi/linux86-64/5.2/lib -lpgftnrtl -lpgc
## IMPORTANT: Need to specify this flag in ED2
#USE_HDF5=1
{{< /highlight >}}
{{% /panel %}}
3. Command: `make clean`
4. Command: `make -j 8`
### OLAM compilation on Crane
##### Intel/15 compiler with OpenMPI/1.10
1. Load modules:
{{< highlight bash >}}
module load compiler/intel/15 openmpi/1.10 NCL/6.1 netcdf/4.4 phdf5/1.8 szip/2.1 zlib/1.2
{{< /highlight >}}
2. Edit the `include.mk` file:
{{% panel theme="info" header="include.mk" %}}
{{< highlight batch >}}
#----------------- LINUX Intel Fortran ifort/gcc ---------------
F_COMP=/util/opt/hdf5/1.8/openmpi/1.10/intel/15/bin/h5pfc
# If the compiler supports (and the user wants to use)
# the module IEEE_ARITHMETIC, uncomment below
IEEE_ARITHMETIC=yes
# If using MPI libraries:
OLAM_MPI=yes
# If parallel hdf5 is supported, uncomment the next line
OLAM_PARALLEL_HDF5=yes
# If you use the ED2 model, uncomment the next line
#USE_ED2=yes
#MPI_PATH=/usr/local/mpich
PAR_INCS=-I/util/opt/openmpi/1.10/intel/15/include
PAR_LIBS=-L/util/opt/openmpi/1.10/intel/15/lib
# OPTIMIZED:
F_OPTS=-xHost -O3 -fno-alias -ip -openmp -traceback
#F_OPTS=-g -O3 -xHost -traceback
# DEBUG:
#F_OPTS=-g -fp-model precise -check bounds -traceback \
# -debug extended -check uninit -ftrapuv
# EXTRA OPTIONS FOR FIXED-SOURCE CODE
FIXED_SRC_FLAGS=-fixed -132
# FORTRAN FLAGS FOR BIG FILES WHICH WOULD HAVE EXCESSIVE COMPILATION TIME
SLOW_FFLAGS=-O1 -g -no-ip -traceback
#C_COMP=icc
C_COMP=mpicc
C_OPTS=-O3 -DUNDERSCORE -DLITTLE
NCARG_DIR=/util/opt/NCL/6.1/lib
LIBNCARG=-L$(NCARG_DIR) -lncarg -lncarg_gks -lncarg_c \
-L/usr/lib64 -lX11 -ldl -lpng -lpthread -lgfortran -lcairo
HDF5_LIBS=-L/util/opt/hdf5/1.8/openmpi/1.10/intel/15/lib
HDF5_INCS=-I/util/opt/hdf5/1.8/openmpi/1.10/intel/15/include
NETCDF_LIBS=-L/util/opt/netcdf/4.4/intel/15/lib -lnetcdf
NETCDF_INCS=-I/util/opt/netcdf/4.4/intel/15/include
LOADER=$(F_COMP)
LOADER_OPTS=-static-intel $(F_OPTS)
# For Apple OSX: the stack size needs to be increased at link time
# LOADER_OPTS=-static-intel $(F_OPTS) -Wl,-stack_size -Wl,0x10000000
# to allow ifort compiler to link with pg-compiled ncar graphics:
# LIBS=-z muldefs -L/opt/pgi/linux86-64/5.2/lib -lpgftnrtl -lpgc
## IMPORTANT: Need to specify this flag in ED2
USE_HDF5=1
{{< /highlight >}}
{{% /panel %}}
3. Command: `make clean`
4. Command: `make -j 8`
### Sample SLURM submit scripts
##### PGI compiler:
{{% panel theme="info" header="Sample submit script for PGI compiler" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --ntasks=8 # 8 cores
#SBATCH --mem-per-cpu=1024 # Minimum memory required per CPU (in megabytes)
#SBATCH --time=03:15:00 # Run time in hh:mm:ss
#SBATCH --error=/work/[groupname]/[username]/job.%J.err
#SBATCH --output=/work/[groupname]/[username]/job.%J.out
module load compiler/pgi/11 openmpi/1.6 szip/2.1 zlib/1.2
mpirun /path/to/olam-4.2c-mpi
{{< /highlight >}}
{{% /panel %}}
##### Intel compiler:
{{% panel theme="info" header="Sample submit script for Intel compiler" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --ntasks=8 # 8 cores
#SBATCH --mem-per-cpu=1024 # Minimum memory required per CPU (in megabytes)
#SBATCH --time=03:15:00 # Run time in hh:mm:ss
#SBATCH --error=/work/[groupname]/[username]/job.%J.err
#SBATCH --output=/work/[groupname]/[username]/job.%J.out
module load compiler/intel/12 openmpi/1.6 szip/2.1 zlib/1.2
mpirun /path/to/olam-4.2c-mpi
{{< /highlight >}}
{{% /panel %}}
+++
title = "Running Theano"
description = "How to run Theano on HCC resources."
+++
Theano is available on HCC resources via the modules system. Both CPU and GPU
versions are available on Crane. Additionally, installs for Python
2.7, 3.5, and 3.6 are provided.
### Initial Setup
Theano attempts to write to a `~/.theano` directory in some
circumstances, which can cause errors as the `/home` filesystem is
read-only on HCC machines. As a workaround, create the directory on
`/work` and make a symlink from `/home`:
{{% panel theme="info" header="Create & symlink .theano directory" %}}
{{< highlight bash >}}
mkdir -p $WORK/.theano
ln -s $WORK/.theano $HOME/.theano
{{< /highlight >}}
{{% /panel %}}
This only needs to be done once on each HCC machine.
### Running the CPU version
To use the CPU version, simply load the module and run your Python code.
You can choose between the Python 2.7, 3.5 or 3.6 environments:
{{% panel theme="info" header="Python 2.7 version" %}}
{{< highlight bash >}}
module load theano/py27/1.0
python my_python2_script.py
{{< /highlight >}}
{{% /panel %}}
or
{{% panel theme="info" header="Python 3.5 version" %}}
{{< highlight bash >}}
module load theano/py35/1.0
python my_python3_script.py
{{< /highlight >}}
{{% /panel %}}
or
{{% panel theme="info" header="Python 3.6 version" %}}
{{< highlight bash >}}
module load theano/py36/1.0
python my_python3_script.py
{{< /highlight >}}
{{% /panel %}}
### Running the GPU version
To use the GPU version, first create a `~/.theanorc` file with the
following contents (or append to an existing file as needed):
{{% panel theme="info" header="~/.theanorc" %}}
{{< highlight batch >}}
[global]
device = cuda
{{< /highlight >}}
{{% /panel %}}
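If a `~/.theanorc` file already exists, the `device` entry can be added without disturbing other settings. The sketch below is one possible way to do that with Python's standard `configparser` module; it is an illustration only, and the helper name `set_theano_device` is hypothetical. It assumes the file uses plain INI syntax, and note that rewriting the file this way discards any comments it contained.

```python
import configparser
from pathlib import Path


def set_theano_device(rc_path, device="cuda"):
    """Set [global] device in a .theanorc-style file, keeping other options."""
    rc_path = Path(rc_path)
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str          # keep option names such as floatX case-sensitive
    config.read(rc_path)              # a missing file is silently skipped
    if not config.has_section("global"):
        config.add_section("global")
    config.set("global", "device", device)
    with rc_path.open("w") as handle:
        config.write(handle)
    return config


if __name__ == "__main__":
    set_theano_device(Path.home() / ".theanorc")
```

Running the script once creates the file if needed; running it again leaves the `device = cuda` line in place.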
Next, load the theano module:
{{% panel theme="info" header="Load the theano module" %}}
{{< highlight bash >}}
module load theano/py27/0.9
{{< /highlight >}}
{{% /panel %}}
To test the GPU support, start an interactive job on a GPU node and
import the theano module within the Python interpreter. You should see
output similar to the following:
{{% panel theme="info" header="GPU support test" %}}
{{< highlight python >}}
Python 2.7.15 | packaged by conda-forge | (default, May 8 2018, 14:46:53)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano
Using cuDNN version 7005 on context None
Mapped name None to device cuda: Tesla K20m (0000:03:00.0)
{{< /highlight >}}
{{% /panel %}}
+++
title = "Using Anaconda Package Manager"
description = "How to use the Anaconda Package Manager on HCC resources."
+++
[Anaconda](https://www.anaconda.com/what-is-anaconda),
from [Anaconda, Inc](https://www.anaconda.com)
is a completely free enterprise-ready distribution for large-scale data
processing, predictive analytics, and scientific computing. It includes
over 195 of the most popular Python packages for science, math,
engineering, and data analysis. **It also offers the ability to easily
create custom _environments_ by mixing and matching different versions
of Python and/or R and other packages into isolated environments that
individual users are free to create.** Anaconda includes the `conda`
package and environment manager to make managing these environments
straightforward.
- [Using Anaconda](#using-anaconda)
- [Creating custom Anaconda Environment](#creating-custom-anaconda-environment)
- [Creating custom GPU Anaconda Environment](#creating-custom-gpu-anaconda-environment)
- [Adding Packages to an Existing Environment](#adding-packages-to-an-existing-environment)
- [Using an Anaconda Environment in a Jupyter Notebook on Crane](#using-an-anaconda-environment-in-a-jupyter-notebook-on-crane)
### Using Anaconda
While the standard methods of installing packages via `pip`
and `easy_install` work with Anaconda, the preferred method is using
the `conda` command.
{{% notice info %}}
Full documentation on using Conda is available
at http://conda.pydata.org/docs/
A [cheatsheet](/attachments/11635089.pdf) is also provided.
{{% /notice %}}
A few examples of the basic commands are provided here. For a full
explanation of all of Anaconda/Conda's capabilities, see the
documentation linked above.
Anaconda is provided through the `anaconda` module on HCC machines. To
begin using it, load the Anaconda module.
{{% panel theme="info" header="Load the Anaconda module to start using Conda" %}}
{{< highlight bash >}}
module load anaconda
{{< /highlight >}}
{{% /panel %}}
To display general information about Conda/Anaconda, use the `info` subcommand.
{{% panel theme="info" header="Display general information about Conda/Anaconda" %}}
{{< highlight bash >}}
conda info
{{< /highlight >}}
{{% /panel %}}
Conda allows the easy creation of isolated, custom environments with
packages and versions of your choosing. To show all currently available
environments, and which is active, use the `info` subcommand with the
`-e` option.
{{% panel theme="info" header="List available environments" %}}
{{< highlight bash >}}
conda info -e
{{< /highlight >}}
{{% /panel %}}
The active environment will be marked with an asterisk (\*) character.
The `list` command will show all packages installed
in the currently active environment.
{{% panel theme="info" header="List installed packages in current environment" %}}
{{< highlight bash >}}
conda list
{{< /highlight >}}
{{% /panel %}}
To find the names of packages, use the `search` subcommand.
{{% panel theme="info" header="Search for packages" %}}
{{< highlight bash >}}
conda search numpy
{{< /highlight >}}
{{% /panel %}}
If the package is available, this will also display available package
versions and compatible Python versions the package may be installed
under.
### Creating Custom Anaconda Environment
The `create` command is used to create a new environment. It requires
at a minimum a name for the environment, and at least one package to
install. For example, suppose we wish to create a new environment, and
need version 1.8 of NumPy.
{{% notice info %}}
The `conda create` command must be run on the login node.
{{% /notice %}}
{{% panel theme="info" header="Create a new environment by providing a name and package specification" %}}
{{< highlight bash >}}
conda create -n mynumpy numpy=1.8
{{< /highlight >}}
{{% /panel %}}
This will create a new environment called 'mynumpy' and install NumPy
version 1.8, along with any required dependencies.
To use the environment, we must first *activate* it.
{{% panel theme="info" header="Activate environment" %}}
{{< highlight bash >}}
source activate mynumpy
{{< /highlight >}}
{{% /panel %}}
Our new environment is now active, and we can use it. The shell prompt
will change to indicate this as well (this can be disabled if desired).
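To confirm from inside Python that the activated environment is the one actually in use, a short check like the following can help. This is a sketch rather than an HCC-provided tool: the function name `describe_environment` is made up for illustration, and it relies on conda setting the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Report which interpreter and conda environment are in use."""
    info = {
        "python": sys.executable,
        "prefix": sys.prefix,
        # Set by conda on activation; None means no environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }
    try:
        import numpy
        info["numpy"] = numpy.__version__
    except ImportError:
        info["numpy"] = None          # NumPy is not installed in this environment
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run inside the 'mynumpy' environment, the `conda_env` entry should show `mynumpy` and the `numpy` entry should show the version that was requested when the environment was created.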
### Creating Custom GPU Anaconda Environment
We provide GPU versions of various frameworks such as `tensorflow`, `keras`, `theano`, via [modules](../module_commands). However, sometimes you may need additional libraries or packages that are not available as part of these modules. In this case, you will need to create your own GPU Anaconda environment.
To do this, you need to first clone one of our GPU modules to a new Anaconda environment, and then install the desired packages in this new environment.
The reason for this is that the GPU modules we support are built using the specific CUDA drivers our GPU nodes have. If you create a custom GPU environment without cloning the module, your code will not utilize the GPUs.
For example, if you want to use `tensorflow` with additional packages, first do:
{{% panel theme="info" header="Cloning GPU module to a new Anaconda environment" %}}
{{< highlight bash >}}
module load tensorflow-gpu/py36/1.12 anaconda
conda create -n tensorflow-gpu-1.12-custom --clone $CONDA_DEFAULT_ENV
module purge
{{< /highlight >}}
{{% /panel %}}
This will create a new `tensorflow-gpu-1.12-custom` environment in your home directory that is a copy of the `tensorflow-gpu` module. Then, you can install the additional packages you need in this environment.
{{% panel theme="info" header="Install new packages in the currently active environment" %}}
{{< highlight bash >}}
module load anaconda
source activate tensorflow-gpu-1.12-custom
conda install <packages>
{{< /highlight >}}
{{% /panel %}}
Next, whenever you want to use this custom GPU Anaconda environment, you need to add these two lines in your submit script:
{{< highlight bash >}}
module load anaconda
source activate tensorflow-gpu-1.12-custom
{{< /highlight >}}
{{% notice info %}}
If you have a custom GPU Anaconda environment, please use only the two lines above and **DO NOT** load the module you cloned earlier. Using `module load tensorflow-gpu/py36/1.12` and `source activate tensorflow-gpu-1.12-custom` in the same script is **wrong** and may cause various errors and incorrect results.
{{% /notice %}}
### Adding Packages to an Existing Environment
To install additional packages in an environment, use the `install`
subcommand. Suppose we want to install IPython in our 'mynumpy'
environment. While the environment is active, use `install` with no
additional arguments.
{{% panel theme="info" header="Install a new package in the currently active environment" %}}
{{< highlight bash >}}
conda install ipython
{{< /highlight >}}
{{% /panel %}}
{{% notice info %}}
The `conda install` command must be run on the login node.
{{% /notice %}}
If you aren't currently in the environment you wish to install the
package in, add the `-n` option to specify the name.
{{% panel theme="info" header="Install new packages in a specified environment" %}}
{{< highlight bash >}}
conda install -n mynumpy ipython
{{< /highlight >}}
{{% /panel %}}
The `remove` subcommand to uninstall a package functions similarly.
{{% panel theme="info" header="Remove package from currently active environment" %}}
{{< highlight bash >}}
conda remove ipython
{{< /highlight >}}
{{% /panel %}}
{{% panel theme="info" header="Remove package from environment specified by name" %}}
{{< highlight bash >}}
conda remove -n mynumpy ipython
{{< /highlight >}}
{{% /panel %}}
To exit an environment, we *deactivate* it.
{{% panel theme="info" header="Exit current environment" %}}
Newer versions of anaconda:
{{< highlight bash >}}
conda deactivate
{{< /highlight >}}
Older versions of anaconda:
{{< highlight bash >}}
source deactivate
{{< /highlight >}}
{{% /panel %}}
Finally, to completely remove an environment, add the `--all` option
to `remove`.
{{% panel theme="info" header="Completely remove an environment" %}}
{{< highlight bash >}}
conda remove -n mynumpy --all
{{< /highlight >}}
{{% /panel %}}
### Using an Anaconda Environment in a Jupyter Notebook on Crane
It is not difficult to make an Anaconda environment available to a
Jupyter Notebook. To do so, follow the steps below, replacing
`myenv` with the name of the Python or R environment you wish to use:
1. Stop any running Jupyter Notebooks and ensure you are logged out of
   the JupyterHub instance at https://crane.unl.edu
    1. If you are not logged out, please click the Control Panel button
       located in the top right corner.
    2. Click the "Stop My Server" Button to terminate the Jupyter
       server.
    3. Click the logout button in the top right corner.
2. Using the command-line environment, load the target conda
environment:
{{< highlight bash >}}source activate myenv{{< /highlight >}}
3. Install the Jupyter kernel and add the environment:
    1. For a **Python** conda environment, install the IPykernel
       package, and then the kernel specification:
       {{< highlight bash >}}
       # Install ipykernel
       conda install ipykernel
       # Install the kernel specification
       python -m ipykernel install --user --name "$CONDA_DEFAULT_ENV" --display-name "Python ($CONDA_DEFAULT_ENV)"
       {{< /highlight >}}
    2. For an **R** conda environment, install the jupyter\_client and
       IRkernel packages, and then the kernel specification:
       {{< highlight bash >}}
       # Install PNG support for R, the R kernel for Jupyter, and the Jupyter client
       conda install r-png
       conda install r-irkernel jupyter_client
       # Install jupyter_client 5.2.3 from anaconda channel for bug workaround
       conda install -c anaconda jupyter_client
       # Install the kernel specification
       R -e "IRkernel::installspec(name = '$CONDA_DEFAULT_ENV', displayname = 'R ($CONDA_DEFAULT_ENV)', user = TRUE)"
       {{< /highlight >}}
4. Once you have the environment set up, deactivate it:
{{< highlight bash >}}conda deactivate{{< /highlight >}}
5. To make your conda environments accessible from the worker nodes,
enter the following commands:
{{< highlight bash >}}
mkdir -p $WORK/.jupyter
mv ~/.local/share/jupyter/kernels $WORK/.jupyter
ln -s $WORK/.jupyter/kernels ~/.local/share/jupyter/kernels
{{< /highlight >}}
{{% notice note %}}
**Note**: Step 5 only needs to be done once. Any environments created
later will automatically be accessible from SLURM notebooks
once this is done.
**Note**: For older versions of Anaconda, use `source deactivate` to
deactivate the environment.
{{% /notice %}}
6. Log in to JupyterHub at https://crane.unl.edu
and create a new notebook using the environment by selecting the
correct entry in the `New` dropdown menu in the top right
corner.
{{< figure src="/images/24151931.png" height="400" class="img-border">}}
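If the environment does not appear in the `New` menu, it can help to check which kernel specifications are present in the directory used in step 5. The following sketch lists them by reading each `kernel.json` file; it is illustrative only, the function name `list_kernels` is hypothetical, and it assumes the kernels live under `~/.local/share/jupyter/kernels` as in the commands above.

```python
import json
from pathlib import Path


def list_kernels(kernel_root):
    """Map each kernel directory name to the display name in its kernel.json."""
    kernels = {}
    root = Path(kernel_root).expanduser()
    if not root.is_dir():             # also true for a symlink to a directory
        return kernels
    for spec_file in sorted(root.glob("*/kernel.json")):
        with spec_file.open() as handle:
            spec = json.load(handle)
        name = spec_file.parent.name
        kernels[name] = spec.get("display_name", name)
    return kernels


if __name__ == "__main__":
    for name, display in list_kernels("~/.local/share/jupyter/kernels").items():
        print(f"{name}: {display}")
```

Each line of output pairs a kernel directory with the label shown in the notebook menu, so a missing entry points to a kernel specification that was not installed in step 3.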