Commit 0479212b authored by Adam Caprez's avatar Adam Caprez

Restore deleted content.

parent 84c5dcd1
+++
title = "FAQ"
description = "HCC Frequently Asked Questions"
weight = "10"
+++
- [I have an account, now what?](#i-have-an-account-now-what)
- [How do I change my password?](#how-do-i-change-my-password)
- [I forgot my password, how can I retrieve it?](#i-forgot-my-password-how-can-i-retrieve-it)
- [I just deleted some files and didn't mean to! Can I get them back?](#i-just-deleted-some-files-and-didn-t-mean-to-can-i-get-them-back)
- [How do I (re)activate Duo?](#how-do-i-re-activate-duo)
- [How many nodes/memory/time should I request?](#how-many-nodes-memory-time-should-i-request)
- [I am trying to run a job but nothing happens?](#i-am-trying-to-run-a-job-but-nothing-happens)
- [I keep getting the error "slurmstepd: error: Exceeded step memory limit at some point." What does this mean and how do I fix it?](#i-keep-getting-the-error-slurmstepd-error-exceeded-step-memory-limit-at-some-point-what-does-this-mean-and-how-do-i-fix-it)
- [I want to talk to a human about my problem. Can I do that?](#i-want-to-talk-to-a-human-about-my-problem-can-i-do-that)
---
#### I have an account, now what?
Congrats on getting an HCC account! Now you need to connect to a Holland
cluster. To do this, we use an SSH connection. SSH stands for Secure
Shell, and it allows you to securely connect to a remote computer and
operate it just like you would a personal machine.
Depending on your operating system, you may need to install software to
make this connection. Check out our Quick Start Guides for information on
how to install the necessary software for your operating system:
- [For Mac/Linux Users]({{< relref "for_maclinux_users" >}})
- [For Windows Users]({{< relref "for_windows_users" >}})
#### How do I change my password?
#### I forgot my password, how can I retrieve it?
Information on how to change or retrieve your password can be found on
the documentation page: [How to change your
password]({{< relref "/accounts/how_to_change_your_password" >}})
All passwords must be at least 8 characters in length and must contain
at least one capital letter and one numeric digit. Passwords also cannot
contain any dictionary words. If you need help picking a good password,
consider using a (secure!) password generator such as
[this one provided by Random.org](https://www.random.org/passwords).
To preserve the security of your account, we recommend changing the
default password you were given as soon as possible.
#### I just deleted some files and didn't mean to! Can I get them back?
That depends. Where were the files you deleted?
**If the files were in your $HOME directory (/home/group/user/):** It's
possible.
$HOME directories are backed up daily and we can restore your files as
they were at the time of our last backup. Please note that any changes
made to the files between when the backup was made and when you deleted
them will not be preserved. To have these files restored, please contact
HCC Support at
{{< icon name="envelope" >}}[hcc-support@unl.edu](mailto:hcc-support@unl.edu)
as soon as possible.
**If the files were in your $WORK directory (/work/group/user/):** No.
Unfortunately, the $WORK directories are created as a short term place
to hold job files. This storage was designed to be quickly and easily
accessed by our worker nodes and as such is not conducive to backups.
Any irreplaceable files should be backed up in a secondary location,
such as Attic, the cloud, or on your personal machine. For more
information on how to prevent file loss, check out [Preventing File
Loss]({{< relref "preventing_file_loss" >}}).
#### How do I (re)activate Duo?
**If you have not activated Duo before:**
Please stop by
[our offices](http://hcc.unl.edu/location)
along with a photo ID and we will be happy to activate it for you. If
you are not local to Omaha or Lincoln, contact us at
{{< icon name="envelope" >}}[hcc-support@unl.edu](mailto:hcc-support@unl.edu)
and we will help you activate Duo remotely.
**If you have activated Duo previously but now have a different phone
number:**
Stop by our offices along with a photo ID and we can help you reactivate
Duo and update your account with your new phone number.
**If you have activated Duo previously and have the same phone number:**
Email us at
{{< icon name="envelope" >}}[hcc-support@unl.edu](mailto:hcc-support@unl.edu)
from the email address your account is registered under and we will send
you a new link that you can use to activate Duo.
#### How many nodes/memory/time should I request?
**Short answer:** We don’t know.
**Long answer:** The amount of resources required is highly dependent on
the application you are using, the input file sizes and the parameters
you select. Sometimes it can help to speak with someone else who has
used the software before to see if they can give you an idea of what has
worked for them.
But ultimately, it comes down to trial and error; try different
combinations and see what works and what doesn’t. Good practice is to
check the output and utilization of each job you run. This will help you
determine what parameters you will need in the future.
For more information on how to determine how many resources a completed
job used, check out the documentation on [Monitoring Jobs]({{< relref "monitoring_jobs" >}}).
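One quick way to review a finished job's actual usage is SLURM's `sacct` command; for example (the job ID below is a placeholder):
{{< highlight bash >}}
# replace <job_id> with the ID of your completed job
sacct -j <job_id> --format=JobID,JobName,Elapsed,MaxRSS,ReqMem,State
{{< /highlight >}}
Comparing `MaxRSS` (memory actually used) and `Elapsed` (wall time used) against what you requested is a good starting point for sizing future jobs.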
#### I am trying to run a job but nothing happens?
Where are you trying to run the job from? You can check this by typing
the command `pwd` into the terminal.
**If you are running from inside your $HOME directory
(/home/group/user/)**:
Move your files to your $WORK directory (/work/group/user) and resubmit
your job.
The worker nodes on our clusters have read-only access to the files in
$HOME directories. This means that when a job is submitted from $HOME,
the scheduler cannot write the output and error files in the directory
and the job is killed. It appears the job does nothing because no output
is produced.
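For example, a minimal sketch of moving a job out of `$HOME` and resubmitting it (the directory and script names are placeholders):
{{< highlight bash >}}
# copy the project from $HOME to $WORK, then resubmit from there
cp -r $HOME/my_project $WORK/my_project
cd $WORK/my_project
sbatch my_job.submit
{{< /highlight >}}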
**If you are running from inside your $WORK directory:**
Contact us at
{{< icon name="envelope" >}}[hcc-support@unl.edu](mailto:hcc-support@unl.edu)
with your login, the name of the cluster you are running on, and the
full path to your submit script and we will be happy to help solve the
issue.
#### I keep getting the error "slurmstepd: error: Exceeded step memory limit at some point." What does this mean and how do I fix it?
This error occurs when the job you are running uses more memory than was
requested in your submit script.
If you specified `--mem` or `--mem-per-cpu` in your submit script, try
increasing this value and resubmitting your job.
If you did not specify `--mem` or `--mem-per-cpu` in your submit script,
chances are the default amount allotted is not sufficient. Add the line
{{< highlight batch >}}
#SBATCH --mem=<memory_amount>
{{< /highlight >}}
to your script with a reasonable amount of memory and try running it again. If you keep
getting this error, continue to increase the requested memory amount and
resubmit the job until it finishes successfully.
For additional details on how to monitor usage on jobs, check out the
documentation on [Monitoring Jobs]({{< relref "monitoring_jobs" >}}).
If you continue to run into issues, please contact us at
{{< icon name="envelope" >}}[hcc-support@unl.edu](mailto:hcc-support@unl.edu)
for additional assistance.
#### I want to talk to a human about my problem. Can I do that?
Of course! We have an open door policy and invite you to stop by
[either of our offices](http://hcc.unl.edu/location)
anytime Monday through Friday between 9 am and 5 pm. One of the HCC
staff would be happy to help you with whatever problem or question you
have. Alternatively, you can drop one of us a line and we'll arrange a
time to meet: [Contact Us](https://hcc.unl.edu/contact-us).
+++
title = "Jupyter Notebooks on Crane"
description = "How to access and use a Jupyter Notebook"
weight = 20
+++
- [Connecting to Crane](#connecting-to-crane)
- [Running Code](#running-code)
- [Opening a Terminal](#opening-a-terminal)
- [Using Custom Packages](#using-custom-packages)
## Connecting to Crane
-----------------------
Jupyter defines its notebooks ("Jupyter Notebooks") as
an open-source web application that allows you to create and share documents that contain live code,
equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation,
statistical modeling, data visualization, machine learning, and much more.
1. To open a Jupyter notebook, [Sign in](https://crane.unl.edu) to crane.unl.edu using your HCC credentials (NOT your
campus credentials).
{{< figure src="/images/jupyterLogin.png" >}}
2. Select your preferred authentication method.
{{< figure src="/images/jupyterPush.png" >}}
3. Choose a job profile. Select "Notebook via SLURM Job | Small (1 core, 4GB RAM, 8 hours)" for light tasks such as debugging or small-scale testing.
Select the other options based on your computing needs. Note that a SLURM Job will save to your "work" directory.
{{< figure src="/images/jupyterjob.png" >}}
## Running Code
1. Select the "New" dropdown menu and select the file type you want to create.
{{< figure src="/images/jupyterNew.png" >}}
2. A new tab will open, where you can enter your code. Run your code by selecting the "play" icon.
{{< figure src="/images/jupyterCode.png">}}
## Opening a Terminal
1. From your user home page, select "terminal" from the "New" drop-down menu.
{{< figure src="/images/jupyterTerminal.png">}}
2. A terminal opens in a new tab. You can enter [Linux commands]({{< relref "basic_linux_commands" >}})
at the prompt.
{{< figure src="/images/jupyterTerminal2.png">}}
## Using Custom Packages
Many popular `python` and `R` packages are already installed and available within Jupyter Notebooks.
However, it is possible to install custom packages to be used in notebooks by creating a custom Anaconda
Environment. Detailed information on how to create such an environment can be found at
[Using an Anaconda Environment in a Jupyter Notebook on Crane]({{< relref "/applications/user_software/using_anaconda_package_manager#using-an-anaconda-environment-in-a-jupyter-notebook-on-crane" >}}).
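As a rough sketch, creating a custom environment from a terminal typically looks like the following (this assumes the cluster's `anaconda` module and uses placeholder environment and package names; see the linked page above for the supported, step-by-step procedure):
{{< highlight bash >}}
module load anaconda
# create an environment containing your packages plus ipykernel,
# which is needed for the environment to be usable as a notebook kernel
conda create -n my-notebook-env python=3.9 numpy pandas ipykernel
conda activate my-notebook-env
{{< /highlight >}}
Once the environment is set up as described on the linked page, it can be selected as the kernel when creating new notebooks.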
---
+++
title = "BLAST with Allinea Performance Reports"
description = "Example of how to profile BLAST using Allinea Performance Reports."
+++
Simple example of using
[BLAST]({{< relref "/applications/app_specific/bioinformatics_tools/alignment_tools/blast/running_blast_alignment" >}})
with Allinea Performance Reports (`perf-report`) on Crane is shown below:
{{% panel theme="info" header="blastn_perf_report.submit" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=20:00:00
#SBATCH --mem=50gb
#SBATCH --output=BlastN.info
#SBATCH --error=BlastN.error
module load allinea
module load blast/2.2.29
cd $WORK/<project_folder>
cp -r /work/HCC/DATA/blastdb/nt/ /tmp/
cp input_reads.fasta /tmp/
perf-report --openmp-threads=$SLURM_NTASKS_PER_NODE --nompi `which blastn` \
-query /tmp/input_reads.fasta -db /tmp/nt/nt -out \
blastn_output.alignments -num_threads $SLURM_NTASKS_PER_NODE
cp blastn_output.alignments .
{{< /highlight >}}
{{% /panel %}}
BLAST uses OpenMP and therefore the Allinea Performance Reports options
`--openmp-threads` and `--nompi` are used. The perf-report
part, `perf-report --openmp-threads=$SLURM_NTASKS_PER_NODE --nompi`,
is placed in front of the actual `blastn` command we want
to analyze.
{{% notice info %}}
If you see the error "**Allinea Performance Reports - target file
'application' does not exist on this machine... exiting**", this means
that instead of just using the executable '*application*', the full path
to that application is required. This is the reason why in the script
above, instead of using "*blastn*", we use *\`which blastn\`* which
gives the full path of the *blastn* executable.
{{% /notice %}}
When the application finishes, the performance report is generated in
the working directory.
For the executed application, the report looks like this:
{{< figure src="/images/11635296.png" width="850" >}}
From the report, we can see that **blastn** is a compute-bound
application. The difference between the mean (11.1 GB) and peak (26.3 GB)
memory usage is significant, which may be a sign of workload imbalance or a
memory leak. Moreover, 89.6% of the time is spent synchronizing
threads in parallel regions, which also points to workload imbalance.
Running Allinea Performance Reports to identify application
bottlenecks is very useful for improving the application and making
better use of the available resources.
+++
title = "Ray with Allinea Performance Reports"
description = "Example of how to profile Ray using Allinea Performance Reports"
+++
Simple example of using [Ray]({{< relref "/applications/app_specific/bioinformatics_tools/de_novo_assembly_tools/ray" >}})
with Allinea Performance Reports (`perf-report`) on Tusker is shown below:
{{% panel theme="info" header="ray_perf_report.submit" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --job-name=Ray
#SBATCH --ntasks-per-node=16
#SBATCH --time=10:00:00
#SBATCH --mem=70gb
#SBATCH --output=Ray.info
#SBATCH --error=Ray.error
module load allinea
module load compiler/gcc/4.7 openmpi/1.6 ray/2.3
perf-report mpiexec -n 16 Ray -k 31 -p input_reads_pair_1.fasta input_reads_pair_2.fasta -o output_directory
{{< /highlight >}}
{{% /panel %}}
Ray is an MPI application and therefore no additional Allinea Performance
Reports options are required. The `perf-report` command is placed in front
of the actual `Ray` command we want to analyze.
When the application finishes, the performance report is generated in
the working directory.
For the executed application, the report looks like this:
{{< figure src="/images/11635303.png" width="850" >}}
From the report, we can see that **Ray** is a compute-bound application.
Most of the running time is spent in point-to-point calls with a low
transfer rate, which may be caused by inefficient message sizes.
Therefore, running this application with fewer MPI processes and more
data on each process may be more efficient.
Running Allinea Performance Reports to identify application
bottlenecks is very useful for improving the application and making
better use of the available resources.
+++
title = " Running BLAST Alignment"
description = "How to run BLAST alignment on HCC resources"
weight = "10"
+++
Basic BLAST has the following commands:
- **blastn**: search nucleotide database using a nucleotide query
- **blastp**: search protein database using a protein query
- **blastx**: search protein database using a translated nucleotide query
- **tblastn**: search translated nucleotide database using a protein query
- **tblastx**: search translated nucleotide database using a translated nucleotide query
The basic usage of **blastn** is:
{{< highlight bash >}}
$ blastn -query input_reads.fasta -db input_reads_db -out blastn_output.alignments [options]
{{< /highlight >}}
where **input_reads.fasta** is an input file of sequence data in fasta format, **input_reads_db** is the generated BLAST database, and **blastn_output.alignments** is the output file where the alignments are stored.
Additional parameters can be found in the [BLAST manual](https://www.ncbi.nlm.nih.gov/books/NBK279690/), or by typing:
{{< highlight bash >}}
$ blastn -help
{{< /highlight >}}
These BLAST alignment commands are multi-threaded, and therefore using the BLAST option **-num_threads <number_of_CPUs>** is recommended.
HCC hosts multiple BLAST databases and indices on Crane. In order to use these resources, the ["biodata" module]({{< relref "/applications/app_specific/bioinformatics_tools/biodata_module" >}}) needs to be loaded first. The **$BLAST** variable contains the following currently available databases:
- **16SMicrobial**
- **env_nt**
- **est**
- **est_human**
- **est_mouse**
- **est_others**
- **gss**
- **human_genomic**
- **human_genomic_transcript**
- **mouse_genomic_transcript**
- **nr**
- **nt**
- **other_genomic**
- **refseq_genomic**
- **refseq_rna**
- **sts**
- **swissprot**
- **tsa_nr**
- **tsa_nt**
If you want to create and use a BLAST database that is not mentioned above, check [Create Local BLAST Database]({{<relref "create_local_blast_database" >}}).
Basic SLURM example of nucleotide BLAST run against the non-redundant **nt** BLAST database with `8 CPUs` is provided below. When running BLAST alignment, it is recommended to first copy the query and database files to the **/scratch/** directory of the worker node. Moreover, the BLAST output is also saved in this directory (**/scratch/blastn_output.alignments**). After BLAST finishes, the output file is copied from the worker node to your current work directory.
{{% notice info %}}
**Please note that the worker nodes can not write to the */home/* directories and therefore you need to run your job from your */work/* directory.**
**This example will first copy your database to faster local storage called “scratch”. This can greatly improve performance!**
{{% /notice %}}
{{% panel header="`blastn_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=20gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err
module load blast/2.7
module load biodata/1.0
cd $WORK/<project_folder>
cp $BLAST/nt.* /scratch/
cp input_reads.fasta /scratch/
blastn -query /scratch/input_reads.fasta -db /scratch/nt -out /scratch/blastn_output.alignments -num_threads $SLURM_NTASKS_PER_NODE
cp /scratch/blastn_output.alignments $WORK/<project_folder>
{{< /highlight >}}
{{% /panel %}}
One important BLAST parameter is the **e-value threshold**, which limits the returned hits to those with an e-value lower than the given threshold. To show only the hits with an **e-value** lower than 1e-10, modify the given script as follows:
{{< highlight bash >}}
$ blastn -query input_reads.fasta -db input_reads_db -out blastn_output.alignments -num_threads $SLURM_NTASKS_PER_NODE -evalue 1e-10
{{< /highlight >}}
The default BLAST output is in pairwise format. However, BLAST’s parameter **-outfmt** supports output in [different formats](https://www.ncbi.nlm.nih.gov/books/NBK279684/) that are easier for parsing.
Basic SLURM example of protein BLAST run against the non-redundant **nr** BLAST database with tabular output format and `8 CPUs` is shown below. As before, the query and database files are copied to the **/scratch/** directory. The BLAST output is also saved in this directory (**/scratch/blastx_output.alignments**). After BLAST finishes, the output file is copied from the worker node to your current work directory.
{{% notice info %}}
**Please note that the worker nodes can not write to the */home/* directories and therefore you need to run your job from your */work/* directory.**
**This example will first copy your database to faster local storage called “scratch”. This can greatly improve performance!**
{{% /notice %}}
{{% panel header="`blastx_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=BlastX
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=20gb
#SBATCH --output=BlastX.%J.out
#SBATCH --error=BlastX.%J.err
module load blast/2.7
module load biodata/1.0
cd $WORK/<project_folder>
cp $BLAST/nr.* /scratch/
cp input_reads.fasta /scratch/
blastx -query /scratch/input_reads.fasta -db /scratch/nr -outfmt 6 -out /scratch/blastx_output.alignments -num_threads $SLURM_NTASKS_PER_NODE
cp /scratch/blastx_output.alignments $WORK/<project_folder>
{{< /highlight >}}
{{% /panel %}}
+++
title = "Biodata Module"
description = "How to use Biodata Module on HCC machines"
scripts = ["https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/jquery.tablesorter.min.js", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-pager.min.js","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-filter.min.js","/js/sort-table.js"]
css = ["http://mottie.github.io/tablesorter/css/theme.default.css","https://mottie.github.io/tablesorter/css/theme.dropbox.css", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/jquery.tablesorter.pager.min.css","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/filter.formatter.min.css"]
weight = "52"
+++
HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files, short read aligned indices etc. on Crane.
In order to use these resources, the "**biodata**" module needs to be loaded first.
For how to load module, please check [Module Commands]({{< relref "/applications/modules/_index.md" >}}).
Loading the "**biodata**" module will pre-set many environment variables, but most likely you will only need a subset of them. Environment variables can be used in your command or script by prefixing `$` to the name.
The major environment variables are:
- **$DATA** - main directory
- **$BLAST** - Directory containing all available BLAST (nucleotide and protein) databases
- **$KEGG** - KEGG database main entry point (requires license)
- **$PANTHER** - PANTHER database main entry point (latest)
- **$IPR** - InterProScan database main entry point (latest)
- **$GENOMES** - Directory containing all available genomes (multiple sources, builds possible)
- **$INDICES** - Directory containing indices for bowtie, bowtie2, bwa for all available genomes
- **$UNIPROT** - Directory containing latest release of full UniProt database
In order to check what genomes are available, you can type:
{{< highlight bash >}}
$ ls $GENOMES
{{< /highlight >}}
In order to check what BLAST databases are available, you can just type:
{{< highlight bash >}}
$ ls $BLAST
{{< /highlight >}}
An example of how to run Bowtie2 local alignment on Crane utilizing the default Horse, *Equus caballus* index (*BOWTIE2\_HORSE*) with paired-end fasta files and 8 CPUs is shown below:
{{% panel header="`bowtie2_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err
module load bowtie/2.2
module load biodata
bowtie2 -x $BOWTIE2_HORSE -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE
{{< /highlight >}}
{{% /panel %}}
An example of BLAST run against the non-redundant nucleotide database available on Crane is provided below:
{{% panel header="`blastn_alignment.submit`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err
module load blast/2.7
module load biodata
cp $BLAST/nt.* /scratch
cp input_reads.fasta /scratch
blastn -db /scratch/nt -query /scratch/input_reads.fasta -out /scratch/blast_nucleotide.results
cp /scratch/blast_nucleotide.results .
{{< /highlight >}}
{{% /panel %}}
### Available Organisms
The organisms and their corresponding environment variables for all genomes and chromosome files, as well as indices, are shown in the table below.
{{< table url="http://rhino-head.unl.edu:8192/bio/data/json" >}}
+++
title = "DMTCP Checkpointing"
description = "How to use the DMTCP utility to checkpoint your application."
+++
[DMTCP](http://dmtcp.sourceforge.net)
(Distributed MultiThreaded Checkpointing) is a checkpointing package for
applications. Checkpointing allows a failed simulation to be resumed after
resource failures (e.g. hardware or software failures, or exceeded
time and memory limits).
DMTCP supports both sequential and multi-threaded applications. Some
examples of binary programs on Linux distributions that can be used with
DMTCP are OpenMP, MATLAB, Python, Perl, MySQL, bash, gdb, X-Windows etc.
DMTCP provides support for several resource managers, including SLURM,
the resource manager used at HCC. The DMTCP module is available on
Crane and is enabled by typing:
{{< highlight bash >}}
module load dmtcp
{{< /highlight >}}
After the module is loaded, the first step is to run the command:
{{< highlight bash >}}
[<username>@login.crane ~]$ dmtcp_launch --new-coordinator --rm --interval <interval_time_seconds> <your_command>
{{< /highlight >}}
where the `--rm` option enables SLURM support,
**\<interval_time_seconds\>** is the time in seconds between
automatic checkpoints, and **\<your_command\>** is the actual
command you want to run and checkpoint.
Besides the general options shown above, more `dmtcp_launch` options
can be seen by using:
{{< highlight bash >}}
[<username>@login.crane ~]$ dmtcp_launch --help
{{< /highlight >}}
`dmtcp_launch` creates a few files that are used to resume the
cancelled job, such as *ckpt\_\*.dmtcp* and
*dmtcp\_restart\_script\*.sh*. Unless otherwise stated
(using `--ckptdir` option), these files are stored in the current
working directory.
The second step of DMTCP is to restart the cancelled job, and there are
two ways of doing that:
- `dmtcp_restart ckpt_*.dmtcp` *\<options\>* (before running
this command delete any old *ckpt\_\*.dmtcp* files in your current
directory)
- `./dmtcp_restart_script.sh` *\<options\>*
If there are no options defined in the *\<options\>* field, DMTCP
will keep running with the options defined in the initial
**dmtcp\_launch** call (such as interval time, output directory, etc.).
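For instance, a minimal sketch of the first restart form with a new checkpoint interval (the `ckpt_*.dmtcp` names are the defaults written by `dmtcp_launch`, and this assumes your DMTCP version supports the same `--interval` option for `dmtcp_restart`):
{{< highlight bash >}}
# restart from the checkpoint files and take new checkpoints every 2 hours
dmtcp_restart --interval 7200 ckpt_*.dmtcp
{{< /highlight >}}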
Simple example of using DMTCP with
[BLAST]({{< relref "/applications/app_specific/bioinformatics_tools/alignment_tools/blast/running_blast_alignment" >}})
on Crane is shown below:
{{% panel theme="info" header="dmtcp_blastx.submit" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --job-name=BlastX
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=50:00:00
#SBATCH --mem=20gb
#SBATCH --output=BlastX_info_1.txt
#SBATCH --error=BlastX_error_1.txt
module load dmtcp
module load blast/2.4
cd $WORK/<project_folder>
cp -r /work/HCC/DATA/blastdb/nr/ /tmp/
cp input_reads.fasta /tmp/
dmtcp_launch --new-coordinator --rm --interval 3600 blastx -query \
/tmp/input_reads.fasta -db /tmp/nr/nr -out blastx_output.alignments \
-num_threads $SLURM_NTASKS_PER_NODE
{{< /highlight >}}
{{% /panel %}}
In this example, DMTCP takes checkpoints every hour (`--interval 3600`),
and the actual command we want to checkpoint is `blastx` with
some general BLAST options defined with `-query`, `-db`, `-out`,
`-num_threads`.
If this job is killed for various reasons, it can be restarted using the
following submit file:
{{% panel theme="info" header="dmtcp_restart_blastx.submit" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --job-name=BlastX
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=50:00:00
#SBATCH --mem=20gb
#SBATCH --output=BlastX_info_2.txt
#SBATCH --error=BlastX_error_2.txt
module load dmtcp
module load blast/2.4
cd $WORK/<project_folder>
cp -r /work/HCC/DATA/blastdb/nr/ /tmp/
cp input_reads.fasta /tmp/
# Start DMTCP
dmtcp_coordinator --daemon --port 0 --port-file /tmp/port
export DMTCP_COORD_HOST=`hostname`
export DMTCP_COORD_PORT=$(</tmp/port)
# Restart job
./dmtcp_restart_script.sh
{{< /highlight >}}
{{% /panel %}}
{{% notice info %}}
`dmtcp_restart` generates new
`ckpt_*.dmtcp` and `dmtcp_restart_script*.sh` files. Therefore, if
the restarted job is also killed due to unavailable/exceeded resources,
you can resubmit the same job again without any changes in the submit
file shown above (just don't forget to delete the old `ckpt_*.dmtcp`
files if you are using these files instead of `dmtcp_restart_script.sh`)
{{% /notice %}}
Even though DMTCP tries to support most mainstream and commonly used
applications, there is no guarantee that every application can be
checkpointed and restarted.
+++
title = "Fortran/C on HCC"
description = "How to compile and run Fortran/C program on HCC machines"
weight = "50"
+++
This quick start demonstrates how to implement a Fortran/C program on
HCC supercomputers. The sample codes and submit scripts can be
downloaded from [serial_dir.zip](/attachments/serial_dir.zip).
#### Login to an HCC Cluster
Log in to an HCC cluster through PuTTY ([For Windows Users]({{< relref "/connecting/for_windows_users">}})) or Terminal ([For Mac/Linux
Users]({{< relref "/connecting/for_maclinux_users">}})) and make a subdirectory called `serial_dir` under the `$WORK` directory.
{{< highlight bash >}}
$ cd $WORK
$ mkdir serial_dir
{{< /highlight >}}
In the subdirectory `serial_dir`, save all the relevant Fortran/C codes. Here we include two demo
programs, `demo_f_serial.f90` and `demo_c_serial.c`, that compute the sum from 1 to 20.
{{%expand "demo_f_serial.f90" %}}
{{< highlight bash >}}
Program demo_f_serial
implicit none
integer, parameter :: N = 20
real*8 w
integer i
common/sol/ x
real*8 x
real*8, dimension(N) :: y
do i = 1,N
w = i*1d0
call proc(w)
y(i) = x
write(6,*) 'i,x = ', i, y(i)
enddo
write(6,*) 'sum(y) =',sum(y)
Stop
End Program
Subroutine proc(w)
real*8, intent(in) :: w
common/sol/ x
real*8 x
x = w
Return
End Subroutine
{{< /highlight >}}
{{% /expand %}}
{{%expand "demo_c_serial.c" %}}
{{< highlight c >}}
//demo_c_serial
#include <stdio.h>
double proc(double w){
double x;
x = w;
return x;
}
int main(int argc, char* argv[]){
int N=20;
double w;
int i;
double x;
double y[N];
double sum;
for (i = 1; i <= N; i++){
w = i*1e0;
x = proc(w);
y[i-1] = x;
printf("i,x= %d %lf\n", i, y[i-1]) ;
}
sum = 0e0;
for (i = 1; i<= N; i++){
sum = sum + y[i-1];
}
printf("sum(y)= %lf\n", sum);
return 0;
}
{{< /highlight >}}
{{% /expand %}}
---
#### Compiling the Code
Compiling Fortran/C++ code into an executable is usually done behind
the scenes in a Graphical User Interface (GUI) environment, such as
Microsoft Visual Studio. On an HCC cluster, the compiling is done
explicitly by first loading a compiler of choice and then executing the
corresponding compile command. Here we will use the GNU Compiler
Collection, `gcc`, for demonstration. Other available compilers such as
`intel` or `pgi` can be looked up using the command
line `module avail`. Before compiling the code, make sure there is no
dependency on any numerical library in the code. If invoking a numerical
library is necessary, contact an HCC specialist
({{< icon name="envelope" >}}[hcc-support@unl.edu](mailto:hcc-support@unl.edu)) to
discuss implementation options.
{{< highlight bash >}}
$ module load compiler/gcc/8.2
$ gfortran demo_f_serial.f90 -o demo_f_serial.x
$ gcc demo_c_serial.c -o demo_c_serial.x
{{< /highlight >}}
The above commands load the `gcc` compiler and use the compile
commands `gfortran` or `gcc` to compile the codes into `.x` files
(executables).
#### Creating a Submit Script
Create a submit script to request one core (default) and a 1-minute run
time on the supercomputer. The name of the executable is entered on the
last line.
{{% panel header="`submit_f.serial`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --mem-per-cpu=1024
#SBATCH --time=00:01:00
#SBATCH --job-name=Fortran
#SBATCH --error=Fortran.%J.err
#SBATCH --output=Fortran.%J.out
module load compiler/gcc/8.2
./demo_f_serial.x
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`submit_c.serial`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --mem-per-cpu=1024
#SBATCH --time=00:01:00
#SBATCH --job-name=C
#SBATCH --error=C.%J.err
#SBATCH --output=C.%J.out
module load compiler/gcc/8.2
./demo_c_serial.x
{{< /highlight >}}
{{% /panel %}}
#### Submit the Job
The job can be submitted through the command `sbatch`. The job status
can be monitored by entering `squeue` with the `-u` option.
{{< highlight bash >}}
$ sbatch submit_f.serial
$ sbatch submit_c.serial
$ squeue -u <username>
{{< /highlight >}}
Replace `<username>` with your HCC username.
#### Sample Output
The sum from 1 to 20 is computed and printed to the `.out` file (see
below).
{{%expand "Fortran.out" %}}
{{< highlight batchfile>}}
i,x = 1 1.0000000000000000
i,x = 2 2.0000000000000000
i,x = 3 3.0000000000000000
i,x = 4 4.0000000000000000
i,x = 5 5.0000000000000000
i,x = 6 6.0000000000000000
i,x = 7 7.0000000000000000
i,x = 8 8.0000000000000000
i,x = 9 9.0000000000000000
i,x = 10 10.000000000000000
i,x = 11 11.000000000000000
i,x = 12 12.000000000000000
i,x = 13 13.000000000000000
i,x = 14 14.000000000000000
i,x = 15 15.000000000000000
i,x = 16 16.000000000000000
i,x = 17 17.000000000000000
i,x = 18 18.000000000000000
i,x = 19 19.000000000000000
i,x = 20 20.000000000000000
sum(y) = 210.00000000000000
{{< /highlight >}}
{{% /expand %}}
{{%expand "C.out" %}}
{{< highlight batchfile>}}
i,x= 1 1.000000
i,x= 2 2.000000
i,x= 3 3.000000
i,x= 4 4.000000
i,x= 5 5.000000
i,x= 6 6.000000
i,x= 7 7.000000
i,x= 8 8.000000
i,x= 9 9.000000
i,x= 10 10.000000
i,x= 11 11.000000
i,x= 12 12.000000
i,x= 13 13.000000
i,x= 14 14.000000
i,x= 15 15.000000
i,x= 16 16.000000
i,x= 17 17.000000
i,x= 18 18.000000
i,x= 19 19.000000
i,x= 20 20.000000
sum(y)= 210.000000
{{< /highlight >}}
{{% /expand %}}
+++
title = "MPI Jobs on HCC"
description = "How to compile and run MPI programs on HCC machines"
weight = "52"
+++
This quick start demonstrates how to implement a parallel (MPI)
Fortran/C program on HCC supercomputers. The sample codes and submit
scripts can be downloaded from [mpi_dir.zip](/attachments/mpi_dir.zip).
#### Login to an HCC Cluster
Log in to an HCC cluster through PuTTY ([For Windows Users]({{< relref "/connecting/for_windows_users">}})) or Terminal ([For Mac/Linux
Users]({{< relref "/connecting/for_maclinux_users">}})) and make a subdirectory called `mpi_dir` under the `$WORK` directory.
{{< highlight bash >}}
$ cd $WORK
$ mkdir mpi_dir
{{< /highlight >}}
In the subdirectory `mpi_dir`, save all the relevant codes. Here we
include two demo programs, `demo_f_mpi.f90` and `demo_c_mpi.c`, that
compute the sum from 1 to 20 through parallel processes. A
straightforward parallelization scheme is used for demonstration
purposes. First, the master core (i.e. `myid=0`) distributes an equal
computation workload to a certain number of cores (as specified by
`--ntasks` in the submit script). Then, each worker core computes a
partial summation as output. Finally, the master core collects the
outputs from all worker cores and performs an overall summation. For easy
comparison with the serial code ([Fortran/C on HCC]({{< relref "fortran_c_on_hcc">}})), the
added lines in the parallel code (MPI) are marked with "!=" or "//=".
{{%expand "demo_f_mpi.f90" %}}
{{< highlight fortran >}}
Program demo_f_mpi
!====== MPI =====
use mpi
!================
implicit none
integer, parameter :: N = 20
real*8 w
integer i
common/sol/ x
real*8 x
real*8, dimension(N) :: y
!============================== MPI =================================
integer ind
real*8, dimension(:), allocatable :: y_local
integer numnodes,myid,rc,ierr,start_local,end_local,N_local
real*8 allsum
!====================================================================
!============================== MPI =================================
call mpi_init( ierr )
call mpi_comm_rank ( mpi_comm_world, myid, ierr )
call mpi_comm_size ( mpi_comm_world, numnodes, ierr )
!
N_local = N/numnodes
allocate ( y_local(N_local) )
start_local = N_local*myid + 1
end_local = N_local*myid + N_local
!====================================================================
do i = start_local, end_local
w = i*1d0
call proc(w)
ind = i - N_local*myid
y_local(ind) = x
! y(i) = x
! write(6,*) 'i, y(i)', i, y(i)
enddo
! write(6,*) 'sum(y) =',sum(y)
!============================================== MPI =====================================================
call mpi_reduce( sum(y_local), allsum, 1, mpi_real8, mpi_sum, 0, mpi_comm_world, ierr )
call mpi_gather ( y_local, N_local, mpi_real8, y, N_local, mpi_real8, 0, mpi_comm_world, ierr )
if (myid == 0) then
write(6,*) '-----------------------------------------'
write(6,*) '*Final output from... myid=', myid
write(6,*) 'numnodes =', numnodes
write(6,*) 'mpi_sum =', allsum
write(6,*) 'y=...'
do i = 1, N
write(6,*) y(i)
enddo
write(6,*) 'sum(y)=', sum(y)
endif
deallocate( y_local )
call mpi_finalize(rc)
!========================================================================================================
Stop
End Program
Subroutine proc(w)
real*8, intent(in) :: w
common/sol/ x
real*8 x
x = w
Return
End Subroutine
{{< /highlight >}}
{{% /expand %}}
{{%expand "demo_c_mpi.c" %}}
{{< highlight c >}}
//demo_c_mpi
#include <stdio.h>
//======= MPI ========
#include "mpi.h"
#include <stdlib.h>
//====================
double proc(double w){
double x;
x = w;
return x;
}
int main(int argc, char* argv[]){
int N=20;
double w;
int i;
double x;
double y[N];
double sum;
//=============================== MPI ============================
int ind;
double *y_local;
int numnodes,myid,rc,ierr,start_local,end_local,N_local;
double allsum;
//================================================================
//=============================== MPI ============================
MPI_Init(&argc, &argv);
MPI_Comm_rank( MPI_COMM_WORLD, &myid );
MPI_Comm_size ( MPI_COMM_WORLD, &numnodes );
N_local = N/numnodes;
y_local=(double *) malloc(N_local*sizeof(double));
start_local = N_local*myid + 1;
end_local = N_local*myid + N_local;
//================================================================
for (i = start_local; i <= end_local; i++){
w = i*1e0;
x = proc(w);
ind = i - N_local*myid;
y_local[ind-1] = x;
// y[i-1] = x;
// printf("i,x= %d %lf\n", i, y[i-1]) ;
}
sum = 0e0;
for (i = 1; i<= N_local; i++){
sum = sum + y_local[i-1];
}
// printf("sum(y)= %lf\n", sum);
//====================================== MPI ===========================================
MPI_Reduce( &sum, &allsum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD );
MPI_Gather( &y_local[0], N_local, MPI_DOUBLE, &y[0], N_local, MPI_DOUBLE, 0, MPI_COMM_WORLD );
if (myid == 0){
printf("-----------------------------------\n");
printf("*Final output from... myid= %d\n", myid);
printf("numnodes = %d\n", numnodes);
printf("mpi_sum = %lf\n", allsum);
printf("y=...\n");
for (i = 1; i <= N; i++){
printf("%lf\n", y[i-1]);
}
sum = 0e0;
for (i = 1; i<= N; i++){
sum = sum + y[i-1];
}
printf("sum(y) = %lf\n", sum);
}
free( y_local );
MPI_Finalize ();
//======================================================================================
return 0;
}
{{< /highlight >}}
{{% /expand %}}
---
#### Compiling the Code
Compiling an MPI code requires first loading a compiler "engine"
such as `gcc`, `intel`, or `pgi`, and then loading the MPI wrapper
`openmpi`. Here we will use the GNU Compiler Collection, `gcc`, for
demonstration.
{{< highlight bash >}}
$ module load compiler/gcc/6.1 openmpi/2.1
$ mpif90 demo_f_mpi.f90 -o demo_f_mpi.x
$ mpicc demo_c_mpi.c -o demo_c_mpi.x
{{< /highlight >}}
The above commands load the `gcc` compiler with the `openmpi` wrapper.
The compile commands `mpif90` or `mpicc` are used to compile the codes
into `.x` files (executables).
#### Creating a Submit Script
Create a submit script to request 5 cores (with `--ntasks`). The parallel
execution command `mpirun` is entered on the last line, in front of the
executable name.
{{% panel header="`submit_f.mpi`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --ntasks=5
#SBATCH --mem-per-cpu=1024
#SBATCH --time=00:01:00
#SBATCH --job-name=Fortran
#SBATCH --error=Fortran.%J.err
#SBATCH --output=Fortran.%J.out
mpirun ./demo_f_mpi.x
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`submit_c.mpi`"%}}
{{< highlight bash >}}
#!/bin/sh
#SBATCH --ntasks=5
#SBATCH --mem-per-cpu=1024
#SBATCH --time=00:01:00
#SBATCH --job-name=C
#SBATCH --error=C.%J.err
#SBATCH --output=C.%J.out
mpirun ./demo_c_mpi.x
{{< /highlight >}}
{{% /panel %}}
#### Submit the Job
The job can be submitted through the command `sbatch`. The job status
can be monitored by entering `squeue` with the `-u` option.
{{< highlight bash >}}
$ sbatch submit_f.mpi
$ sbatch submit_c.mpi
$ squeue -u <username>
{{< /highlight >}}
Replace `<username>` with your HCC username.
#### Sample Output
The sum from 1 to 20 is computed and printed to the `.out` file (see
below). The outputs from the 5 cores are collected and processed by the
master core (i.e. `myid=0`).
{{%expand "Fortran.out" %}}
{{< highlight batchfile>}}
-----------------------------------------
*Final output from... myid= 0
numnodes = 5
mpi_sum = 210.00000000000000
y=...
1.0000000000000000
2.0000000000000000
3.0000000000000000
4.0000000000000000
5.0000000000000000
6.0000000000000000
7.0000000000000000
8.0000000000000000
9.0000000000000000
10.000000000000000
11.000000000000000
12.000000000000000
13.000000000000000
14.000000000000000
15.000000000000000
16.000000000000000
17.000000000000000
18.000000000000000
19.000000000000000
20.000000000000000
sum(y)= 210.00000000000000
{{< /highlight >}}
{{% /expand %}}
{{%expand "C.out" %}}
{{< highlight batchfile>}}
-----------------------------------
*Final output from... myid= 0
numnodes = 5
mpi_sum = 210.000000
y=...
1.000000
2.000000
3.000000
4.000000
5.000000
6.000000
7.000000
8.000000
9.000000
10.000000
11.000000
12.000000
13.000000
14.000000
15.000000
16.000000
17.000000
18.000000
19.000000
20.000000
sum(y) = 210.000000
{{< /highlight >}}
{{% /expand %}}
+++
title = "Linux File Permissions"
description = "How to view and change file permissions with Linux commands"
weight = 20
+++
- [Opening a Terminal Window](#opening-a-terminal-window)
- [Listing File Permissions](#listing-file-permissions)
- [Changing File Permissions](#changing-file-permissions)
## Opening a Terminal Window
-----------------------
Use your local terminal to connect to a cluster, or open a new terminal window on [Crane](https://crane.unl.edu).
Click [here](https://hcc.unl.edu/docs/Quickstarts/connecting/) if you need help connecting to a cluster
with a local terminal.
Click [here](https://hcc.unl.edu/docs/guides/running_applications/jupyter/) if you need
help opening a new terminal window within JupyterHub.
## Listing File Permissions
Type the command `ls -l` to list the files and directories with file permissions for your current location.
{{< figure src="/images/LinuxList.png" >}}
The first character denotes whether an item is a file or a directory. If 'd' is shown, it's a directory, and if '-' is shown, it's a file.
Following the first character you will see some
combination of r, w, x, and -. The first set of three is the ‘read’, ‘write’, and ‘execute’ permissions for the creator
of that file or directory. A ‘-’ means that particular permission has not been granted. For example, “rw-” means the
‘execute’ permission has not been granted. The next three characters are the permissions for the ‘group’ and the last three are the
permissions for everyone else.
Following the file permissions are the name of the creator, the name of the group, the size of the file, the date it was created, and finally
the name of the file.
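For example, a hypothetical listing (the owner, group, sizes, and dates below are placeholders) can be read as follows:
{{< highlight bash >}}
$ ls -l
drwxr-xr-x  2 jdoe demo_group 4096 Jan 15 09:30 results
-rw-r--r--  1 jdoe demo_group 2048 Jan 15 09:31 notes.txt
# results:   'd' -> a directory
# notes.txt: '-' -> a regular file; its creator may read and write ("rw-"),
#            while the group and everyone else may only read ("r--").
{{< /highlight >}}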
## Changing File Permissions
To change file permissions, use the command `chmod [permissions] [filename]` where the permissions are indicated by a three-digit code.
Each digit in the code corresponds to one of the three groups described above in the permissions printout: one for the creator permissions,
one for the group permissions, and one for everyone else. Each digit is interpreted as follows: 4=read, 2=write, 1=execute, and any combination of these is given by summing their codes.
Each chmod command includes three digits.
For example, to give the creator of mars.txt the rights to read, write, and execute, the group the rights to read and execute, and everyone else only the right to read,
we would use the command `chmod 754 mars.txt`.
{{< figure src="/images/LinuxChange.png" >}}
+++
title = "Preventing File Loss"
description = "How to prevent file loss on HCC clusters"
weight = 40
+++
Each research group is allocated 50TB of storage in `/work` on HCC
clusters. With over 400 active groups, HCC does not have the resources
to provide regular backups of `/work` without sacrificing the
performance of the existing filesystem. No matter how careful a user
might be, there is always the risk of file loss due to user error,
natural disasters, or equipment failure.
However, there are a number of solutions available for backing up your
data. By carefully considering the benefits and limitations of each,
users can select the backup methods that work best for their particular
needs. For truly robust file backups, we recommend combining multiple
methods. For example, use Git regularly along with manual backups to an
external hard-drive at regular intervals such as monthly or biannually.
---
### 1. Use your local machine:
If you have sufficient hard drive space, regularly backup your `/work`
directories to your personal computer. To avoid filling up your personal
hard-drives, consider using an external drive that can easily be placed
in a fireproof safe or at an off-site location for an extra level of
protection. To do this, you can either use [Globus
Connect]({{< relref "/handling_data/data_transfer/globus_connect/_index.md" >}}) or an
SCP client, such
as <a href="https://cyberduck.io/" class="external-link">Cyberduck</a> or <a href="https://winscp.net/eng/index.php" class="external-link">WinSCP</a>.
For help setting up an SCP client, check out our [Connecting Guides]({{< relref "/connecting" >}}).
For those worried about personal hard drive crashes, UNL
offers <a href="http://nsave.unl.edu/" class="external-link">the backup service NSave</a>.
For a small monthly fee, users can install software that will
automatically backup selected files from their personal machine.
Benefits:
- Gives you full control over what is backed up and when.
- Doesn't require the use of third party servers (when using SCP
clients).
- Take advantage of our high speed data transfers (10 Gb/s) when using
Globus Connect or [setup your SCP client to use our dedicated high
speed transfer
servers]({{< relref "/handling_data/data_transfer/high_speed_data_transfers.md" >}})
Limitations:
- The amount you can backup is limited by available hard-drive space.
- Manual backups of many files can be time consuming.
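For command-line users, a manual backup of this kind might look like the following sketch, run from your local machine (the paths and names are placeholders; `crane-xfer.unl.edu` is one of the dedicated high-speed transfer servers mentioned above):
{{< highlight bash >}}
# pull a results directory from $WORK on Crane down to a local backup folder
rsync -avz <username>@crane-xfer.unl.edu:/work/<groupname>/<username>/my_project/ \
    ~/backups/my_project/
{{< /highlight >}}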
---
### 2. Use Git to preserve files and revision history:
Git is a revision control service which can be run locally or can be
paired with a repository hosting service, such
as <a href="http://www.github.com/" class="external-link">GitHub</a>, to
provide a remote backup of your files. Git works best with smaller files
such as source code and manuscripts. Anyone with an InCommon login can
utilize <a href="http://git.unl.edu/" class="external-link">UNL's GitLab Instance</a>,
for free.
Benefits:
- Git is naturally collaboration-friendly, allowing multiple people to
easily work on the same project, and provides great built-in tools to
control contributions and manage conflicting changes.
- Create individual repositories for each project, allowing you to
compartmentalize your work.
- Using UNL's GitLab instance allows you to create private or internal
(accessible by anyone within your organization) repositories.
Limitations:
- Git is not designed to handle large files. GitHub does not allow
files larger than 100MB unless using
their <a href="https://help.github.com/articles/about-git-large-file-storage/" class="external-link">Git Large File Storage</a> and
tracking files over 1GB in size can be time consuming and lead to
errors when using other repository hosts.
---
### 3. Use Attic:
HCC offers
long-term, <a href="https://en.wikipedia.org/wiki/Nearline_storage" class="external-link">near-line</a> data
storage
through [Attic]({{< relref "using_attic" >}}).
HCC users with an existing account
can <a href="http://hcc.unl.edu/attic" class="external-link">apply for an Attic account</a> for
a <a href="http://hcc.unl.edu/priority-access-pricing" class="external-link">small annual fee</a> that
is substantially less than other cloud services.
Benefits:
- Attic files are backed up regularly at both HCC locations in Omaha
and Lincoln to help provide disaster tolerance and a second security
layer against file loss.
- No limits on individual or total file sizes.
- High speed data transfers between Attic and the clusters when using
[Globus Connect]({{< relref "/handling_data/data_transfer/globus_connect/_index.md" >}}) and [HCC's high-speed data
servers]({{< relref "/handling_data/data_transfer/high_speed_data_transfers.md" >}}).
Limitations:
- Backups must be done manually which can be time consuming. Setting
up automated scripts can help speed up this process.
---
### 4. Use a cloud-based service, such as Box:
Many of us are familiar with services such as Google Drive, Dropbox, Box
and OneDrive. These cloud-based services provide a convenient portal for
accessing your files from any computer. NU offers OneDrive and Box
services to all students, staff and faculty. But did you know that you
can link your Box account to HCC’s clusters to provide quick and easy
access to files stored there? [Follow a few set-up
steps]({{< relref "integrating_box_with_hcc" >}}) and
you can add files to and access files stored in your Box account
directly from HCC clusters. Setup your submit scripts to automatically
upload results as they are generated or use it interactively to store
important workflow scripts and maintain a backup of your analysis
results.
Benefits:
- <a href="http://box.unl.edu/" class="external-link">Box@UNL</a> offers
unlimited file storage while you are associated with UNL.
- Integrating with HCC clusters provides a quick and easy way to
automate backups of analysis results and workflow scripts.
Limitations:
- Box has individual file size limitations, larger files will need to
be backed up using an alternate method.
---
### 5. Copy important files to `/home`:
While `/work` files and directories are not backed up, files and
directories in `/home` are backed up on a daily basis. Due to the
limitations of the `/home` filesystem, we strongly recommend that only
source code and compiled programs are backed up to `/home`. If you do
use `/home` to backup datasets, please keep a working copy in your
`/work` directories to prevent negatively impacting the functionality of
the cluster.
Benefits:
- No need to make manual backups. `/home` files are automatically backed
up daily.
- Files in `/home` are not subject to the 6 month purge policy that
exists on `/work`.
- Doesn't require the use of third-party software or tools.
Limitations:
- Home storage is limited to 20GB per user. Larger files sets will
need to be backed up using an alternate method.
- Home is read-only on the cluster worker nodes so results cannot be
directly written or altered from within a submitted job.
If you would like more information or assistance in setting up any of
these methods, contact us
at <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a>.
+++
title = "Using Attic"
description = "How to store data on Attic"
weight = 20
+++
For users who need long-term storage for large amounts of data, HCC
provides an economical solution called Attic. Attic is a reliable
<a href="https://en.wikipedia.org/wiki/Nearline_storage" class="external-link">near-line data archive</a> storage
system. The files in Attic can be accessed and shared from anywhere
using [Globus
Connect]({{< relref "/handling_data/data_transfer/globus_connect" >}}),
with a fast 10Gb/s link. Also, the data in Attic is backed up between
our Lincoln and Omaha facilities to ensure high availability and
disaster tolerance. The data and user activities on Attic are subject to
our
<a href="http://hcc.unl.edu/hcc-policies" class="external-link">HCC Policies</a>.
---
### Accounts and Cost
To use Attic you will first need an
<a href="https://hcc.unl.edu/new-user-request" class="external-link">HCC account</a>, and
then you may request an
<a href="http://hcc.unl.edu/attic" class="external-link">Attic allocation</a>.
We charge a small fee per TB per year, but it is cheaper than most
commercial cloud storage solutions. For the user application form and
cost, please see the
<a href="http://hcc.unl.edu/attic" class="external-link">HCC Attic page</a>.
---
### Transfer Files Using Globus Connect
The easiest and fastest way to access Attic is via Globus. You can
transfer files between your computer, our clusters ($HOME, $WORK, and $COMMON on
Crane or Rhino), and Attic. Here is a detailed tutorial on
how to set up and use [Globus Connect]({{< relref "/handling_data/data_transfer/globus_connect" >}}). For
Attic, use the Globus Endpoint **hcc\#attic**. Your Attic files are
located at `~`, which is a shortcut
for `/attic/<groupname>/<username>`.
**Note:** *If you are accessing Attic files from your supplementary
group, you should explicitly set the path to
/attic/\<supplementary\_groupname\>/. If you don't do that, by
default the endpoint will try to place you in your primary group's Attic
path, to which access will be denied if the primary group doesn't have an Attic allocation.*
---
### Transfer Files Using SCP/SFTP/RSYNC
The transfer server for Attic storage is `attic.unl.edu` (or `attic-xfer.unl.edu`).
{{% panel theme="info" header="SCP Example" %}}
{{< highlight bash >}}
scp /source/file <username>@attic.unl.edu:~/destination/file
{{< /highlight >}}
{{% /panel %}}
{{% panel theme="info" header="SFTP Example" %}}
{{< highlight bash >}}
sftp <username>@attic.unl.edu
Password:
Duo two-factor login for <username>
Connected to attic.unl.edu.
sftp> pwd
Remote working directory: /attic/<groupname>/<username>
sftp> put source/file destination/file
sftp> exit
{{< /highlight >}}
{{% /panel %}}
{{% panel theme="info" header="RSYNC Example" %}}
{{< highlight bash >}}
# local to remote rsync command
rsync -avz /local/source/path <username>@attic.unl.edu:remote/destination/path
# remote to local rsync command
rsync -avz <username>@attic.unl.edu:remote/source/path /local/destination/path
{{< /highlight >}}
{{% /panel %}}
You can also access your data on Attic using our [high-speed
transfer servers]({{< relref "/handling_data/data_transfer/high_speed_data_transfers" >}}) if you prefer.
Simply use scp or sftp to connect to one of the transfer servers, and
your directory is mounted at `/attic/<groupname>/<username>`.
---
### Check Attic Usage
The usage and quota information for your group and the users in the
group are stored in a file named "disk\_usage.txt" in your group's
directory (`/attic/<groupname>`). You can use either [Globus Connect]({{< relref "/handling_data/data_transfer/globus_connect" >}}) or
scp to download it. Your usage and expiration is also shown in the web
interface (see below).
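For example, a quick way to pull the report with scp (group and user names are placeholders):
{{< highlight bash >}}
scp <username>@attic.unl.edu:/attic/<groupname>/disk_usage.txt .
{{< /highlight >}}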
---
### Use the web interface
For convenience, a web interface is also provided. Simply go to
<a href="https://attic.unl.edu" class="external-link">https://attic.unl.edu</a>
and login with your HCC credentials. Using this interface, you can see
your quota usage and expiration, manage files, etc. **Please note we do
not recommend uploading/downloading large files this way**. Use one of
the other transfer methods above for large datasets.
+++
title = "Activating HCC Cluster Endpoints"
description = "How to activate HCC endpoints on Globus"
weight = 20
+++
You will not be able to transfer files to or from an HCC endpoint using Globus Connect without first activating the endpoint. Endpoints are available for Crane (`hcc#crane`), Rhino, (`hcc#rhino`), and Attic (`hcc#attic`). Follow the instructions below to activate any of these endpoints and begin making transfers.
1. [Sign in](https://www.globus.org/SignIn) to your Globus account using your campus credentials or your Globus ID (if you have one). Then click on 'Endpoints' in the left sidebar.
{{< figure src="/images/Glogin.png" >}}
{{< figure src="/images/endpoints.png" >}}
2. Find the endpoint you want by entering '`hcc#crane`', '`hcc#rhino`', or '`hcc#attic`' in the search box and hit 'enter'. Once you have found and selected the endpoint, click the green 'activate' icon. On the following page, click 'continue'.
{{< figure src="/images/activateEndpoint.png" >}}
{{< figure src="/images/EndpointContinue.png" >}}
3. You will be redirected to the HCC Globus Endpoint Activation page. Enter your *HCC* username and password (the password you usually use to log into the HCC clusters).
{{< figure src="/images/hccEndpoint.png" >}}
4. Next you will be prompted to
provide your *Duo* credentials. If you use the Duo Mobile app on
your smartphone or tablet, select 'Duo Push'. Once you approve the notification that is sent to your phone,
the activation will be complete. If you use a Yubikey for
authentication, select the 'Passcode' option and then press your
Yubikey to complete the activation. Upon successful activation, you
will be redirected to your Globus *Manage Endpoints* page.
{{< figure src="/images/EndpointPush.png" >}}
{{< figure src="/images/endpointComplete.png" >}}
The endpoint should now be ready
and will not have to be activated again for the next 7 days.
To transfer files between any two HCC clusters, you will need to
activate both endpoints individually.
Next, learn how to [make file transfers between HCC endpoints]({{< relref "/handling_data/data_transfer/globus_connect/file_transfers_between_endpoints" >}}) or how to [transfer between HCC endpoints and a personal computer]({{< relref "/handling_data/data_transfer/globus_connect/file_transfers_to_and_from_personal_workstations" >}}).
---
+++
title = "File Transfers Between Endpoints"
description = "How to transfer files between HCC clusters using Globus"
weight = 30
+++
To transfer files between HCC clusters, you will first need to
[activate]({{< relref "/handling_data/data_transfer/globus_connect/activating_hcc_cluster_endpoints" >}}) the
two endpoints you would like to use (the available endpoints
are: `hcc#crane`, `hcc#rhino`, and `hcc#attic`). Once
that has been completed, follow the steps below to begin transferring
files. (Note: You can also transfer files between an HCC endpoint and
any other Globus endpoint for which you have authorized access. That
may include a [personal
endpoint]({{< relref "/handling_data/data_transfer/globus_connect/file_transfers_to_and_from_personal_workstations" >}}),
a [shared
endpoint]({{< relref "/handling_data/data_transfer/globus_connect/file_sharing" >}}),
or an endpoint on another computing resource or cluster. Once the
endpoints have been activated, the file transfer process is generally
the same regardless of the type of endpoints you use. For demonstration
purposes we use two HCC endpoints.)
1. Once both endpoints for the desired file transfer have been
activated, [sign in](https://www.globus.org/SignIn) to
your Globus account (if you are not already) and select
"Transfer or Sync to.." from the right sidebar. If you have
a small screen, you may have to click the menu icon
first.
{{< figure src="/images/Transfer.png">}}
2. Enter the names of the two endpoints you would like to use, or
select from the drop-down menus (for
example, `hcc#attic` and `hcc#crane`). Enter the
directory paths for both the source and destination (the 'from' and
'to' paths on the respective endpoints). Press 'Enter' to view files
under these directories. Select the files or directories you would
like to transfer (press *shift* or *control* to make multiple
selections) and click the blue highlighted arrow to start the
transfer.
{{< figure src="/images/startTransfer.png" >}}
3. Globus will display a message when your transfer has completed
(or in the unlikely event that it was unsuccessful), and you will
also receive an email. Select the 'refresh' icon to see your file
in the destination folder.
{{< figure src="/images/transferComplete.png" >}}
---
+++
title = "High Speed Data Transfers"
description = "How to transfer files directly from the transfer servers"
weight = 10
+++
Crane, Rhino, and Attic each have a dedicated transfer server with
10 Gb/s connectivity that allows
for faster data transfers than the login nodes. With [Globus
Connect]({{< relref "globus_connect" >}}), users
can take advantage of this connection speed when making large/cumbersome
transfers.
Those who prefer scp, sftp or
rsync clients can also benefit from this high-speed connectivity by
using these dedicated servers for data transfers:
Cluster | Transfer server
----------|----------------------
Crane | `crane-xfer.unl.edu`
Rhino | `rhino-xfer.unl.edu`
Attic | `attic-xfer.unl.edu`
{{% notice info %}}
Because the transfer servers are login-disabled, third-party transfers
between `crane-xfer` and `attic-xfer` must be done via [Globus Connect]({{< relref "globus_connect" >}}).
{{% /notice %}}
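For example, a sketch of copying a large archive from your local machine into `$WORK` on Crane via the transfer server (the file and path names are placeholders):
{{< highlight bash >}}
scp ./large_dataset.tar.gz <username>@crane-xfer.unl.edu:/work/<groupname>/<username>/
{{< /highlight >}}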
+++
title = "Submitting an OpenMP Job"
description = "How to submit an OpenMP job on HCC resources."
+++
Submitting an OpenMP job is different from
[Submitting an MPI Job]({{< relref "submitting_an_mpi_job" >}})
since you must request multiple cores from a single node.
{{% panel theme="info" header="OpenMP example submission" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --ntasks-per-node=16 # 16 cores
#SBATCH --nodes=1 # 1 node
#SBATCH --mem-per-cpu=1024 # Minimum memory required per CPU (in megabytes)
#SBATCH --time=03:15:00 # Run time in hh:mm:ss
#SBATCH --error=/work/[groupname]/[username]/job.%J.err
#SBATCH --output=/work/[groupname]/[username]/job.%J.out
export OMP_NUM_THREADS=${SLURM_NTASKS_PER_NODE}
./openmp-app.exe
{{< /highlight >}}
{{% /panel %}}
Notice that we used `ntasks-per-node` to specify the number of cores we
want on a single node. Additionally, we specify that we only want
1 `node`.
`OMP_NUM_THREADS` is required to limit the number of cores that OpenMP
will use on the node. It is set to ${SLURM_NTASKS_PER_NODE} to
automatically match the `ntasks-per-node` value (in this example 16).
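For reference, a minimal OpenMP program that could stand in for `openmp-app.exe` in the script above (this is an illustrative sketch, not part of HCC's software; compile it with, e.g., `gcc -fopenmp openmp-app.c -o openmp-app.exe`):
{{< highlight c >}}
// openmp-app.c: each thread in the parallel region reports its ID.
// The number of threads is taken from OMP_NUM_THREADS, set in the submit script.
#include <stdio.h>
#include <omp.h>

int main(void){
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
{{< /highlight >}}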
### Compiling
Directions to compile OpenMP can be found on
[Compiling an OpenMP Application]({{< relref "/applications/user_software/compiling_an_openmp_application" >}}).
### Further Documentation
Further OpenMP documentation can be found on LLNL's
[OpenMP](https://computing.llnl.gov/tutorials/openMP) website.