diff --git a/content/FAQ/_index.md b/content/FAQ/_index.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..c51a547e16877530d974885b005a35eca8fcc221 100644 --- a/content/FAQ/_index.md +++ b/content/FAQ/_index.md @@ -0,0 +1,177 @@ ++++ +title = "FAQ" +description = "HCC Frequently Asked Questions" +weight = "10" ++++ + +- [I have an account, now what?](#i-have-an-account-now-what) +- [How do I change my password?](#how-do-i-change-my-password) +- [I forgot my password, how can I retrieve it?](#i-forgot-my-password-how-can-i-retrieve-it) +- [I just deleted some files and didn't mean to! Can I get them back?](#i-just-deleted-some-files-and-didn-t-mean-to-can-i-get-them-back) +- [How do I (re)activate Duo?](#how-do-i-re-activate-duo) +- [How many nodes/memory/time should I request?](#how-many-nodes-memory-time-should-i-request) +- [I am trying to run a job but nothing happens?](#i-am-trying-to-run-a-job-but-nothing-happens) +- [I keep getting the error "slurmstepd: error: Exceeded step memory limit at some point." What does this mean and how do I fix it?](#i-keep-getting-the-error-slurmstepd-error-exceeded-step-memory-limit-at-some-point-what-does-this-mean-and-how-do-i-fix-it) +- [I want to talk to a human about my problem. Can I do that?](#i-want-to-talk-to-a-human-about-my-problem-can-i-do-that) + +--- + +#### I have an account, now what? + +Congrats on getting an HCC account! Now you need to connect to a Holland +cluster. To do this, we use an SSH connection. SSH stands for Secure +Shell, and it allows you to securely connect to a remote computer and +operate it just like you would a personal machine. + +Depending on your operating system, you may need to install software to +make this connection. Check out on Quick Start Guides for information on +how to install the necessary software for your operating system + +- [For Mac/Linux Users]({{< relref "for_maclinux_users" >}}) +- [For Windows Users]({{< relref "for_windows_users" >}}) + +#### How do I change my password? + +#### I forgot my password, how can I retrieve it? + +Information on how to change or retrieve your password can be found on +the documentation page: [How to change your +password]({{< relref "/accounts/how_to_change_your_password" >}}) + + +All passwords must be at least 8 characters in length and must contain +at least one capital letter and one numeric digit. Passwords also cannot +contain any dictionary words. If you need help picking a good password, +consider using a (secure!) password generator such as +[this one provided by Random.org](https://www.random.org/passwords) + +To preserve the security of your account, we recommend changing the +default password you were given as soon as possible. + +#### I just deleted some files and didn't mean to! Can I get them back? + +That depends. Where were the files you deleted? + +**If the files were in your $HOME directory (/home/group/user/):** It's +possible. + +$HOME directories are backed up daily and we can restore your files as +they were at the time of our last backup. Please note that any changes +made to the files between when the backup was made and when you deleted +them will not be preserved. To have these files restored, please contact +HCC Support at +{{< icon name="envelope" >}}[hcc-support@unl.edu] (mailto:hcc-support@unl.edu) +as soon as possible. + +**If the files were in your $WORK directory (/work/group/user/):** No. + +Unfortunately, the $WORK directories are created as a short term place +to hold job files. 
This storage was designed to be quickly and easily +accessed by our worker nodes and as such is not conducive to backups. +Any irreplaceable files should be backed up in a secondary location, +such as Attic, the cloud, or on your personal machine. For more +information on how to prevent file loss, check out [Preventing File +Loss]({{< relref "preventing_file_loss" >}}). + +#### How do I (re)activate Duo? + +**If you have not activated Duo before:** + +Please stop by +[our offices](http://hcc.unl.edu/location) +along with a photo ID and we will be happy to activate it for you. If +you are not local to Omaha or Lincoln, contact us at +{{< icon name="envelope" >}}[hcc-support@unl.edu] (mailto:hcc-support@unl.edu) +and we will help you activate Duo remotely. + +**If you have activated Duo previously but now have a different phone +number:** + +Stop by our offices along with a photo ID and we can help you reactivate +Duo and update your account with your new phone number. + +**If you have activated Duo previously and have the same phone number:** + +Email us at +{{< icon name="envelope" >}}[hcc-support@unl.edu] (mailto:hcc-support@unl.edu) +from the email address your account is registered under and we will send +you a new link that you can use to activate Duo. + +#### How many nodes/memory/time should I request? + +**Short answer:** We don’t know. + +**Long answer:** The amount of resources required is highly dependent on +the application you are using, the input file sizes and the parameters +you select. Sometimes it can help to speak with someone else who has +used the software before to see if they can give you an idea of what has +worked for them. + +But ultimately, it comes down to trial and error; try different +combinations and see what works and what doesn’t. Good practice is to +check the output and utilization of each job you run. This will help you +determine what parameters you will need in the future. + +For more information on how to determine how many resources a completed +job used, check out the documentation on [Monitoring Jobs]({{< relref "monitoring_jobs" >}}). + +#### I am trying to run a job but nothing happens? + +Where are you trying to run the job from? You can check this by typing +the command \`pwd\` into the terminal. + +**If you are running from inside your $HOME directory +(/home/group/user/)**: + +Move your files to your $WORK directory (/work/group/user) and resubmit +your job. + +The worker nodes on our clusters have read-only access to the files in +$HOME directories. This means that when a job is submitted from $HOME, +the scheduler cannot write the output and error files in the directory +and the job is killed. It appears the job does nothing because no output +is produced. + +**If you are running from inside your $WORK directory:** + +Contact us at +{{< icon name="envelope" >}}[hcc-support@unl.edu] (mailto:hcc-support@unl.edu) +with your login, the name of the cluster you are running on, and the +full path to your submit script and we will be happy to help solve the +issue. + +##### I keep getting the error "slurmstepd: error: Exceeded step memory limit at some point." What does this mean and how do I fix it? + +This error occurs when the job you are running uses more memory than was +requested in your submit script. + +If you specified `--mem` or `--mem-per-cpu` in your submit script, try +increasing this value and resubmitting your job. 
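+
+For example, if your script currently requests 4gb and the job is killed for
+exceeding that limit, a reasonable next step is to roughly double the request
+(the value you actually need depends on your application, input sizes and
+parameters):
+
+{{< highlight batch >}}
+#SBATCH --mem=8gb
+{{< /highlight >}}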
+ +If you did not specify `--mem` or `--mem-per-cpu` in your submit script, +chances are the default amount allotted is not sufficient. Add the line + +{{< highlight batch >}} +#SBATCH --mem=<memory_amount> +{{< /highlight >}} + +to your script with a reasonable amount of memory and try running it again. If you keep +getting this error, continue to increase the requested memory amount and +resubmit the job until it finishes successfully. + +For additional details on how to monitor usage on jobs, check out the +documentation on [Monitoring Jobs]({{< relref "monitoring_jobs" >}}). + +If you continue to run into issues, please contact us at +{{< icon name="envelope" >}}[hcc-support@unl.edu] (mailto:hcc-support@unl.edu) +for additional assistance. + +#### I want to talk to a human about my problem. Can I do that? + +Of course! We have an open door policy and invite you to stop by +[either of our offices](http://hcc.unl.edu/location) +anytime Monday through Friday between 9 am and 5 pm. One of the HCC +staff would be happy to help you with whatever problem or question you +have. Alternatively, you can drop one of us a line and we'll arrange a +time to meet: [Contact Us](https://hcc.unl.edu/contact-us). + diff --git a/content/applications/app_specific/Jupyter.md b/content/applications/app_specific/Jupyter.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..3a7e3f9f9fd06ec168a6ec0229bcaa7a1a70b22d 100644 --- a/content/applications/app_specific/Jupyter.md +++ b/content/applications/app_specific/Jupyter.md @@ -0,0 +1,58 @@ ++++ +title = "Jupyter Notebooks on Crane" +description = "How to access and use a Jupyter Notebook" +weight = 20 ++++ + +- [Connecting to Crane] (#connecting-to-crane) +- [Running Code] (#running-code) +- [Opening a Terminal] (#opening-a-terminal) +- [Using Custom Packages] (#using-custom-packages) + +## Connecting to Crane +----------------------- + Jupyter defines it's notebooks ("Jupyter Notebooks") as + an open-source web application that allows you to create and share documents that contain live code, + equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, + statistical modeling, data visualization, machine learning, and much more. + +1. To open a Jupyter notebook, [Sign in](https://crane.unl.edu) to crane.unl.edu using your hcc credentials (NOT your + campus credentials). +{{< figure src="/images/jupyterLogin.png" >}} + +2. Select your preferred authentication method. + + {{< figure src="/images/jupyterPush.png" >}} + +3. Choose a job profile. Select "Noteboook via SLURM Job | Small (1 core, 4GB RAM, 8 hours)" for light tasks such as debugging or small-scale testing. +Select the other options based on your computing needs. Note that a SLURM Job will save to your "work" directory. + +{{< figure src="/images/jupyterjob.png" >}} + +## Running Code + +1. Select the "New" dropdown menu and select the file type you want to create. + +{{< figure src="/images/jupyterNew.png" >}} +2. A new tab will open, where you can enter your code. Run your code by selecting the "play" icon. + +{{< figure src="/images/jupyterCode.png">}} + +## Opening a Terminal + +1. From your user home page, select "terminal" from the "New" drop-down menu. +{{< figure src="/images/jupyterTerminal.png">}} +2. A terminal opens in a new tab. You can enter [Linux commands] ({{< relref "basic_linux_commands" >}}) + at the prompt. 
+{{< figure src="/images/jupyterTerminal2.png">}} + +## Using Custom Packages + +Many popular `python` and `R` packages are already installed and available within Jupyter Notebooks. +However, it is possible to install custom packages to be used in notebooks by creating a custom Anaconda +Environment. Detailed information on how to create such an environment can be found at + [Using an Anaconda Environment in a Jupyter Notebook on Crane]({{< relref "/applications/user_software/using_anaconda_package_manager#using-an-anaconda-environment-in-a-jupyter-notebook-on-crane" >}}). + +--- + + diff --git a/content/applications/app_specific/allinea_profiling_and_debugging/allinea_performance_reports/blast_with_allinea_performance_reports.md b/content/applications/app_specific/allinea_profiling_and_debugging/allinea_performance_reports/blast_with_allinea_performance_reports.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..ec4291ed4d4fb318fac1d71327279306107d2bb2 100644 --- a/content/applications/app_specific/allinea_profiling_and_debugging/allinea_performance_reports/blast_with_allinea_performance_reports.md +++ b/content/applications/app_specific/allinea_profiling_and_debugging/allinea_performance_reports/blast_with_allinea_performance_reports.md @@ -0,0 +1,65 @@ ++++ +title = "BLAST with Allinea Performance Reports" +description = "Example of how to profile BLAST using Allinea Performance Reports." ++++ + +Simple example of using +[BLAST]({{< relref "/applications/app_specific/bioinformatics_tools/alignment_tools/blast/running_blast_alignment" >}}) +with Allinea Performance Reports (`perf-report`) on Crane is shown below: + +{{% panel theme="info" header="blastn_perf_report.submit" %}} +{{< highlight batch >}} +#!/bin/sh +#SBATCH --job-name=BlastN +#SBATCH --nodes=1 +#SBATCH --ntasks=16 +#SBATCH --time=20:00:00 +#SBATCH --mem=50gb +#SBATCH --output=BlastN.info +#SBATCH --error=BlastN.error + +module load allinea +module load blast/2.2.29 + +cd $WORK/<project_folder> +cp -r /work/HCC/DATA/blastdb/nt/ /tmp/ +cp input_reads.fasta /tmp/ + +perf-report --openmp-threads=$SLURM_NTASKS_PER_NODE --nompi `which blastn` \ +-query /tmp/input_reads.fasta -db /tmp/nt/nt -out \ +blastn_output.alignments -num_threads $SLURM_NTASKS_PER_NODE + +cp blastn\_output.alignments . +{{< /highlight >}} +{{% /panel %}} + +BLAST uses OpenMP and therefore the Allinea Performance Reports options +`--openmp-threads` and `--nompi` are used. The perf-report +part, `perf-report --openmp-threads=$SLURM_NTASKS_PER_NODE --nompi`, +is placed in front of the actual `blastn` command we want +to analyze. + +{{% notice info %}} +If you see the error "**Allinea Performance Reports - target file +'application' does not exist on this machine... exiting**", this means +that instead of just using the executable '*application*', the full path +to that application is required. This is the reason why in the script +above, instead of using "*blastn*", we use *\`which blastn\`* which +gives the full path of the *blastn* executable. +{{% /notice %}} + +When the application finishes, the performance report is generated in +the working directory. +For the executed application, this is how the report looks like: + +{{< figure src="/images/11635296.png" width="850" >}} + +From the report, we can see that **blastn** is Compute-Bound +application. The difference between mean (11.1 GB) and peak (26.3 GB) +memory is significant, and this may be sign of workload imbalance or a +memory leak. 
Moreover, 89.6% of the time is spent in synchronizing +threads in parallel regions which can lead to workload imbalance. + +Running Allinea Performance Reports and identifying application +bottlenecks is really useful for improving the application and better +utilization of the available resources. diff --git a/content/applications/app_specific/allinea_profiling_and_debugging/allinea_performance_reports/ray_with_allinea_performance_reports.md b/content/applications/app_specific/allinea_profiling_and_debugging/allinea_performance_reports/ray_with_allinea_performance_reports.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..ba18fd08924a2d40eb7fbd5bbce34ce16145dfe2 100644 --- a/content/applications/app_specific/allinea_profiling_and_debugging/allinea_performance_reports/ray_with_allinea_performance_reports.md +++ b/content/applications/app_specific/allinea_profiling_and_debugging/allinea_performance_reports/ray_with_allinea_performance_reports.md @@ -0,0 +1,44 @@ ++++ +title = "Ray with Allinea Performance Reports" +description = "Example of how to profile Ray using Allinea Performance Reports" ++++ + +Simple example of using [Ray]({{< relref "/applications/app_specific/bioinformatics_tools/de_novo_assembly_tools/ray" >}}) +with Allinea PerformanceReports (`perf-report`) on Tusker is shown below: + +{{% panel theme="info" header="ray_perf_report.submit" %}} +{{< highlight batch >}} +#!/bin/sh +#SBATCH --job-name=Ray +#SBATCH --ntasks-per-node=16 +#SBATCH --time=10:00:00 +#SBATCH --mem=70gb +#SBATCH --output=Ray.info +#SBATCH --error=Ray.error + +module load allinea +module load compiler/gcc/4.7 openmpi/1.6 ray/2.3 + +perf-report mpiexec -n 16 Ray -k 31 -p -p input_reads_pair_1.fasta input_reads\_pair_2.fasta -o output_directory +{{< /highlight >}} +{{% /panel %}} + +Ray is MPI and therefore additional Allinea Performance Reports options +are not required. The `perf-report` command is placed in front of the +actual `Ray` command we want to analyze. + +When the application finishes, the performance report is generated in +the working directory. +For the executed application, this is how the report looks like: + +{{< figure src="/images/11635303.png" width="850" >}} + +From the report, we can see that **Ray **is Compute-Bound application. +Most of the running time is spent in point-to-point calls with a low +transfer rate which may be caused by inefficient message sizes. +Therefore, running this application with fewer MPI processes and more +data on each process may be more efficient. + +Running Allinea Performance Reports and identifying application +bottlenecks is really useful for improving the application and better +utilization of the available resources. 
diff --git a/content/applications/app_specific/bioinformatics_tools/alignment_tools/blast/running_blast_alignment.md b/content/applications/app_specific/bioinformatics_tools/alignment_tools/blast/running_blast_alignment.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..b93fb6d8811a6e1ce83d17d6fbb4a4d7e645d02e 100644 --- a/content/applications/app_specific/bioinformatics_tools/alignment_tools/blast/running_blast_alignment.md +++ b/content/applications/app_specific/bioinformatics_tools/alignment_tools/blast/running_blast_alignment.md @@ -0,0 +1,124 @@ ++++ +title = " Running BLAST Alignment" +description = "How to run BLAST alignment on HCC resources" +weight = "10" ++++ + + +Basic BLAST has the following commands: + +- **blastn**: search nucleotide database using a nucleotide query +- **blastp**: search protein database using a protein query +- **blastx**: search protein database using a translated nucleotide query +- **tblastn**: search translated nucleotide database using a protein query +- **tblastx**: search translated nucleotide database using a translated nucleotide query + + +The basic usage of **blastn** is: +{{< highlight bash >}} +$ blastn -query input_reads.fasta -db input_reads_db -out blastn_output.alignments [options] +{{< /highlight >}} +where **input_reads.fasta** is an input file of sequence data in fasta format, **input_reads_db** is the generated BLAST database, and **blastn_output.alignments** is the output file where the alignments are stored. + +Additional parameters can be found in the [BLAST manual] (https://www.ncbi.nlm.nih.gov/books/NBK279690/), or by typing: +{{< highlight bash >}} +$ blastn -help +{{< /highlight >}} + +These BLAST alignment commands are multi-threaded, and therefore using the BLAST option **-num_threads <number_of_CPUs>** is recommended. + + +HCC hosts multiple BLAST databases and indices on Crane. In order to use these resources, the ["biodata" module] ({{<relref "/applications/app_specific/bioinformatics_tools/biodata_module">}}) needs to be loaded first. The **$BLAST** variable contains the following currently available databases: + +- **16SMicrobial** +- **env_nt** +- **est** +- **est_human** +- **est_mouse** +- **est_others** +- **gss** +- **human_genomic** +- **human_genomic_transcript** +- **mouse_genomic_transcript** +- **nr** +- **nt** +- **other_genomic** +- **refseq_genomic** +- **refseq_rna** +- **sts** +- **swissprot** +- **tsa_nr** +- **tsa_nt** + +If you want to create and use a BLAST database that is not mentioned above, check [Create Local BLAST Database]({{<relref "create_local_blast_database" >}}). + + +Basic SLURM example of nucleotide BLAST run against the non-redundant **nt** BLAST database with `8 CPUs` is provided below. When running BLAST alignment, it is recommended to first copy the query and database files to the **/scratch/** directory of the worker node. Moreover, the BLAST output is also saved in this directory (**/scratch/blastn_output.alignments**). After BLAST finishes, the output file is copied from the worker node to your current work directory. +{{% notice info %}} +**Please note that the worker nodes can not write to the */home/* directories and therefore you need to run your job from your */work/* directory.** +**This example will first copy your database to faster local storage called “scratch”. 
This can greatly improve performance!** +{{% /notice %}} + +{{% panel header="`blastn_alignment.submit`"%}} +{{< highlight bash >}} +#!/bin/sh +#SBATCH --job-name=BlastN +#SBATCH --nodes=1 +#SBATCH --ntasks-per-node=8 +#SBATCH --time=168:00:00 +#SBATCH --mem=20gb +#SBATCH --output=BlastN.%J.out +#SBATCH --error=BlastN.%J.err + +module load blast/2.7 +module load biodata/1.0 + +cd $WORK/<project_folder> +cp $BLAST/nt.* /scratch/ +cp input_reads.fasta /scratch/ + +blastn -query /scratch/input_reads.fasta -db /scratch/nt -out /scratch/blastn_output.alignments -num_threads $SLURM_NTASKS_PER_NODE + +cp /scratch/blastn_output.alignments $WORK/<project_folder> +{{< /highlight >}} +{{% /panel %}} + + +One important BLAST parameter is the **e-value threshold** that changes the number of hits returned by showing only those with value lower than the given. To show the hits with **e-value** lower than 1e-10, modify the given script as follows: +{{< highlight bash >}} +$ blastn -query input_reads.fasta -db input_reads_db -out blastn_output.alignments -num_threads $SLURM_NTASKS_PER_NODE -evalue 1e-10 +{{< /highlight >}} + + +The default BLAST output is in pairwise format. However, BLAST’s parameter **-outfmt** supports output in [different formats] (https://www.ncbi.nlm.nih.gov/books/NBK279684/) that are easier for parsing. + + +Basic SLURM example of protein BLAST run against the non-redundant **nr **BLAST database with tabular output format and `8 CPUs` is shown below. Similarly as before, the query and database files are copied to the **/scratch/** directory. The BLAST output is also saved in this directory (**/scratch/blastx_output.alignments**). After BLAST finishes, the output file is copied from the worker node to your current work directory. +{{% notice info %}} +**Please note that the worker nodes can not write to the */home/* directories and therefore you need to run your job from your */work/* directory.** +**This example will first copy your database to faster local storage called “scratch”. 
This can greatly improve performance!** +{{% /notice %}} + +{{% panel header="`blastx_alignment.submit`"%}} +{{< highlight bash >}} +#!/bin/sh +#SBATCH --job-name=BlastX +#SBATCH --nodes=1 +#SBATCH --ntasks-per-node=8 +#SBATCH --time=168:00:00 +#SBATCH --mem=20gb +#SBATCH --output=BlastX.%J.out +#SBATCH --error=BlastX.%J.err + +module load blast/2.7 +module load biodata/1.0 + +cd $WORK/<project_folder> +cp $BLAST/nr.* /scratch/ +cp input_reads.fasta /scratch/ + +blastx -query /scratch/input_reads.fasta -db /scratch/nr -outfmt 6 -out /scratch/blastx_output.alignments -num_threads $SLURM_NTASKS_PER_NODE + +cp /scratch/blastx_output.alignments $WORK/<project_folder> +{{< /highlight >}} +{{% /panel %}} diff --git a/content/applications/app_specific/bioinformatics_tools/biodata_module.md b/content/applications/app_specific/bioinformatics_tools/biodata_module.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0257bcf5f52bc1ede31253e14e143cf558449d4f 100644 --- a/content/applications/app_specific/bioinformatics_tools/biodata_module.md +++ b/content/applications/app_specific/bioinformatics_tools/biodata_module.md @@ -0,0 +1,88 @@ ++++ +title = "Biodata Module" +description = "How to use Biodata Module on HCC machines" +scripts = ["https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/jquery.tablesorter.min.js", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-pager.min.js","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-filter.min.js","/js/sort-table.js"] +css = ["http://mottie.github.io/tablesorter/css/theme.default.css","https://mottie.github.io/tablesorter/css/theme.dropbox.css", "https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/jquery.tablesorter.pager.min.css","https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/filter.formatter.min.css"] +weight = "52" ++++ + + +HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files, short read aligned indices etc. on Crane. +In order to use these resources, the "**biodata**" module needs to be loaded first. +For how to load module, please check [Module Commands]({{< relref "/applications/modules/_index.md" >}}). + +Loading the "**biodata**" module will pre-set many environment variables, but most likely you will only need a subset of them. Environment variables can be used in your command or script by prefixing `$` to the name. 
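+
+For instance, once the module is loaded you can print one of these variables
+with `echo` to see the directory it points to (the exact path will differ
+between clusters and module versions):
+
+{{< highlight bash >}}
+$ module load biodata
+$ echo $BLAST
+{{< /highlight >}}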
+ +The major environment variables are: +**$DATA** - main directory +**$BLAST** - Directory containing all available BLAST (nucleotide and protein) databases +**$KEGG** - KEGG database main entry point (requires license) +**$PANTHER** - PANTHER database main entry point (latest) +**$IPR** - InterProScan database main entry point (latest) +**$GENOMES** - Directory containing all available genomes (multiple sources, builds possible +**$INDICES** - Directory containing indices for bowtie, bowtie2, bwa for all available genomes +**$UNIPROT** - Directory containing latest release of full UniProt database + + +In order to check what genomes are available, you can type: +{{< highlight bash >}} +$ ls $GENOMES +{{< /highlight >}} + + +In order to check what BLAST databases are available, you can just type: +{{< highlight bash >}} +$ ls $BLAST +{{< /highlight >}} + + +An example of how to run Bowtie2 local alignment on Crane utilizing the default Horse, *Equus caballus* index (*BOWTIE2\_HORSE*) with paired-end fasta files and 8 CPUs is shown below: +{{% panel header="`bowtie2_alignment.submit`"%}} +{{< highlight bash >}} +#!/bin/sh +#SBATCH --job-name=Bowtie2 +#SBATCH --nodes=1 +#SBATCH --ntasks-per-node=8 +#SBATCH --time=168:00:00 +#SBATCH --mem=10gb +#SBATCH --output=Bowtie2.%J.out +#SBATCH --error=Bowtie2.%J.err + +module load bowtie/2.2 +module load biodata + +bowtie2 -x $BOWTIE2_HORSE -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE + +{{< /highlight >}} +{{% /panel %}} + + +An example of BLAST run against the non-redundant nucleotide database available on Crane is provided below: +{{% panel header="`blastn_alignment.submit`"%}} +{{< highlight bash >}} +#!/bin/sh +#SBATCH --job-name=BlastN +#SBATCH --nodes=1 +#SBATCH --ntasks-per-node=8 +#SBATCH --time=168:00:00 +#SBATCH --mem=10gb +#SBATCH --output=BlastN.%J.out +#SBATCH --error=BlastN.%J.err + +module load blast/2.7 +module load biodata +cp $BLAST/nt.* /scratch +cp input_reads.fasta /scratch + +blastn -db /scratch/nt -query /scratch/input_reads.fasta -out /scratch/blast_nucleotide.results +cp /scratch/blast_nucleotide.results . + +{{< /highlight >}} +{{% /panel %}} + + +### Available Organisms + +The organisms and their appropriate environmental variables for all genomes and chromosome files, as well as indices are shown in the table below. + +{{< table url="http://rhino-head.unl.edu:8192/bio/data/json" >}} diff --git a/content/applications/app_specific/dmtcp_checkpointing.md b/content/applications/app_specific/dmtcp_checkpointing.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..9de1c055fe6790b4d29caea3f04d16b112047bb2 100644 --- a/content/applications/app_specific/dmtcp_checkpointing.md +++ b/content/applications/app_specific/dmtcp_checkpointing.md @@ -0,0 +1,138 @@ ++++ +title = "DMTCP Checkpointing" +description = "How to use the DMTCP utility to checkpoint your application." ++++ + +[DMTCP](http://dmtcp.sourceforge.net) +(Distributed MultiThreaded Checkpointing) is a checkpointing package for +applications. Using checkpointing allows resuming of a failing +simulation due to failing resources (e.g. hardware, software, exceeded +time and memory resources). + +DMTCP supports both sequential and multi-threaded applications. Some +examples of binary programs on Linux distributions that can be used with +DMTCP are OpenMP, MATLAB, Python, Perl, MySQL, bash, gdb, X-Windows etc. 
+ +DMTCP provides support for several resource managers, including SLURM, +the resource manager used in HCC. The DMTCP module is available both on +Crane, and is enabled by typing: + +{{< highlight bash >}} +module load dmtcp +{{< /highlight >}} + +After the module is loaded, the first step is to run the command: + +{{< highlight bash >}} +[<username>@login.crane ~]$ dmtcp_launch --new-coordinator --rm --interval <interval_time_seconds> <your_command> +{{< /highlight >}} + +where `--rm` option enables SLURM support, +**\<interval_time_seconds\>** is the time in seconds between +automatic checkpoints, and **\<your_command\>** is the actual +command you want to run and checkpoint. + +Beside the general options shown above, more `dmtcp_launch` options +can be seen by using: + +{{< highlight bash >}} +[<username>@login.crane ~]$ dmtcp_launch --help +{{< /highlight >}} + +`dmtcp_launch` creates few files that are used to resume the +cancelled job, such as *ckpt\_\*.dmtcp* and +*dmtcp\_restart\_script\*.sh*. Unless otherwise stated +(using `--ckptdir` option), these files are stored in the current +working directory. + + +The second step of DMTCP is to restart the cancelled job, and there are +two ways of doing that: + +- `dmtcp_restart ckpt_*.dmtcp` *\<options\>* (before running + this command delete any old *ckp\_\*.dmtcp* files in your current + directory) + +- `./dmtcp_restart_script.sh` *\<options\>* + +If there are no options defined in the *<options>* field, DMTCP +will keep running with the options defined in the initial +**dmtcp\_launch** call (such as interval time, output directory etc). + + +Simple example of using DMTCP with +[BLAST]({{< relref "/applications/app_specific/bioinformatics_tools/alignment_tools/blast/running_blast_alignment" >}}) +on crane is shown below: + +{{% panel theme="info" header="dmtcp_blastx.submit" %}} +{{< highlight batch >}} +#!/bin/sh +#SBATCH --job-name=BlastX +#SBATCH --nodes=1 +#SBATCH --ntasks=8 +#SBATCH --time=50:00:00 +#SBATCH --mem=20gb +#SBATCH --output=BlastX_info_1.txt +#SBATCH --error=BlastX_error_1.txt + +module load dmtcp +module load blast/2.4 + +cd $WORK/<project_folder> +cp -r /work/HCC/DATA/blastdb/nr/ /tmp/ +cp input_reads.fasta /tmp/ + +dmtcp_launch --new-coordinator --rm --interval 3600 blastx -query \ +/tmp/input_reads.fasta -db /tmp/nr/nr -out blastx_output.alignments \ +-num_threads $SLURM_NTASKS_PER_NODE +{{< /highlight >}} +{{% /panel %}} + +In this example, DMTCP takes checkpoints every hour (`--interval 3600`), +and the actual command we want to checkpoint is `blastx` with +some general BLAST options defined with `-query`, `-db`, `-out`, +`-num_threads`. + +If this job is killed for various reasons, it can be restarted using the +following submit file: + +{{% panel theme="info" header="dmtcp_restart_blastx.submit" %}} +{{< highlight batch >}} +#!/bin/sh +#SBATCH --job-name=BlastX +#SBATCH --nodes=1 +#SBATCH --ntasks=8 +#SBATCH --time=50:00:00 +#SBATCH --mem=20gb +#SBATCH --output=BlastX_info_2.txt +#SBATCH --error=BlastX_error_2.txt + +module load dmtcp +module load blast/2.4 + +cd $WORK/<project_folder> +cp -r /work/HCC/DATA/blastdb/nr/ /tmp/ +cp input_reads.fasta /tmp/ + +# Start DMTCP +dmtcp_coordinator --daemon --port 0 --port-file /tmp/port +export DMTCP_COORD_HOST=`hostname` +export DMTCP_COORD_PORT=$(</tmp/port) + +# Restart job +./dmtcp_restart_script.sh +{{< /highlight >}} +{{% /panel %}} + +{{% notice info %}} +`dmtcp_restart` generates new +`ckpt_*.dmtcp` and `dmtcp_restart_script*.sh` files. 
Therefore, if +the restarted job is also killed due to unavailable/exceeded resources, +you can resubmit the same job again without any changes in the submit +file shown above (just don't forget to delete the old `ckpt_*.dmtcp` +files if you are using these files instead of `dmtcp_restart_script.sh`) +{{% /notice %}} + +Even though DMTCP tries to support most mainstream and commonly used +applications, there is no guarantee that every application can be +checkpointed and restarted. diff --git a/content/applications/app_specific/fortran_c_on_hcc.md b/content/applications/app_specific/fortran_c_on_hcc.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..beec01f8505f52c39bb67e9af09aa8fc338a7ba2 100644 --- a/content/applications/app_specific/fortran_c_on_hcc.md +++ b/content/applications/app_specific/fortran_c_on_hcc.md @@ -0,0 +1,219 @@ ++++ +title = "Fortran/C on HCC" +description = "How to compile and run Fortran/C program on HCC machines" +weight = "50" ++++ + +This quick start demonstrates how to implement a Fortran/C program on +HCC supercomputers. The sample codes and submit scripts can be +downloaded from [serial_dir.zip](/attachments/serial_dir.zip). + +#### Login to a HCC Cluster + +Log in to a HCC cluster through PuTTY ([For Windows Users]({{< relref "/connecting/for_windows_users">}})) or Terminal ([For Mac/Linux +Users]({{< relref "/connecting/for_maclinux_users">}})) and make a subdirectory called `serial_dir` under the `$WORK` directory. + +{{< highlight bash >}} +$ cd $WORK +$ mkdir serial_dir +{{< /highlight >}} + +In the subdirectory `serial_dir`, save all the relevant Fortran/C codes. Here we include two demo +programs, `demo_f_serial.f90` and `demo_c_serial.c`, that compute the sum from 1 to 20. + +{{%expand "demo_f_serial.f90" %}} +{{< highlight bash >}} +Program demo_f_serial + implicit none + integer, parameter :: N = 20 + real*8 w + integer i + common/sol/ x + real*8 x + real*8, dimension(N) :: y + do i = 1,N + w = i*1d0 + call proc(w) + y(i) = x + write(6,*) 'i,x = ', i, y(i) + enddo + write(6,*) 'sum(y) =',sum(y) +Stop +End Program +Subroutine proc(w) + real*8, intent(in) :: w + common/sol/ x + real*8 x + x = w +Return +End Subroutine +{{< /highlight >}} +{{% /expand %}} + + +{{%expand "demo_c_serial.c" %}} +{{< highlight c >}} +//demo_c_serial +#include <stdio.h> + +double proc(double w){ + double x; + x = w; + return x; +} + +int main(int argc, char* argv[]){ + int N=20; + double w; + int i; + double x; + double y[N]; + double sum; + for (i = 1; i <= N; i++){ + w = i*1e0; + x = proc(w); + y[i-1] = x; + printf("i,x= %d %lf\n", i, y[i-1]) ; + } + + sum = 0e0; + for (i = 1; i<= N; i++){ + sum = sum + y[i-1]; + } + + printf("sum(y)= %lf\n", sum); + +return 0; +} +{{< /highlight >}} +{{% /expand %}} + +--- + +#### Compiling the Code + +The compiling of a Fortran/C++ code to executable is usually done behind +the scene in a Graphical User Interface (GUI) environment, such as +Microsoft Visual Studio. In a HCC cluster, the compiling is done +explicitly by first loading a choice compiler and then executing the +corresponding compiling command. Here we will use the GNU Complier +Collection, `gcc`, for demonstration. Other available compilers such as +`intel` or `pgi` can be looked up using the command +line `module avail`. Before compiling the code, make sure there is no +dependency on any numerical library in the code. 
If invoking a numerical +library is necessary, contact a HCC specialist +({{< icon name="envelope" >}}[hcc-support@unl.edu] (mailto:hcc-support@unl.edu)) to +discuss implementation options. + +{{< highlight bash >}} +$ module load compiler/gcc/8.2 +$ gfortran demo_f_serial.f90 -o demo_f_serial.x +$ gcc demo_c_serial.c -o demo_c_serial.x +{{< /highlight >}} + +The above commends load the `gcc` complier and use the compiling +commands `gfortran` or `gcc` to compile the codes to`.x` files +(executables). + +#### Creating a Submit Script + +Create a submit script to request one core (default) and 1-min run time +on the supercomputer. The name of the main program enters at the last +line. + +{{% panel header="`submit_f.serial`"%}} +{{< highlight bash >}} +#!/bin/sh +#SBATCH --mem-per-cpu=1024 +#SBATCH --time=00:01:00 +#SBATCH --job-name=Fortran +#SBATCH --error=Fortran.%J.err +#SBATCH --output=Fortran.%J.out + +module load compiler/gcc/4.9 +./demo_f_serial.x +{{< /highlight >}} +{{% /panel %}} + +{{% panel header="`submit_c.serial`"%}} +{{< highlight bash >}} +#!/bin/sh +#SBATCH --mem-per-cpu=1024 +#SBATCH --time=00:01:00 +#SBATCH --job-name=C +#SBATCH --error=C.%J.err +#SBATCH --output=C.%J.out + +module load compiler/gcc/4.9 +./demo_c_serial.x +{{< /highlight >}} +{{% /panel %}} + +#### Submit the Job + +The job can be submitted through the command `sbatch`. The job status +can be monitored by entering `squeue` with the `-u` option. + +{{< highlight bash >}} +$ sbatch submit_f.serial +$ sbatch submit_c.serial +$ squeue -u <username> +{{< /highlight >}} + +Replace `<username>` with your HCC username. + +#### Sample Output + +The sum from 1 to 20 is computed and printed to the `.out` file (see +below). +{{%expand "Fortran.out" %}} +{{< highlight batchfile>}} + i,x = 1 1.0000000000000000 + i,x = 2 2.0000000000000000 + i,x = 3 3.0000000000000000 + i,x = 4 4.0000000000000000 + i,x = 5 5.0000000000000000 + i,x = 6 6.0000000000000000 + i,x = 7 7.0000000000000000 + i,x = 8 8.0000000000000000 + i,x = 9 9.0000000000000000 + i,x = 10 10.000000000000000 + i,x = 11 11.000000000000000 + i,x = 12 12.000000000000000 + i,x = 13 13.000000000000000 + i,x = 14 14.000000000000000 + i,x = 15 15.000000000000000 + i,x = 16 16.000000000000000 + i,x = 17 17.000000000000000 + i,x = 18 18.000000000000000 + i,x = 19 19.000000000000000 + i,x = 20 20.000000000000000 + sum(y) = 210.00000000000000 +{{< /highlight >}} +{{% /expand %}} + +{{%expand "C.out" %}} +{{< highlight batchfile>}} +i,x= 1 1.000000 +i,x= 2 2.000000 +i,x= 3 3.000000 +i,x= 4 4.000000 +i,x= 5 5.000000 +i,x= 6 6.000000 +i,x= 7 7.000000 +i,x= 8 8.000000 +i,x= 9 9.000000 +i,x= 10 10.000000 +i,x= 11 11.000000 +i,x= 12 12.000000 +i,x= 13 13.000000 +i,x= 14 14.000000 +i,x= 15 15.000000 +i,x= 16 16.000000 +i,x= 17 17.000000 +i,x= 18 18.000000 +i,x= 19 19.000000 +i,x= 20 20.000000 +sum(y)= 210.000000 +{{< /highlight >}} +{{% /expand %}} diff --git a/content/applications/app_specific/mpi_jobs_on_hcc.md b/content/applications/app_specific/mpi_jobs_on_hcc.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..bfb9abd7f86588f33dd98b4ea2ef7e10c6d147c4 100644 --- a/content/applications/app_specific/mpi_jobs_on_hcc.md +++ b/content/applications/app_specific/mpi_jobs_on_hcc.md @@ -0,0 +1,322 @@ ++++ +title = "MPI Jobs on HCC" +description = "How to compile and run MPI programs on HCC machines" +weight = "52" ++++ + +This quick start demonstrates how to implement a parallel (MPI) +Fortran/C program on HCC supercomputers. 
The sample codes and submit +scripts can be downloaded from [mpi_dir.zip](/attachments/mpi_dir.zip). + +#### Login to a HCC Cluster + +Log in to a HCC cluster through PuTTY ([For Windows Users]({{< relref "/connecting/for_windows_users">}})) or Terminal ([For Mac/Linux +Users]({{< relref "/connecting/for_maclinux_users">}})) and make a subdirectory called `mpi_dir` under the `$WORK` directory. + +{{< highlight bash >}} +$ cd $WORK +$ mkdir mpi_dir +{{< /highlight >}} + +In the subdirectory `mpi_dir`, save all the relevant codes. Here we +include two demo programs, `demo_f_mpi.f90` and `demo_c_mpi.c`, that +compute the sum from 1 to 20 through parallel processes. A +straightforward parallelization scheme is used for demonstration +purpose. First, the master core (i.e. `myid=0`) distributes equal +computation workload to a certain number of cores (as specified by +`--ntasks `in the submit script). Then, each worker core computes a +partial summation as output. Finally, the master core collects the +outputs from all worker cores and perform an overall summation. For easy +comparison with the serial code ([Fortran/C on HCC]({{< relref "fortran_c_on_hcc">}})), the +added lines in the parallel code (MPI) are marked with "!=" or "//=". + +{{%expand "demo_f_mpi.f90" %}} +{{< highlight fortran >}} +Program demo_f_mpi +!====== MPI ===== + use mpi +!================ + implicit none + integer, parameter :: N = 20 + real*8 w + integer i + common/sol/ x + real*8 x + real*8, dimension(N) :: y +!============================== MPI ================================= + integer ind + real*8, dimension(:), allocatable :: y_local + integer numnodes,myid,rc,ierr,start_local,end_local,N_local + real*8 allsum +!==================================================================== + +!============================== MPI ================================= + call mpi_init( ierr ) + call mpi_comm_rank ( mpi_comm_world, myid, ierr ) + call mpi_comm_size ( mpi_comm_world, numnodes, ierr ) + ! + N_local = N/numnodes + allocate ( y_local(N_local) ) + start_local = N_local*myid + 1 + end_local = N_local*myid + N_local +!==================================================================== + do i = start_local, end_local + w = i*1d0 + call proc(w) + ind = i - N_local*myid + y_local(ind) = x +! y(i) = x +! write(6,*) 'i, y(i)', i, y(i) + enddo +! write(6,*) 'sum(y) =',sum(y) +!============================================== MPI ===================================================== + call mpi_reduce( sum(y_local), allsum, 1, mpi_real8, mpi_sum, 0, mpi_comm_world, ierr ) + call mpi_gather ( y_local, N_local, mpi_real8, y, N_local, mpi_real8, 0, mpi_comm_world, ierr ) + + if (myid == 0) then + write(6,*) '-----------------------------------------' + write(6,*) '*Final output from... myid=', myid + write(6,*) 'numnodes =', numnodes + write(6,*) 'mpi_sum =', allsum + write(6,*) 'y=...' 
+ do i = 1, N + write(6,*) y(i) + enddo + write(6,*) 'sum(y)=', sum(y) + endif + + deallocate( y_local ) + call mpi_finalize(rc) +!======================================================================================================== + +Stop +End Program +Subroutine proc(w) + real*8, intent(in) :: w + common/sol/ x + real*8 x + + x = w + +Return +End Subroutine +{{< /highlight >}} +{{% /expand %}} + +{{%expand "demo_c_mpi.c" %}} +{{< highlight c >}} +//demo_c_mpi +#include <stdio.h> +//======= MPI ======== +#include "mpi.h" +#include <stdlib.h> +//==================== + +double proc(double w){ + double x; + x = w; + return x; +} + +int main(int argc, char* argv[]){ + int N=20; + double w; + int i; + double x; + double y[N]; + double sum; +//=============================== MPI ============================ + int ind; + double *y_local; + int numnodes,myid,rc,ierr,start_local,end_local,N_local; + double allsum; +//================================================================ +//=============================== MPI ============================ + MPI_Init(&argc, &argv); + MPI_Comm_rank( MPI_COMM_WORLD, &myid ); + MPI_Comm_size ( MPI_COMM_WORLD, &numnodes ); + N_local = N/numnodes; + y_local=(double *) malloc(N_local*sizeof(double)); + start_local = N_local*myid + 1; + end_local = N_local*myid + N_local; +//================================================================ + + for (i = start_local; i <= end_local; i++){ + w = i*1e0; + x = proc(w); + ind = i - N_local*myid; + y_local[ind-1] = x; +// y[i-1] = x; +// printf("i,x= %d %lf\n", i, y[i-1]) ; + } + sum = 0e0; + for (i = 1; i<= N_local; i++){ + sum = sum + y_local[i-1]; + } +// printf("sum(y)= %lf\n", sum); +//====================================== MPI =========================================== + MPI_Reduce( &sum, &allsum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD ); + MPI_Gather( &y_local[0], N_local, MPI_DOUBLE, &y[0], N_local, MPI_DOUBLE, 0, MPI_COMM_WORLD ); + + if (myid == 0){ + printf("-----------------------------------\n"); + printf("*Final output from... myid= %d\n", myid); + printf("numnodes = %d\n", numnodes); + printf("mpi_sum = %lf\n", allsum); + printf("y=...\n"); + for (i = 1; i <= N; i++){ + printf("%lf\n", y[i-1]); + } + sum = 0e0; + for (i = 1; i<= N; i++){ + sum = sum + y[i-1]; + } + + printf("sum(y) = %lf\n", sum); + + } + + free( y_local ); + MPI_Finalize (); +//====================================================================================== + +return 0; +} +{{< /highlight >}} +{{% /expand %}} + +--- + +#### Compiling the Code + +The compiling of a MPI code requires first loading a compiler "engine" +such as `gcc`, `intel`, or `pgi` and then loading a MPI wrapper +`openmpi`. Here we will use the GNU Complier Collection, `gcc`, for +demonstration. + +{{< highlight bash >}} +$ module load compiler/gcc/6.1 openmpi/2.1 +$ mpif90 demo_f_mpi.f90 -o demo_f_mpi.x +$ mpicc demo_c_mpi.c -o demo_c_mpi.x +{{< /highlight >}} + +The above commends load the `gcc` complier with the `openmpi` wrapper. +The compiling commands `mpif90` or `mpicc` are used to compile the codes +to`.x` files (executables). + +### Creating a Submit Script + +Create a submit script to request 5 cores (with `--ntasks`). A parallel +execution command `mpirun ./` needs to enter to last line before the +main program name. 
+ +{{% panel header="`submit_f.mpi`"%}} +{{< highlight bash >}} +#!/bin/sh +#SBATCH --ntasks=5 +#SBATCH --mem-per-cpu=1024 +#SBATCH --time=00:01:00 +#SBATCH --job-name=Fortran +#SBATCH --error=Fortran.%J.err +#SBATCH --output=Fortran.%J.out + +mpirun ./demo_f_mpi.x +{{< /highlight >}} +{{% /panel %}} + +{{% panel header="`submit_c.mpi`"%}} +{{< highlight bash >}} +#!/bin/sh +#SBATCH --ntasks=5 +#SBATCH --mem-per-cpu=1024 +#SBATCH --time=00:01:00 +#SBATCH --job-name=C +#SBATCH --error=C.%J.err +#SBATCH --output=C.%J.out + +mpirun ./demo_c_mpi.x +{{< /highlight >}} +{{% /panel %}} + +#### Submit the Job + +The job can be submitted through the command `sbatch`. The job status +can be monitored by entering `squeue` with the `-u` option. + +{{< highlight bash >}} +$ sbatch submit_f.mpi +$ sbatch submit_c.mpi +$ squeue -u <username> +{{< /highlight >}} + +Replace `<username>` with your HCC username. + +Sample Output +------------- + +The sum from 1 to 20 is computed and printed to the `.out` file (see +below). The outputs from the 5 cores are collected and processed by the +master core (i.e. `myid=0`). + +{{%expand "Fortran.out" %}} +{{< highlight batchfile>}} + ----------------------------------------- + *Final output from... myid= 0 + numnodes = 5 + mpi_sum = 210.00000000000000 + y=... + 1.0000000000000000 + 2.0000000000000000 + 3.0000000000000000 + 4.0000000000000000 + 5.0000000000000000 + 6.0000000000000000 + 7.0000000000000000 + 8.0000000000000000 + 9.0000000000000000 + 10.000000000000000 + 11.000000000000000 + 12.000000000000000 + 13.000000000000000 + 14.000000000000000 + 15.000000000000000 + 16.000000000000000 + 17.000000000000000 + 18.000000000000000 + 19.000000000000000 + 20.000000000000000 + sum(y)= 210.00000000000000 +{{< /highlight >}} +{{% /expand %}} + +{{%expand "C.out" %}} +{{< highlight batchfile>}} +----------------------------------- +*Final output from... myid= 0 +numnodes = 5 +mpi_sum = 210.000000 +y=... +1.000000 +2.000000 +3.000000 +4.000000 +5.000000 +6.000000 +7.000000 +8.000000 +9.000000 +10.000000 +11.000000 +12.000000 +13.000000 +14.000000 +15.000000 +16.000000 +17.000000 +18.000000 +19.000000 +20.000000 +sum(y) = 210.000000 +{{< /highlight >}} +{{% /expand %}} + diff --git a/content/handling_data/data_storage/linux_file_permissions.md b/content/handling_data/data_storage/linux_file_permissions.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..9a3d709945b45a42efe74850471671a34e7252e8 100644 --- a/content/handling_data/data_storage/linux_file_permissions.md +++ b/content/handling_data/data_storage/linux_file_permissions.md @@ -0,0 +1,48 @@ ++++ +title = "Linux File Permissions" +description = "How to view and change file permissions with Linux commands" +weight = 20 ++++ + +- [Opening a Terminal Window] (#opening-a-terminal-window) +- [Listing File Permissions] (#listing-file-permissions) +- [Changing File Permissions] (#changing-file-permissions) + +## Opening a Terminal Window +----------------------- + +Use your local terminal to connect to a cluster, or open a new terminal window on [Crane](https://crane.unl.edu). + +Click [here](https://hcc.unl.edu/docs/Quickstarts/connecting/) if you need help connecting to a cluster +with a local terminal. + +Click [here](https://hcc.unl.edu/docs/guides/running_applications/jupyter/) if you need +help opening a new terminal window within JupyterHub. + +## Listing File Permissions + +Type the command `ls -l` to list the files and directories with file permissions for your current location. 
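+
+For example, a listing might look something like this (the file names, owner,
+group, sizes and dates below are made up for illustration):
+
+{{< highlight bash >}}
+$ ls -l
+drwxr-xr-x 2 demo01 demogroup 4096 Jan 15 10:30 results
+-rw-r--r-- 1 demo01 demogroup  220 Jan 15 10:31 mars.txt
+-rwxr-x--- 1 demo01 demogroup 8320 Jan 15 10:32 run_analysis.sh
+{{< /highlight >}}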
+ +{{< figure src="/images/LinuxList.png" >}} + +The first character denotes whether an item is a file or a directory. If 'd' is shown, it's a directory, and if '-' is shown, it's a file. + Following the first character you will see some +combination of r,w,x, and -. The first rwx is the ‘read’ ‘write’ ‘execute’ file permissions for the creator + of that file or directory. A ‘-‘ instead means a particular permission has not been granted. For example “rw-“ means the + ‘execute’ permission has not been granted. The next three entries are the permissions for ‘group’ and the last three are the + permissions for everyone else. + + Following the file permissions are the name of the creator, the name of the group, the size of the file, the date it was created, and finally +the name of the file. + + +## Changing File Permissions + +To change file permissions, use the command "chmod [permissions] [filename]" where permissions are indicated by a three-digit code. +Each digit in the code correspondes to the three digits mentioned above in the permissions printout: One for the creater permissions, +one for the group permissions, and one for everyone else. The command is interpreted as follows: 4=read 2=write 1=execute and any combination of these is given by summing their codes. +Each chmod command will include 3 codes. +For example, to give the creator of mars.txt rights to read, write and execute, the group rights to read and execute, and everone else only the right to read, +we would use the command `chmod 754 mars.txt` + +{{< figure src="/images/LinuxChange.png" >}} diff --git a/content/handling_data/data_storage/preventing_file_loss.md b/content/handling_data/data_storage/preventing_file_loss.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..1ce628aad84316e96cce4cfe7818e91e050902d7 100644 --- a/content/handling_data/data_storage/preventing_file_loss.md +++ b/content/handling_data/data_storage/preventing_file_loss.md @@ -0,0 +1,169 @@ ++++ +title = "Preventing File Loss" +description = "How to prevent file loss on HCC clusters" +weight = 40 ++++ + +Each research group is allocated 50TB of storage in `/work` on HCC +clusters. With over 400 active groups, HCC does not have the resources +to provide regular backups of `/work` without sacrificing the +performance of the existing filesystem. No matter how careful a user +might be, there is always the risk of file loss due to user error, +natural disasters, or equipment failure. + +However, there are a number of solutions available for backing up your +data. By carefully considering the benefits and limitations of each, +users can select the backup methods that work best for their particular +needs. For truly robust file backups, we recommend combining multiple +methods. For example, use Git regularly along with manual backups to an +external hard-drive at regular intervals such as monthly or biannually. + +--- +### 1. Use your local machine: + +If you have sufficient hard drive space, regularly backup your `/work` +directories to your personal computer. To avoid filling up your personal +hard-drives, consider using an external drive that can easily be placed +in a fireproof safe or at an off-site location for an extra level of +protection. To do this, you can either use [Globus +Connect]({{< relref "/handling_data/data_transfer/globus_connect/_index.md" >}}) or an +SCP client, such +as <a href="https://cyberduck.io/" class="external-link">Cyberduck</a> or <a href="https://winscp.net/eng/index.php" class="external-link">WinSCP</a>. 
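+
+If you prefer the command line, a one-off copy of a project directory from
+`/work` to your own machine might look like the sketch below (the hostname,
+group name and paths are placeholders; substitute the cluster or transfer
+server you normally connect to):
+
+{{< highlight bash >}}
+# run this from your local machine, not from the cluster
+scp -r <username>@crane.unl.edu:/work/<group>/<username>/my_project ~/backups/
+{{< /highlight >}}
+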
+For help setting up an SCP client, check out our [Connecting Guides]({{< relref "/connecting" >}}). + +For those worried about personal hard drive crashes, UNL +offers <a href="http://nsave.unl.edu/" class="external-link">the backup service NSave</a>. +For a small monthly fee, users can install software that will +automatically backup selected files from their personal machine. + +Benefits: + +- Gives you full control over what is backed up and when. +- Doesn't require the use of third party servers (when using SCP + clients). +- Take advantage of our high speed data transfers (10 Gb/s) when using + Globus Connect or [setup your SCP client to use our dedicated high + speed transfer + servers]({{< relref "/handling_data/data_transfer/high_speed_data_transfers.md" >}}) + +Limitations: + +- The amount you can backup is limited by available hard-drive space. +- Manual backups of many files can be time consuming. + +--- +### 2. Use Git to preserve files and revision history: + +Git is a revision control service which can be run locally or can be +paired with a repository hosting service, such +as <a href="http://www.github.com/" class="external-link">GitHub</a>, to +provide a remote backup of your files. Git works best with smaller files +such as source code and manuscripts. Anyone with an InCommon login can +utilize <a href="http://git.unl.edu/" class="external-link">UNL's GitLab Instance</a>, +for free. + +Benefits: + +- Git is naturally collaboration-friendly, allowing multiple people to + easily work on the same project and provides great built-in tools to + control contributions and managing conflicting changes. +- Create individual repositories for each project, allowing you to + compartmentalize your work. +- Using UNL's GitLab instance allows you to create private or internal + (accessible by anyone within your organization) repositories. + +Limitations: + +- Git is not designed to handle large files. GitHub does not allow + files larger than 100MB unless using + their <a href="https://help.github.com/articles/about-git-large-file-storage/" class="external-link">Git Large File Storage</a> and + tracking files over 1GB in size can be time consuming and lead to + errors when using other repository hosts. + +--- +### 3. Use Attic: + +HCC offers +long-term, <a href="https://en.wikipedia.org/wiki/Nearline_storage" class="external-link">near-line</a> data +storage +through [Attic]({{< relref "using_attic" >}}). +HCC users with an existing account +can <a href="http://hcc.unl.edu/attic" class="external-link">apply for an Attic account</a> for +a <a href="http://hcc.unl.edu/priority-access-pricing" class="external-link">small annual fee</a> that +is substantially less than other cloud services. + +Benefits: + +- Attic files are backed up regularly at both HCC locations in Omaha + and Lincoln to help provide disaster tolerance and a second security + layer against file loss. +- No limits on individual or total file sizes. +- High speed data transfers between Attic and the clusters when using + [Globus Connect]({{< relref "/handling_data/data_transfer/globus_connect/_index.md" >}}) and [HCC's high-speed data + servers]({{< relref "/handling_data/data_transfer/high_speed_data_transfers.md" >}}). + +Limitations: + +- Backups must be done manually which can be time consuming. Setting + up automated scripts can help speed up this process. + +--- +### 4. Use a cloud-based service, such as Box: + +Many of us are familiar with services such as Google Drive, Dropbox, Box +and OneDrive. 
These cloud-based services provide a convenient portal for +accessing your files from any computer. NU offers OneDrive and Box +services to all students, staff and faculty. But did you know that you +can link your Box account to HCC’s clusters to provide quick and easy +access to files stored there? [Follow a few set-up +steps]({{< relref "integrating_box_with_hcc" >}}) and +you can add files to and access files stored in your Box account +directly from HCC clusters. Setup your submit scripts to automatically +upload results as they are generated or use it interactively to store +important workflow scripts and maintain a backup of your analysis +results. + +Benefits: + +- <a href="http://box.unl.edu/" class="external-link">Box@UNL</a> offers + unlimited file storage while you are associated with UNL. +- Integrating with HCC clusters provides a quick and easy way to + automate backups of analysis results and workflow scripts. + +Limitations: + +- Box has individual file size limitations, larger files will need to + be backed up using an alternate method. + +--- +### 5. Copy important files to `/home`: + +While `/work` files and directories are not backed up, files and +directories in `/home` are backed up on a daily basis. Due to the +limitations of the `/home` filesystem, we strongly recommend that only +source code and compiled programs are backed up to `/home`. If you do +use `/home` to backup datasets, please keep a working copy in your +`/work` directories to prevent negatively impacting the functionality of +the cluster. + +Benefits: + +- No need to make manual backups. `\home` files are automatically backed + up daily. +- Files in `/home` are not subject to the 6 month purge policy that + exists on `/work`. +- Doesn't require the use of third-party software or tools. + +Limitations: + +- Home storage is limited to 20GB per user. Larger files sets will + need to be backed up using an alternate method. +- Home is read-only on the cluster worker nodes so results cannot be + directly written or altered from within a submitted job. + + +If you would like more information or assistance in setting up any of +these methods, contact us +at <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a>. + + diff --git a/content/handling_data/data_storage/using_attic.md b/content/handling_data/data_storage/using_attic.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..cdc6c1f4dad5dfb6b6af575a9fa1a2c809bc627f 100644 --- a/content/handling_data/data_storage/using_attic.md +++ b/content/handling_data/data_storage/using_attic.md @@ -0,0 +1,105 @@ ++++ +title = "Using Attic" +description = "How to store data on Attic" +weight = 20 ++++ + +For users who need long-term storage for large amount of data, HCC +provides an economical solution called Attic. Attic is a reliable +<a href="https://en.wikipedia.org/wiki/Nearline_storage" class="external-link">near-line data archive</a> storage +system. The files in Attic can be accessed and shared from anywhere +using [Globus +Connect]({{< relref "/handling_data/data_transfer/globus_connect" >}}), +with a fast 10Gb/s link. Also, the data in Attic is backed up between +our Lincoln and Omaha facilities to ensure high availability and +disaster tolerance. The data and user activities on Attic are subject to +our +<a href="http://hcc.unl.edu/hcc-policies" class="external-link">HCC Policies</a>. 
+
+---
+### Accounts and Cost
+
+To use Attic, you will first need an
+<a href="https://hcc.unl.edu/new-user-request" class="external-link">HCC account</a>, and
+then you may request an
+<a href="http://hcc.unl.edu/attic" class="external-link">Attic allocation</a>.
+We charge a small fee per TB per year, but it is cheaper than most
+commercial cloud storage solutions. For the user application form and
+cost, please see the
+<a href="http://hcc.unl.edu/attic" class="external-link">HCC Attic page</a>.
+
+---
+### Transfer Files Using Globus Connect
+
+The easiest and fastest way to access Attic is via Globus. You can
+transfer files between your computer, our clusters ($HOME, $WORK, and $COMMON on
+Crane or Rhino), and Attic. Here is a detailed tutorial on
+how to set up and use [Globus Connect]({{< relref "/handling_data/data_transfer/globus_connect" >}}). For
+Attic, use the Globus endpoint **hcc\#attic**. Your Attic files are
+located at `~`, which is a shortcut
+for `/attic/<groupname>/<username>`.
+**Note:** *If you are accessing Attic files from your supplementary
+group, you should explicitly set the path to
+/attic/<supplementary\_groupname>/. If you don't do that, by
+default the endpoint will try to place you in your primary group's Attic
+path, to which access will be denied if the primary group doesn't have an Attic allocation.*
+
+---
+### Transfer Files Using SCP/SFTP/RSYNC
+
+The transfer server for Attic storage is `attic.unl.edu` (or `attic-xfer.unl.edu`).
+
+{{% panel theme="info" header="SCP Example" %}}
+{{< highlight bash >}}
+scp /source/file <username>@attic.unl.edu:~/destination/file
+{{< /highlight >}}
+{{% /panel %}}
+
+{{% panel theme="info" header="SFTP Example" %}}
+{{< highlight bash >}}
+sftp <username>@attic.unl.edu
+Password:
+Duo two-factor login for <username>
+Connected to attic.unl.edu.
+sftp> pwd
+Remote working directory: /attic/<groupname>/<username>
+sftp> put source/file destination/file
+sftp> exit
+{{< /highlight >}}
+{{% /panel %}}
+
+{{% panel theme="info" header="RSYNC Example" %}}
+{{< highlight bash >}}
+# local to remote rsync command
+rsync -avz /local/source/path <username>@attic.unl.edu:remote/destination/path
+
+# remote to local rsync command
+rsync -avz <username>@attic.unl.edu:remote/source/path /local/destination/path
+{{< /highlight >}}
+{{% /panel %}}
+
+You can also access your data on Attic using our [high-speed
+transfer servers]({{< relref "/handling_data/data_transfer/high_speed_data_transfers" >}}) if you prefer.
+Simply use scp or sftp to connect to one of the transfer servers, and
+your directory is mounted at `/attic/<groupname>/<username>`.
+
+---
+### Check Attic Usage
+
+The usage and quota information for your group and the users in the
+group are stored in a file named "disk\_usage.txt" in your group's
+directory (`/attic/<groupname>`). You can use either [Globus Connect]({{< relref "/handling_data/data_transfer/globus_connect" >}}) or
+scp to download it. Your usage and expiration are also shown in the web
+interface (see below).
+
+---
+### Use the web interface
+
+For convenience, a web interface is also provided. Simply go to
+<a href="https://attic.unl.edu" class="external-link">https://attic.unl.edu</a>
+and log in with your HCC credentials. Using this interface, you can see
+your quota usage and expiration, manage files, etc. **Please note we do
+not recommend uploading/downloading large files this way**. Use one of
+the other transfer methods above for large datasets.
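+
+As a concrete example of the scp route mentioned under *Check Attic Usage*
+above, the sketch below downloads the usage report to your current
+directory. The `<groupname>` and `<username>` placeholders must be replaced
+with your own values, and you will be prompted for your HCC password and
+Duo authentication just as in the SFTP example.
+
+{{% panel theme="info" header="Example: download the usage report with scp" %}}
+{{< highlight bash >}}
+# Copy your group's usage report from Attic to the current directory.
+# Replace <groupname> and <username> with your own values.
+scp <username>@attic.unl.edu:/attic/<groupname>/disk_usage.txt .
+{{< /highlight >}}
+{{% /panel %}}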
+
+
 diff --git a/content/handling_data/data_transfer/globus_connect/activating_hcc_cluster_endpoints.md b/content/handling_data/data_transfer/globus_connect/activating_hcc_cluster_endpoints.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..a4b1a339824ae92bc6410c506c86ed60fe8b4968 100644
--- a/content/handling_data/data_transfer/globus_connect/activating_hcc_cluster_endpoints.md
+++ b/content/handling_data/data_transfer/globus_connect/activating_hcc_cluster_endpoints.md
@@ -0,0 +1,39 @@
++++
+title = "Activating HCC Cluster Endpoints"
+description = "How to activate HCC endpoints on Globus"
+weight = 20
++++
+
+You will not be able to transfer files to or from an HCC endpoint using Globus Connect without first activating the endpoint. Endpoints are available for Crane (`hcc#crane`), Rhino (`hcc#rhino`), and Attic (`hcc#attic`). Follow the instructions below to activate any of these endpoints and begin making transfers.
+
+1. [Sign in](https://www.globus.org/SignIn) to your Globus account using your campus credentials or your Globus ID (if you have one). Then click on 'Endpoints' in the left sidebar.
+{{< figure src="/images/Glogin.png" >}}
+{{< figure src="/images/endpoints.png" >}}
+
+2. Find the endpoint you want by entering '`hcc#crane`', '`hcc#rhino`', or '`hcc#attic`' in the search box and pressing 'Enter'. Once you have found and selected the endpoint, click the green 'activate' icon. On the following page, click 'continue'.
+{{< figure src="/images/activateEndpoint.png" >}}
+{{< figure src="/images/EndpointContinue.png" >}}
+
+3. You will be redirected to the HCC Globus Endpoint Activation page. Enter your *HCC* username and password (the password you usually use to log into the HCC clusters).
+{{< figure src="/images/hccEndpoint.png" >}}
+
+4. Next, you will be prompted to
+   provide your *Duo* credentials. If you use the Duo Mobile app on
+   your smartphone or tablet, select 'Duo Push'. Once you approve the notification that is sent to your phone,
+   the activation will be complete. If you use a YubiKey for
+   authentication, select the 'Passcode' option and then press your
+   YubiKey to complete the activation. Upon successful activation, you
+   will be redirected to your Globus *Manage Endpoints* page.
+{{< figure src="/images/EndpointPush.png" >}}
+{{< figure src="/images/endpointComplete.png" >}}
+
+The endpoint should now be ready
+and will not have to be activated again for the next 7 days.
+To transfer files between any two HCC clusters, you will need to
+activate both endpoints individually.
+
+Next, learn how to [make file transfers between HCC endpoints]({{< relref "/handling_data/data_transfer/globus_connect/file_transfers_between_endpoints" >}}) or how to [transfer between HCC endpoints and a personal computer]({{< relref "/handling_data/data_transfer/globus_connect/file_transfers_to_and_from_personal_workstations" >}}).
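+
+If you prefer working from a terminal, the same web activation page can also
+be reached through the Globus CLI, a separately installed tool that is not
+part of the walkthrough above. The sketch below is illustrative only: it
+assumes the CLI has been installed and logged in, and the endpoint UUID is a
+placeholder taken from the search output.
+
+{{% panel theme="info" header="Optional sketch: finding an endpoint with the Globus CLI" %}}
+{{< highlight bash >}}
+# One-time setup (assumed): install the Globus CLI and log in.
+pip install --user globus-cli
+globus login
+
+# Look up the endpoint and note the UUID shown in the output.
+globus endpoint search "hcc#crane"
+
+# Open the endpoint's web activation page (replace <endpoint-uuid> with the
+# UUID from the search above), then complete the HCC/Duo login in the browser.
+globus endpoint activate --web <endpoint-uuid>
+
+# After activating, listing a directory confirms the endpoint is usable.
+globus ls <endpoint-uuid>:/~/
+{{< /highlight >}}
+{{% /panel %}}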
+
+---
+
+
 diff --git a/content/handling_data/data_transfer/globus_connect/file_transfers_between_endpoints.md b/content/handling_data/data_transfer/globus_connect/file_transfers_between_endpoints.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..4379b43de6922a206b9b0f9c0c9522729e5dc7ba 100644
--- a/content/handling_data/data_transfer/globus_connect/file_transfers_between_endpoints.md
+++ b/content/handling_data/data_transfer/globus_connect/file_transfers_between_endpoints.md
@@ -0,0 +1,50 @@
++++
+title = "File Transfers Between Endpoints"
+description = "How to transfer files between HCC clusters using Globus"
+weight = 30
++++
+
+To transfer files between HCC clusters, you will first need to
+[activate]({{< relref "/handling_data/data_transfer/globus_connect/activating_hcc_cluster_endpoints" >}}) the
+two endpoints you would like to use (the available endpoints
+are `hcc#crane`, `hcc#rhino`, and `hcc#attic`). Once
+that has been completed, follow the steps below to begin transferring
+files. (Note: You can also transfer files between an HCC endpoint and
+any other Globus endpoint for which you have authorized access. That
+may include a [personal
+endpoint]({{< relref "/handling_data/data_transfer/globus_connect/file_transfers_to_and_from_personal_workstations" >}}),
+a [shared
+endpoint]({{< relref "/handling_data/data_transfer/globus_connect/file_sharing" >}}),
+or an endpoint on another computing resource or cluster. Once the
+endpoints have been activated, the file transfer process is generally
+the same regardless of the type of endpoints you use. For demonstration
+purposes, we use two HCC endpoints.)
+
+1. Once both endpoints for the desired file transfer have been
+   activated, [sign in](https://www.globus.org/SignIn) to
+   your Globus account (if you are not already) and select
+   "Transfer or Sync to..." from the right sidebar. If you have
+   a small screen, you may have to click the menu icon
+   first.
+{{< figure src="/images/Transfer.png" >}}
+
+2. Enter the names of the two endpoints you would like to use, or
+   select them from the drop-down menus (for
+   example, `hcc#attic` and `hcc#crane`). Enter the
+   directory paths for both the source and destination (the 'from' and
+   'to' paths on the respective endpoints). Press 'Enter' to view files
+   under these directories. Select the files or directories you would
+   like to transfer (press *shift* or *control* to make multiple
+   selections) and click the blue highlighted arrow to start the
+   transfer.
+{{< figure src="/images/startTransfer.png" >}}
+
+3. Globus will display a message when your transfer has completed
+   (or in the unlikely event that it was unsuccessful), and you will
+   also receive an email. Select the 'refresh' icon to see your file
+   in the destination folder.
+{{< figure src="/images/transferComplete.png" >}}
+
+---
+
+
 diff --git a/content/handling_data/data_transfer/high_speed_data_transfers.md b/content/handling_data/data_transfer/high_speed_data_transfers.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..922eb4b43902030f7a09eeb1f96332fea2bbe294 100644
--- a/content/handling_data/data_transfer/high_speed_data_transfers.md
+++ b/content/handling_data/data_transfer/high_speed_data_transfers.md
@@ -0,0 +1,28 @@
++++
+title = "High Speed Data Transfers"
+description = "How to transfer files directly from the transfer servers"
+weight = 10
++++
+
+Crane, Rhino, and Attic each have a dedicated transfer server with
+10 Gb/s connectivity that allows
+for faster data transfers than the login nodes.
+With [Globus
+Connect]({{< relref "globus_connect" >}}), users
+can take advantage of this connection speed when making large or cumbersome
+transfers.
+
+Those who prefer scp, sftp or
+rsync clients can also benefit from this high-speed connectivity by
+using these dedicated servers for data transfers:
+
+Cluster   | Transfer server
+----------|----------------------
+Crane     | `crane-xfer.unl.edu`
+Rhino     | `rhino-xfer.unl.edu`
+Attic     | `attic-xfer.unl.edu`
+
+{{% notice info %}}
+Because the transfer servers are login-disabled, third-party transfers
+between `crane-xfer` and `attic-xfer` must be done via [Globus Connect]({{< relref "globus_connect" >}}).
+{{% /notice %}}
+
 diff --git a/content/submitting_jobs/app_specific/submitting_an_openmp_job.md b/content/submitting_jobs/app_specific/submitting_an_openmp_job.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..a49e62292515d1763d8542414069240a8ff984c3 100644
--- a/content/submitting_jobs/app_specific/submitting_an_openmp_job.md
+++ b/content/submitting_jobs/app_specific/submitting_an_openmp_job.md
@@ -0,0 +1,42 @@
++++
+title = "Submitting an OpenMP Job"
+description = "How to submit an OpenMP job on HCC resources."
++++
+
+Submitting an OpenMP job is different from
+[Submitting an MPI Job]({{< relref "submitting_an_mpi_job" >}})
+since you must request multiple cores from a single node.
+
+{{% panel theme="info" header="OpenMP example submission" %}}
+{{< highlight batch >}}
+#!/bin/sh
+#SBATCH --ntasks-per-node=16     # 16 cores
+#SBATCH --nodes=1                # 1 node
+#SBATCH --mem-per-cpu=1024       # Minimum memory required per CPU (in megabytes)
+#SBATCH --time=03:15:00          # Run time in hh:mm:ss
+#SBATCH --error=/work/[groupname]/[username]/job.%J.err
+#SBATCH --output=/work/[groupname]/[username]/job.%J.out
+
+export OMP_NUM_THREADS=${SLURM_NTASKS_PER_NODE}
+./openmp-app.exe
+{{< /highlight >}}
+{{% /panel %}}
+
+Notice that we used `ntasks-per-node` to specify the number of cores we
+want on a single node. Additionally, we specify that we only want
+1 `node`.
+
+`OMP_NUM_THREADS` is required to limit the number of cores that OpenMP
+will use on the node. It is set to `${SLURM_NTASKS_PER_NODE}` to
+automatically match the `ntasks-per-node` value (in this example, 16).
+
+### Compiling
+
+Directions for compiling an OpenMP application can be found on
+[Compiling an OpenMP Application]({{< relref "/applications/user_software/compiling_an_openmp_application" >}}).
+
+### Further Documentation
+
+Further OpenMP documentation can be found on LLNL's
+[OpenMP](https://computing.llnl.gov/tutorials/openMP) website.
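+
+As a rough end-to-end sketch, the commands below compile a small OpenMP
+program with GCC, run a quick interactive sanity check, and then submit the
+job script shown above. The source and script file names are placeholders,
+and the compile step assumes GCC is available in your environment (see the
+compiling page linked above for the recommended setup).
+
+{{% panel theme="info" header="Example workflow (illustrative)" %}}
+{{< highlight bash >}}
+# Compile the OpenMP program; -fopenmp enables OpenMP support in GCC.
+# "openmp-app.c" is a placeholder for your own source file.
+gcc -fopenmp -O2 -o openmp-app.exe openmp-app.c
+
+# Quick sanity check with a small thread count before submitting.
+export OMP_NUM_THREADS=4
+./openmp-app.exe
+
+# Submit the job script from the example above, saved here under the
+# placeholder name openmp.submit.
+sbatch openmp.submit
+{{< /highlight >}}
+{{% /panel %}}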