---
title: Submitting MATLAB Jobs
summary: "How to submit MATLAB jobs on HCC resources."
---

Submitting Matlab jobs is very similar to
[submitting MPI jobs](../../submitting_an_mpi_job/) or
[serial jobs](/submitting_jobs/)
(depending on whether you are using parallel Matlab).

### Submit File
The submit file will need to be modified to allow Matlab to work.
Specifically, these two lines should be added before calling matlab:

!!! note "serial_matlab.submit"
```bash
#!/bin/bash
#SBATCH --time=03:15:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=[job_name]
module load matlab/r2014b
matlab -nodisplay -r "[matlab script name], quit"
```
### Parallel Matlab
The submit file:

!!! note "parallel_matlab.submit"
```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=5
#SBATCH --time=03:15:00
module load matlab/r2014b
matlab -nodisplay -r "[matlab script name], quit"
```
#### Matlab File Additions

In addition to the changes in the submit file, if you are running
parallel Matlab, you will also need to add to the .m file the additional
lines:
```matlab
...
i=str2num(getenv('SLURM_TASKS_PER_NODE'));
parpool(i);
...
```
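SLURM exposes the per-node task count to the job's environment, which is what the `getenv('SLURM_TASKS_PER_NODE')` call above reads. As a quick check (assuming the `parallel_matlab.submit` file shown above), the variable can be inspected from within the job:

```bash
# Inside a job submitted with --nodes=1 --ntasks-per-node=5, SLURM sets this
# variable in the job environment; the MATLAB code above converts it to a
# number and opens a parallel pool of that size.
echo $SLURM_TASKS_PER_NODE    # prints "5" for the submit file shown above
```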
---
title: Submitting R Jobs
summary: "How to submit R jobs on HCC resources."
---

Submitting an R job is very similar to submitting a serial job shown
on [Submitting Jobs](/submitting_jobs/).

- [Running R scripts in batch](#running-r-scripts-in-batch)
- [Running R scripts using `R CMD BATCH`](#running-r-scripts-using-r-cmd-batch)
### Running R scripts in batch

#### Running R scripts using `R CMD BATCH`

When utilizing `R CMD BATCH`, all output will be directed to an `.Rout`
file named after your script unless otherwise specified. For
example:
!!! note "serial_R.submit"
```bash
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
module load R/3.5
R CMD BATCH Rcode.R
```
In the above example, output for the job will be found in the file
`Rcode.Rout`. Notice that we did not specify output and error files in
the submit script; by default, all R output is directed to the `.Rout`
file. To direct output to a specific location, follow your `R CMD BATCH`
command with the name of the file you want output
directed to, as follows:
!!! note "serial_R.submit"
```bash
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
module load R/3.5
R CMD BATCH Rcode.R Rcodeoutput.txt
```
In this example, output from running the script `Rcode.R` will be placed
in the file `Rcodeoutput.txt`.

To pass arguments to the script, they need to be specified after `R CMD
BATCH` but before the script to be executed, and preferably preceded
with `--args` as follows:
!!! note "serial_R.submit"
```bash
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
module load R/3.5
R CMD BATCH "--args argument1 argument2 argument3" Rcode.R Rcodeoutput.txt
```
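As an illustration with made-up argument values (an input file, an output file, and an iteration count), the call could look like the following; within `Rcode.R`, the values can then be read with R's `commandArgs(trailingOnly=TRUE)`:

```bash
# Hypothetical arguments: an input file, an output file, and an iteration count
R CMD BATCH "--args input.csv output.csv 100" Rcode.R Rcodeoutput.txt
```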
#### Running R scripts using `Rscript`

An alternative to `R CMD BATCH` is the `Rscript` command, which runs R scripts
in a manner similar to other programs. This gives the user larger
control over where to direct the output. For example, to run our script
using `Rscript` the submit script could look like the following:
!!! note "serial_R.submit"
```bash
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout
module load R/3.5
Rscript Rcode.R
```
In the above example, STDOUT will be directed to the output file
`TestJob.%J.stdout` and STDERR directed to `TestJob.%J.stderr`. You
will notice that the example is very similar to the
[serial example](/submitting_jobs/).
The important line is the `module load` command.
That tells the cluster to load the R framework into the environment so jobs may use it.

To pass arguments to the script when using `Rscript`, the arguments
will follow the script name as in the example below:
!!! note "serial_R.submit"
```bash
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout
module load R/3.5
Rscript Rcode.R argument1 argument2 argument3
```
---

### Multicore (parallel) R submission

Submitting a multicore R job to SLURM is very similar to
[Submitting an OpenMP Job](../../submitting_an_openmp_job/),
since both are running multicore jobs on a single node. Below is an example:
!!! note "parallel_R.submit"
```bash
#!/bin/bash
#SBATCH --ntasks-per-node=16
#SBATCH --nodes=1
#SBATCH --time=00:30:00
module load R/3.5
R CMD BATCH Rcode.R
```
The above example will submit a single job which can use up to 16 cores.
The number of cores used by the R code should match the number requested
in the submit file; otherwise, performance will suffer. For example, when using the
[parallel](http://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf)
package function mclapply:
!!! note "parallel.R"
```r
library("parallel")
...
mclapply(rep(4, 5), rnorm, mc.cores=16)
```
---

### Multinode R submission with Rmpi

Submitting a multinode MPI R job to SLURM is very similar to
[Submitting an MPI Job](../../submitting_an_mpi_job/),
since both are running multicore jobs on multiple nodes.
Below is an example of running Rmpi on Swan on 2 nodes and 32 cores:
!!! note "Rmpi.submit"
```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:30:00
module load compiler/gcc/4.9 openmpi/1.10 R/3.5
export OMPI_MCA_mtl=^psm
mpirun -n 1 R CMD BATCH Rmpi.R
```
When you run an Rmpi job on Swan, please use the line `export
OMPI_MCA_mtl=^psm` in your submit script. Regardless of how many cores your job uses, the Rmpi package should
always be run with `mpirun -n 1` because it spawns additional
processes dynamically.
Please find below an example of an Rmpi R script provided by
[The University of Chicago Research Computing Center](https://rcc.uchicago.edu/docs/software/environments/R/index.html#rmpi):

!!! note "Rmpi.R"
```r
library(Rmpi)

# initialize an Rmpi environment
...
mpi.remote.exec(paste("I am", id, "of", ns, "running on", host))
# close down the Rmpi environment
mpi.close.Rslaves(dellog = FALSE)
mpi.exit()
```

---
---
title: Creating an Interactive Job
summary: "How to run an interactive job on HCC resources."
weight: 20
---
!!! note
The `/home` directories are not intended for active job I/O.
Output from your processing should be directed to either `/work` or `/common`.

Submitting an interactive job is done with the command `srun`.

```bash
$ srun --pty $SHELL
```
This command will allocate the **default resources of 1GB of RAM, 1 hour of running time, and a single CPU core**. Oftentimes, these resources are not enough. If the job is terminated, there is a high chance that the reason is exceeded resources, so please make sure you set the memory and time requirements appropriately.

Submitting an interactive job to allocate 4 CPU cores per node for 3 hours with 1GB of RAM per core on the general `batch` partition:

```bash
$ srun --nodes=1 --ntasks-per-node=4 --mem-per-cpu=1024 --time=3:00:00 --pty $SHELL
```
Submitting an interactive job is useful if you require extra resources
to run some processing by hand. It is also very useful to debug your
job. You can provide options to the interactive job just as you would a
regular SLURM job. The default job runtime is 1 hour, and can be
increased by including the `--time` argument.
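For example, a larger or longer session can be requested by combining these options; the values below are only illustrative and remain subject to the partition's limits:

```bash
# Hypothetical example: 2 cores, 4GB of RAM per core, and an 8-hour time limit
$ srun --nodes=1 --ntasks-per-node=2 --mem-per-cpu=4096 --time=8:00:00 --pty $SHELL
```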
### Interactive job for Apptainer
Running Apptainer via an interactive job requires at least 4GB of RAM:
```bash
$ srun --mem=4gb --nodes=1 --ntasks-per-node=4 --pty $SHELL
```
If you get any memory-related errors, continue to increase the requested memory amount.
### Priority for short jobs

To run short jobs for testing and development work, a job can specify a
different quality of service (QoS). The *short* QoS increases a job's
priority so it will run as soon as possible.

| SLURM Specification |
|---------------------|
| `--qos=short`       |
{{% panel theme="warning" header="Limits per user for 'short' QoS" %}} !!! warning "Limits per user for 'short' QoS"
- 6 hour job run time - 6 hour job run time
- 2 jobs of 16 CPUs or fewer - 2 jobs of 16 CPUs or fewer
- No more than 256 CPUs in use for *short* jobs from all users - No more than 256 CPUs in use for *short* jobs from all users
{{% /panel %}}
{{% panel theme="info" header="Using the short QoS" %}}
{{< highlight bash >}} !!! note "Using the short QoS"
```bash
srun --qos=short --nodes=1 --ntasks-per-node=1 --mem-per-cpu=1024 --pty $SHELL srun --qos=short --nodes=1 --ntasks-per-node=1 --mem-per-cpu=1024 --pty $SHELL
{{< /highlight >}} ```
{{% /panel %}}
---
title: HCC Acknowledgment Credit
summary: "Details on the Acknowledgment Credit system."
weight: 90
---
!!! note
To submit an acknowledgement and receive the credit, please use the form here: https://hcc.unl.edu/acknowledgement-submission.
!!! note
The following text provides a detailed description of how the Acknowledgment Credit works.
As a quickstart, add the line
`#SBATCH --qos=ac_<group>`
to your submit script, replacing `<group>` with your group name. Run the `hcc-ac` program to check the remaining balance.
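As a minimal sketch of that quickstart (the group name `demo` and the program name are placeholders), a submit file using the acknowledgment QoS could look like:

```bash
#!/bin/bash
#SBATCH --qos=ac_demo        # replace "demo" with your group name
#SBATCH --ntasks=1
#SBATCH --mem=4gb
#SBATCH --time=1:00:00

./my_program                 # hypothetical job command
```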
### What is HCC Acknowledgment Credit?
**Why this ratio?**

All nodes in the Swan batch partition can meet this CPU to memory
ratio.

**Why have this ratio?**
Column description of the `hcc-ac` utility:

| Column | Description |
|--------|-------------|
| per-CPU AvgMEM | The per-CPU average memory size available for the CPU time remaining in the qos. If CPU time is consumed faster than memory time, this value will increase. If memory time is consumed faster than CPU time, this value will decrease. |
### Example of how to use the awarded time for the 'demo' group

The awarded time is reduced down to 10 minutes to show consumption
changes with differing job resource requirements:

All times are in days-hours:minutes:seconds as used in Slurm's '--time='
argument.
!!! note "Default output"
```bash
[demo01@login.hcc_cluster ~]$ hcc-ac
+-----------+--------------+--------------+----------------+
| Slurm qos |  CPUx1 time  | MEMx4GB time | per-CPU AvgMEM |
+-----------+--------------+--------------+----------------+
| ac_demo   |  0-00:10:00  |  0-00:10:00  |     4.0GB      |
+-----------+--------------+--------------+----------------+
```
Use the Slurm quality of service argument '--qos' to gain access to the
awarded time with increased priority:

!!! note "**--qos=ac_demo**"
```bash
[demo01@login.hcc_cluster ~]$ srun --qos=ac_demo --ntasks=1 --mem=8g --time=1:00 /bin/sleep 60
```
\*\***job runs for 60 seconds**\*\*
!!! note "**After 60 second job**"
```bash
[demo01@login.hcc_cluster ~]$ hcc-ac
+-----------+--------------+--------------+----------------+
| Slurm qos |  CPUx1 time  | MEMx4GB time | per-CPU AvgMEM |
+-----------+--------------+--------------+----------------+
| ac_demo   |  0-00:09:00  |  0-00:08:00  |    3.556GB     |
+-----------+--------------+--------------+----------------+
```
1 CPU minute and 2 4GB memory minutes were consumed by the prior srun
job.

The per-CPU AvgMEM shown by `hcc-ac` is the remaining memory time divided
by the remaining CPU time, scaled against 4GB:

ie, 9 \* 3.556 \~= 8 \* 4
!!! note "**--ntasks=4**"
```bash
[demo01@login.hcc_cluster ~]$ srun --qos=ac_demo --ntasks=4 --mem-per-cpu=2G --time=1:00 /bin/sleep 60
```
\*\***job runs for 60 seconds**\*\*
!!! note "**After 60 second job**"
```bash
[demo01@login.hcc_cluster ~]$ hcc-ac
+-----------+--------------+--------------+----------------+
| Slurm qos |  CPUx1 time  | MEMx4GB time | per-CPU AvgMEM |
+-----------+--------------+--------------+----------------+
| ac_demo   |  0-00:05:00  |  0-00:06:00  |     4.8GB      |
+-----------+--------------+--------------+----------------+
```
4 CPU minutes and 2 4GB minutes were consumed by the prior srun job.

The per-CPU AvgMEM is recalculated the same way against 4GB:

6 / 5 \* 4 == 4.8
!!! note "**Insufficient Time**"
```bash
[demo01@login.hcc_cluster ~]$ srun --qos=ac_demo --ntasks=5 --mem-per-cpu=5000M --time=1:00 /bin/sleep 60
srun: error: Unable to allocate resources: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
```

An example of a job requesting more resources than what remains
available in the qos.
!!! note "**Corrected Memory Requirement**"
```bash
[demo01@login.hcc_cluster ~]$ srun --qos=ac_demo --ntasks=5 --mem-per-cpu=4800M --time=1:00 /bin/sleep 60
```
\*\***job runs for 60 seconds**\*\*
!!! note "**Exhausted QoS**"
```bash
[demo01@login.hcc_cluster ~]$ hcc-ac
+-----------+--------------+--------------+----------------+
| Slurm qos |  CPUx1 time  | MEMx4GB time | per-CPU AvgMEM |
+-----------+--------------+--------------+----------------+
| ac_demo   |  exhausted   |  exhausted   |     0.0GB      |
+-----------+--------------+--------------+----------------+
```
All remaining time was used. Any further submissions to the qos will be
**denied at submission time**.

All of the above **srun** arguments work the same with **sbatch** within
the submit file header.
!!! note "**Submit File Example**"
```bash
[demo01@login.hcc_cluster ~]$ cat submit_test.slurm
#!/bin/bash
#SBATCH --qos=ac_demo
#SBATCH --ntasks=5
/bin/sleep 60

[demo01@login.hcc_cluster ~]$ sbatch ./submit_test.slurm
```
CPU and memory time in the qos are only consumed when jobs run against
the qos. Therefore it is possible for more jobs to be submitted than the
remaining time can accommodate. Use the srun `--test-only`
argument to size the job against what time remains in the qos.

For example, with the same 10 minute limit:
!!! note "**--test-only job to see if it fits within qos time limits**"
```bash
[demo01@login.hcc_cluster ~]$ hcc-ac
+-----------+--------------+--------------+----------------+
...
[demo01@login.hcc_cluster ~]$ srun --test-only --qos=ac_demo --ntasks=1 --time=3:00 --mem-per-cpu=12G
srun: Job <number> to start at YYYY-MM-DDTHH:MM:SS using 1 processors on compute_node
```
---
title: Submitting Jobs
summary: "How to submit jobs to HCC resources"
weight: 5
---
Swan is managed by
the [SLURM](https://slurm.schedmd.com) resource manager.
In order to run processing on Swan, you
must create a SLURM script that will run your processing. After
submitting the job, SLURM will schedule your processing on an available
worker node.

Before writing a submit file, you may need to
[compile your application](/applications/user_software/).
- [Ensure proper working directory for job output](#ensure-proper-working-directory-for-job-output)
- [Creating a SLURM Submit File](#creating-a-slurm-submit-file)
- [Submitting the job](#submitting-the-job)
- [Checking Job Status](#checking-job-status)
- [Checking Job Start](#checking-job-start)
- [Removing the Job](#removing-the-job)
- [Next Steps](#next-steps)
### Ensure proper working directory for job output
!!! note "Manual specification of /work path"
```bash
$ cd /work/[groupname]/[username]
```
The environment variable `$WORK` can also be used.
!!! note "Using environment variable for /work path"
```bash
$ cd $WORK
$ pwd
/work/[groupname]/[username]
```
Review how /work differs from /home [here](/handling_data).
### Creating a SLURM Submit File
!!! note
The below example is for a serial job. For submitting MPI jobs, please look at the [MPI Submission Guide](submitting_an_mpi_job/).
A SLURM submit file is broken into 2 sections, the job description and
the processing. SLURM job description lines are prepended with `#SBATCH` in
the submit file.
**SLURM Submit File**
```bash
#!/bin/bash
#SBATCH --time=03:15:00          # Run time in hh:mm:ss
#SBATCH --mem-per-cpu=1024       # Maximum memory required per CPU (in megabytes)
module load example/test
hostname
sleep 60
```
- **time**
  Maximum walltime the job can run. After this time has expired, the
  job will be stopped.
- **mem**
  Specify the real memory required per node in MegaBytes. If you
  exceed this limit, your job will be stopped. Note that you
  should ask for less memory than each node actually has. For Swan, the
  max is 2000GB.
- **job-name**
  The name of the job. Will be reported in the job listing.
- **partition**
  The partition the job should run in. Partitions determine the job's
  priority and the nodes on which the job can run. See the
  [Partitions](/submitting_jobs/partitions) page for a list of possible partitions.
- **error**
  Location where the stderr will be written for the job. `[groupname]`
  and `[username]` should be replaced with your group name and username.
- **output**
  Location where the stdout will be written for the job.
More advanced submit commands can be found on the [SLURM Docs](https://slurm.schedmd.com/sbatch.html).
You can also find an example of an MPI submission on [Submitting an MPI Job](submitting_an_mpi_job).
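As one example of the more advanced options mentioned above (the address below is a placeholder), e-mail notifications can be requested directly in the submit file:

```bash
#SBATCH --mail-type=END,FAIL        # send an e-mail when the job ends or fails
#SBATCH --mail-user=you@example.com # placeholder address; use your own
```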
### Submitting the job
Submitting the job described above is done with `sbatch`:
!!! note "SLURM Submission"
```bash
$ sbatch example.slurm
Submitted batch job 24603
```
The job was successfully submitted.
### Checking Job Status

Checking the status of the job is easiest by filtering by your username,
using the `-u` option to `squeue`.
```bash
$ squeue -u <username>
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
24605 batch hello-wo <username> R 0:56 1 b01
```
Additionally, if you want to see the status of a specific partition, for
example if you are part of a [partition](/submitting_jobs/partitions),
you can use the `-p` option to `squeue`:
```bash
$ squeue -p guest
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
73435 guest MyRandom demo01 R 10:35:20 1 ri19n10
73436 guest MyRandom demo01 R 10:35:20 1 ri19n12
73735 guest SW2_driv demo02 R 10:14:11 1 ri20n07
73736 guest SW2_driv demo02 R 10:14:11 1 ri20n07
```
#### Checking Job Start
You may view the start time of your job with the
command `squeue --start`. The output of the command will show the
expected start time of the jobs.
```bash
$ squeue --start --user demo03
JOBID PARTITION NAME USER ST START_TIME NODES NODELIST(REASON)
5822 batch python demo03 PD 2013-06-08T00:05:09 3 (Priority)
5823 batch python demo03 PD 2013-06-08T00:07:39 3 (Priority)
5824 batch python demo03 PD 2013-06-08T00:09:09 3 (Priority)
5825 batch python demo03 PD 2013-06-08T00:12:09 3 (Priority)
5826 batch python demo03 PD 2013-06-08T00:12:39 3 (Priority)
5827 batch python demo03 PD 2013-06-08T00:12:39 3 (Priority)
5828 batch python demo03 PD 2013-06-08T00:12:39 3 (Priority)
5829 batch python demo03 PD 2013-06-08T00:13:09 3 (Priority)
5830 batch python demo03 PD 2013-06-08T00:13:09 3 (Priority)
5831 batch python demo03 PD 2013-06-08T00:14:09 3 (Priority)
5832 batch python demo03 PD N/A 3 (Priority)
```
The output shows the expected start time of the jobs, as well as the
reason that the jobs are currently idle (in this case, low priority of
the user).
### Removing the Job

Removing the job is done with the `scancel` command. The only argument
to the `scancel` command is the job id. For the job above, the command
is:
```bash
$ scancel 24605
```
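To remove several jobs at once, `scancel` can also filter by user; for example, the following cancels all of your own pending and running jobs (the username is a placeholder):

```bash
# Cancel every job belonging to the given user
$ scancel -u <username>
```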
### Next Steps
!!! tip "Looking to reduce your wait time on Swan?"
HCC wants to hear more about your research! If you acknowledge HCC in your publications, posters, or journal articles, you can receive a boost in priority on Swan!
Details on the process and requirements are available in the [HCC Acknowledgement Credit](./hcc_acknowledgment_credit.md) documentation page.
- [Application Specific Guides](./app_specific)
- [Monitoring Jobs](./monitoring_jobs.md)
- [Creating an Interactive Job](./creating_an_interactive_job.md)
- [Submitting a GPU Job](./submitting_gpu_jobs.md)
- [GPU Job Monitoring and Optimization](./monitoring_GPU_usage.md)
- [Submitting an MPI Job](./submitting_an_mpi_job.md)
- [Submitting an OpenMP Job](./submitting_an_openmp_job.md)
- [Submitting a Job Array](./submitting_a_job_array.md)
- [Setting up Dependent Jobs](./job_dependencies.md)
- [Available Partitions on Swan](./partitions/swan_available_partitions.md)
---
title: Job Dependencies
summary: "How to use job dependencies with the SLURM scheduler."
weight: 55
---
The job dependency feature of SLURM is useful when you need to run
multiple jobs in a particular order. A standard example of this is a
workflow in which an initial job feeds two intermediate jobs whose results
are then combined; this is usually referred to as a "diamond" workflow. Jobs
B and C both depend on Job A completing before they can run. Job D then
depends on Jobs B and C completing.
<img src="/images/4980738.png" width="400">
The SLURM submit files for each step are below.

!!! note "JobA.submit"
```bash
#!/bin/bash
#SBATCH --job-name=JobA
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
echo "I'm job A" echo "I'm job A"
echo "Sample job A output" > jobA.out echo "Sample job A output" > jobA.out
sleep 120 sleep 120
{{< /highlight >}} ```
{{% /expand %}}
!!! note "JobB.submit"
```bash
#!/bin/bash
#SBATCH --job-name=JobB
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
cat jobA.out >> jobB.out
echo "" >> jobB.out echo "" >> jobB.out
echo "Sample job B output" >> jobB.out echo "Sample job B output" >> jobB.out
sleep 120 sleep 120
{{< /highlight >}} ```
{{% /expand %}}
!!! note "JobC.submit"
```bash
#!/bin/bash
#SBATCH --job-name=JobC
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
cat jobA.out >> jobC.out
echo "" >> jobC.out echo "" >> jobC.out
echo "Sample job C output" >> jobC.out echo "Sample job C output" >> jobC.out
sleep 120 sleep 120
{{< /highlight >}} ```
{{% /expand %}}
!!! note "JobD.submit"
```bash
#!/bin/bash
#SBATCH --job-name=JobD
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
cat jobC.out >> jobD.out
echo "" >> jobD.out echo "" >> jobD.out
echo "Sample job D output" >> jobD.out echo "Sample job D output" >> jobD.out
sleep 120 sleep 120
{{< /highlight >}} ```
{{% /expand %}}
To start the workflow, submit Job A first:
!!! note "Submit Job A"
```bash
[demo01@login.swan demo01]$ sbatch JobA.submit
Submitted batch job 666898
```
Now submit jobs B and C, using the job id from Job A to indicate the
dependency:
!!! note "Submit Jobs B and C"
```bash
[demo01@login.swan demo01]$ sbatch -d afterok:666898 JobB.submit
Submitted batch job 666899
[demo01@login.swan demo01]$ sbatch -d afterok:666898 JobC.submit
Submitted batch job 666900
```
Finally, submit Job D as depending on both jobs B and C:
!!! note "Submit Job D"
```bash
[demo01@login.swan demo01]$ sbatch -d afterok:666899:666900 JobD.submit
Submitted batch job 666901
```
Running `squeue` will now show all four jobs. The output from `squeue`
will also indicate that Jobs B, C, and D are in a pending state because
of the dependency.
!!! note "Squeue Output"
```bash
[demo01@login.swan demo01]$ squeue -u demo01
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
666899 batch JobB demo01 PD 0:00 1 (Dependency)
666900 batch JobC demo01 PD 0:00 1 (Dependency)
666901 batch JobD demo01 PD 0:00 1 (Dependency)
666898 batch JobA demo01 R 0:52 1 c2409
```
As each job completes successfully, SLURM will run the job(s) in the
workflow as resources become available.
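The chain can also be scripted by capturing each job id with `sbatch --parsable` instead of copying the numbers by hand; a minimal sketch using the submit files above:

```bash
#!/bin/bash
# --parsable makes sbatch print only the job id, so it can be stored in a variable
jobA=$(sbatch --parsable JobA.submit)
jobB=$(sbatch --parsable -d afterok:$jobA JobB.submit)
jobC=$(sbatch --parsable -d afterok:$jobA JobC.submit)
sbatch -d afterok:$jobB:$jobC JobD.submit
```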
---
title: GPU Monitoring and Optimizing
summary: "How to monitor GPU usage in real time and optimize GPU performance."
weight: 60
---
This document provides a comprehensive guide to monitoring GPU usage and optimizing GPU performance on the HCC. Its goal is to help you identify GPU bottlenecks in your jobs and offer instructions for optimizing GPU resource utilization.
### Table of Contents
- [Measuring GPU Utilization in Real Time](#measuring-gpu-utilization-in-real-time)
- [Logging and Reporting GPU Utilization](#logging-and-reporting-gpu-utilization)
- [nvidia-smi](#nvidia-smi)
- [TensorBoard](#tensorboard)
- [How to Improve Your GPU Utilization](#how-to-improve-your-gpu-utilization)
- [Maximize Parallelism](#maximize-parallelism)
- [Memory Management and Optimization](#memory-management-and-optimization)
- [Use Shared Memory Effectively](#use-shared-memory-effectively)
- [Avoid Memory Divergence](#avoid-memory-divergence)
- [Reduce Memory Footprint](#reduce-memory-footprint)
- [Minimize CPU-GPU Memory Transferring Overhead](#minimize-cpu-gpu-memory-transferring-overhead)
- [How to Improve Your GPU Utilization for Deep Learning Jobs](#how-to-improve-your-gpu-utilization-for-deep-learning-jobs)
- [Maximize Batch Size](#maximize-batch-size)
- [Optimize Data Loading and Preprocessing](#optimize-data-loading-and-preprocessing)
- [Optimize Model Architecture](#optimize-model-architecture)
- [Common Oversights](#common-oversights)
- [Overlooking GPU-CPU Memory Transfer Costs](#overlooking-gpu-cpu-memory-transfer-costs)
- [Not Leveraging GPU Libraries](#not-leveraging-gpu-libraries)
- [Not Handling GPU-Specific Errors](#not-handling-gpu-specific-errors)
- [Neglecting Multi-GPU Scalability](#neglecting-multi-gpu-scalability)
### Measuring GPU Utilization in Real Time
You can use the `nvidia-smi` command to monitor GPU usage in real time. This tool provides details on GPU memory usage and utilization. To monitor a job, you need access to the same node where the job is running.
!!! warning
If the job to be monitored is using all available resources for a node, the user will not be able to obtain a simultaneous interactive job.
Once the job has been submitted and is running, you can request an interactive session on the same node using the following srun command:
```bash
srun --jobid=<JOB_ID> --pty bash
```
where `<JOB_ID>` is replaced by the job ID for the monitored job as assigned by SLURM.
After getting access to the node, use the following command to monitor GPU performance in real time:
```bash
watch -n 1 nvidia-smi
```
<img src="/images/nvidia-smi_example.png" width="700">
Note that `nvidia-smi` only shows the process ID (`PID`) of the running GPU jobs. If multiple jobs are running on the same node, you'll need to match the `PID` to your job using the top command. Start the top command as follows:
```bash
top
```
In top, the `PID` appears in the first column, and your login ID is shown in the `USER` column. Use this to identify the process corresponding to your job.
<img src="/images/srun_top.png" width="700">
### Logging and Reporting GPU Utilization
#### nvidia-smi
You can use `nvidia-smi` to periodically log GPU usage to CSV files for later analysis. Rather than running it interactively as shown above, it is convenient to add this logging to your SLURM submission script: wrap your job command with the lines below. This will generate three files in your `$WORK` directory:
1. **`gpu_usage_log.csv`**: contains overall GPU performance data, including GPU utilization, memory utilization, and total GPU memory.
2. **`pid_gpu_usage_log.csv`**: logs GPU usage for each process, including the process ID (PID) and GPU memory used by each process. Note that, to match a specific PID with overall GPU performance in the generated file, use the GPU bus ID.
3. **`pid_lookup.txt`**: provides the process ID to help identify which one corresponds to your job in the GPU records.
Note that the job ID will be appended to the file names to help match the logs with your specific job.
```bash
curpath=`pwd`
cd $WORK
nohup nvidia-smi --query-gpu=timestamp,index,gpu_bus_id,utilization.gpu,utilization.memory,memory.used,memory.total --format=csv -f gpu_usage_log.csv-$SLURM_JOB_ID -l 1 > /dev/null 2>&1 &
gpumonpid=$!
nohup nvidia-smi --query-compute-apps=timestamp,gpu_bus_id,pid,used_memory --format=csv -f pid_gpu_usage_log-$SLURM_JOB_ID.csv -l 1 > /dev/null 2>&1 &
gpumonprocpid=$!
nohup top -u <LOGIN-ID> -d 10 -c -b -n 2 > pid_lookup-$SLURM_JOB_ID.txt 2>&1 &
cd $curpath
<YOUR_JOB_COMMAND>
kill $gpumonpid
kill $gpumonprocpid
```
where `<LOGIN-ID>` is replaced by your HCC login ID and `<YOUR_JOB_COMMAND>` is replaced by your job command. A complete example SLURM submit script that utilizes this approach can be found [here](https://github.com/unlhcc/job-examples/tree/master/tensorflow_gpu_tracking).
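Once the job finishes, the CSV logs can be inspected with standard tools; for example, a quick look at the overall utilization log written by the script above (the job id is a placeholder):

```bash
# Show the first few samples of GPU utilization recorded for job <JOB_ID>
head $WORK/gpu_usage_log.csv-<JOB_ID>
```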
#### TensorBoard
If your deep learning job utilizes libraries such as `TensorFlow` or `PyTorch`, you can use TensorBoard to monitor and visualize GPU usage metrics, including GPU utilization, memory consumption, and model performance. TensorBoard provides real-time insights into how your job interacts with the GPU, helping you optimize performance and identify bottlenecks.
To monitor GPU usage with `TensorBoard`, refer to the specific instructions of `TensorFlow` or `PyTorch` to enable logging with `TensorBoard` in your job code:
1. **`TensorFlow`** - [TensorFlow Profiler Guide](https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras)
2. **`PyTorch`** - [PyTorch Profiler with TensorBoard](https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html)
On Swan, TensorBoard is available as an [Open OnDemand App](https://hcc.unl.edu/docs/open_ondemand/virtual_desktop_and_interactive_apps/).
### How to Improve Your GPU Utilization
Improving GPU utilization means maximizing both the computational and memory usage of the GPU to ensure your program fully utilizes the GPU's processing power. Low utilization can result from various bottlenecks, including improper parallelism, insufficient memory management, or CPU-GPU communication overhead.
#### Maximize Parallelism
The GPU is powerful because of its parallel processing capabilities. Your job should leverage parallelism effectively:
1. **Optimize grid and block dimensions**: configure your thread and block settings to match your job's data size to fully utilize GPU cores.
2. **Occupancy**: use tools like CUDA’s occupancy calculator to determine the best number of threads per block that maximizes utilization.
3. **Streamlining parallel tasks**: CUDA streams can be used to execute multiple operations concurrently. This allows for overlapping computation on the GPU with data transfers, improving efficiency.
#### Memory Management and Optimization
##### Use Shared Memory Effectively
Shared memory is a small, high-speed memory located on the GPU. It can be used to reduce global memory access latency by storing frequently used data. Use shared memory to cache data that is repeatedly accessed by multiple threads.
##### Avoid Memory Divergence
Memory divergence occurs when threads in a warp access non-contiguous memory locations, resulting in multiple memory transactions. To minimize divergence:
- **Align memory access**: ensure that threads in a warp access contiguous memory addresses.
- **Use memory coalescing**: organize memory access patterns to allow for coalesced memory transactions, reducing the number of memory accesses required.
##### Reduce Memory Footprint
Excessive memory use can lead to spills into slower global memory. Minimize your program’s memory footprint by:
- **Freeing unused memory**: always release memory that is no longer needed.
- **Optimizing data structures**: use more compact data structures and reduce precision when possible (e.g., using floats instead of doubles).
#### Minimize CPU-GPU Memory Transferring Overhead
Data transfer between the CPU and GPU is often a bottleneck in scientific programs. It is essential to minimize these transfers to improve overall GPU performance. Here are some tips:
1. **Batch data transfers**: transfer large chunks of data at once rather than sending small bits frequently.
2. **Asynchronous memory transfers**: use non-blocking memory transfer operations (e.g., cudaMemcpyAsync for CUDA) to allow computation and data transfer to overlap.
3. **Pin memory**: use pinned (page-locked) memory on the CPU for faster transfer of data to and from the GPU.
### How to Improve Your GPU Utilization for Deep Learning Jobs
In deep learning, GPUs are a key component for accelerating model training and inference due to their ability to handle large-scale matrix operations and parallelism. Below are tips to maximize GPU utilization in deep learning jobs.
#### Maximize Batch Size
Batch size refers to the number of training samples processed simultaneously. Larger batch sizes improve GPU utilization by increasing the workload per step. The batch size should fit within the GPU’s memory constraints:
- **Larger batch sizes**: result in better utilization but require more memory.
- **Gradient accumulation**: if GPU memory limits are reached, you can accumulate gradients over several smaller batches before performing a parameter update, effectively simulating larger batch sizes.
#### Optimize Data Loading and Preprocessing
Data loading can become a bottleneck, causing the GPU to idle while waiting for data.
- **Parallel data loading**: load data in parallel to speed up the input pipeline (e.g., using libraries like PyTorch’s `DataLoader` or TensorFlow’s `tf.data` pipeline).
- **Prefetch data**: use techniques (e.g., double-buffering) to overlap data preprocessing and augmentation with model computation, enabling data to be fetched in advance. This helps reduce the GPU idle time.
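For example, with PyTorch's `DataLoader` (dataset shapes and worker counts below are placeholders to adjust for your job), parallel workers, pinned buffers, and prefetching keep batches ready so the GPU rarely sits idle:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(10_000, 3, 32, 32), torch.randint(0, 10, (10_000,)))

loader = DataLoader(
    ds,
    batch_size=128,
    num_workers=4,            # preprocess batches in parallel worker processes
    pin_memory=True,          # page-locked buffers speed up host-to-device copies
    prefetch_factor=2,        # each worker keeps 2 batches prepared in advance
    persistent_workers=True,  # avoid respawning workers every epoch
)

for x, y in loader:
    x = x.cuda(non_blocking=True)   # overlaps with workers preparing the next batch
    y = y.cuda(non_blocking=True)
    # ... forward/backward pass ...
```

When running under SLURM, keep `num_workers` at or below the number of CPU cores requested for the job so the workers do not oversubscribe the allocation.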
#### Optimize Model Architecture
Model architecture impacts the GPU utilization. Here are some optimization tips:
- **Reduce memory bottlenecks**: avoid excessive use of operations that cause memory overhead (e.g., deep recursive layers).
- **Improve parallelism**: use layers that can exploit parallelism (e.g., convolutions, matrix multiplications).
- **Prune unnecessary layers**: prune your model by removing layers or neurons that don’t contribute significantly to the output, reducing computation time and improving efficiency.
### Common Oversights
#### Overlooking GPU-CPU Memory Transfer Costs
Memory transfers between the CPU and GPU can be expensive, and excessive data movement can negate the performance gains offered by GPU parallelism.
#### Not Leveraging GPU Libraries
There are highly optimized libraries available for GPU-accelerated algorithms, such as linear algebra and FFTs. Always check for these libraries before implementing your own solution, as they are often more efficient and reliable.
#### Not Handling GPU-Specific Errors
GPU computation errors can lead to silent failures, making debugging extremely difficult. For example, insufficient memory on the GPU or illegal memory access can go undetected without proper error handling.
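For instance, with PyTorch (used here only as an illustration), a GPU out-of-memory error can be caught and handled explicitly instead of crashing the job; the hypothetical helper below retries with a smaller batch:

```python
import torch

model = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()

def forward_with_fallback(batch_size):
    """Try a forward pass; on GPU out-of-memory, retry with half the batch size."""
    try:
        x = torch.randn(batch_size, 3, 224, 224, device="cuda")
        return model(x)
    except torch.cuda.OutOfMemoryError:   # a RuntimeError subclass in recent PyTorch
        torch.cuda.empty_cache()          # release cached blocks before retrying
        print(f"OOM at batch size {batch_size}; retrying with {batch_size // 2}")
        return forward_with_fallback(batch_size // 2)

out = forward_with_fallback(1024)
```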
#### Neglecting Multi-GPU Scalability
Many programs are initially designed for single-GPU execution and lack support for multiple GPUs. Make sure your program is optimized for multi-GPU execution before scaling up to request multiple GPU resources.
+++ ---
title = "Monitoring Jobs" title: Monitoring Jobs
description = "How to find out information about running and completed jobs." summary: "How to find out information about running and completed jobs."
weight=55 weight: 55
+++ ---
Careful examination of running times, memory usage and output files will Careful examination of running times, memory usage and output files will
allow you to ensure the job completed correctly and give you a good idea allow you to ensure the job completed correctly and give you a good idea
...@@ -10,17 +10,35 @@ of what memory and time limits to request in the future. ...@@ -10,17 +10,35 @@ of what memory and time limits to request in the future.
### Monitoring Completed Jobs: ### Monitoring Completed Jobs:
#### seff
The `seff` command provides a quick summary of a single job's resource utilization and efficiency after it has completed, including the job's state, CPU efficiency, wall-clock time, and memory usage:
```bash
seff <JOB_ID>
```
<img src="/images/slurm_seff_1.png" height="250">
!!! note
1. `seff` gathers resource utilization every 30 seconds, so it is possible for some peak utilization to be missed in the report.
2. For multi-node jobs, the `Memory Utilized` reported by `seff` is for **one node only**.
For a more accurate report, please use `sacct` instead.
#### sacct
To see the runtime and memory usage of a job that has completed, use the To see the runtime and memory usage of a job that has completed, use the
sacct command: sacct command:
{{< highlight bash >}} ```bash
sacct sacct
{{< /highlight >}} ```
Lists all jobs by the current user and displays information such as Lists all jobs by the current user and displays information such as
JobID, JobName, State, and ExitCode. JobID, JobName, State, and ExitCode.
{{< figure src="/images/21070053.png" height="150" >}} <img src="/images/sacct_generic.png" height="150">
Coupling this command with the --format flag will allow you to see more Coupling this command with the --format flag will allow you to see more
than the default information about a job. Fields to display should be than the default information about a job. Fields to display should be
...@@ -28,15 +46,17 @@ listed as a comma separated list after the --format flag (without ...@@ -28,15 +46,17 @@ listed as a comma separated list after the --format flag (without
spaces). For example, to see the Elapsed time and Maximum used memory by spaces). For example, to see the Elapsed time and Maximum used memory by
a job, this command can be used: a job, this command can be used:
{{< highlight bash >}} ```bash
sacct --format JobID,JobName,Elapsed,MaxRSS sacct --format JobID,JobName,Elapsed,MaxRSS
{{< /highlight >}} ```
{{< figure src="/images/21070054.png" height="150" >}} <img src="/images/sacct_format.png" height="150">
Additional arguments and format field information can be found in Additional arguments and format field information can be found in
[the SLURM documentation](https://slurm.schedmd.com/sacct.html). [the SLURM documentation](https://slurm.schedmd.com/sacct.html).
### Monitoring Running Jobs: ### Monitoring Running Jobs:
There are two ways to monitor running jobs, the `top` command and There are two ways to monitor running jobs, the `top` command and
monitoring the `cgroup` files using the utility`cgget`. `top` is helpful monitoring the `cgroup` files using the utility`cgget`. `top` is helpful
...@@ -45,42 +65,42 @@ information on memory usage. Both of these tools require the use of an ...@@ -45,42 +65,42 @@ information on memory usage. Both of these tools require the use of an
interactive job on the same node as the job to be monitored while the job interactive job on the same node as the job to be monitored while the job
is running. is running.
{{% notice warning %}} !!! warning
If the job to be monitored is using all available resources for a node, If the job to be monitored is using all available resources for a node,
the user will not be able to obtain a simultaneous interactive job. the user will not be able to obtain a simultaneous interactive job.
{{% /notice %}}
After the job to be monitored is submitted and has begun to run, request After the job to be monitored is submitted and has begun to run, request
an interactive job on the same node using the srun command: an interactive job on the same node using the srun command:
{{< highlight bash >}} ```bash
srun --jobid=<JOB_ID> --pty bash srun --jobid=<JOB_ID> --pty bash
{{< /highlight >}} ```
where `<JOB_ID>` is replaced by the job id for the monitored job as where `<JOB_ID>` is replaced by the job id for the monitored job as
assigned by SLURM. assigned by SLURM.
Alternately, you can request the interactive job by nodename as follows: Alternately, you can request the interactive job by nodename as follows:
{{< highlight bash >}} ```bash
srun --nodelist=<NODE_ID> --pty bash srun --nodelist=<NODE_ID> --pty bash
{{< /highlight >}} ```
where `<NODE_ID>` is replaced by the name of the node where the monitored where `<NODE_ID>` is replaced by the name of the node where the monitored
job is running. This information can be found out by looking at the job is running. This information can be found out by looking at the
squeue output under the `NODELIST` column. squeue output under the `NODELIST` column.
{{< figure src="/images/21070055.png" width="700" >}} <img src="/images/srun_node_id.png" width="700">
### Using `top` to monitor running jobs ### Using `top` to monitor running jobs
Once the interactive job begins, you can run `top` to view the processes Once the interactive job begins, you can run `top` to view the processes
on the node you are on: on the node you are on:
{{< figure src="/images/21070056.png" height="400" >}} <img src="/images/srun_top.png" height="400">
Output for `top` displays each running process on the node. From the above Output for `top` displays each running process on the node. From the above
image, we can see the various MATLAB processes being run by user image, we can see the various MATLAB processes being run by user
cathrine98. To filter the list of processes, you can type `u` followed hccdemo. To filter the list of processes, you can type `u` followed
by the username of the user who owns the processes. To exit this screen, by the username of the user who owns the processes. To exit this screen,
press `q`. press `q`.
...@@ -97,31 +117,30 @@ on the same node as the monitored job. Then, to view specific files and informat ...@@ -97,31 +117,30 @@ on the same node as the monitored job. Then, to view specific files and informat
use one of the following commands: use one of the following commands:
##### To view current memory usage: ##### To view current memory usage:
{{< highlight bash >}} ```bash
cgget -r memory.usage_in_bytes /slurm/uid_<UID>/job_<SLURM_JOBID>/ cgget -r memory.usage_in_bytes /slurm/uid_<UID>/job_<SLURM_JOBID>/
{{< /highlight >}} ```
where `<UID>` is replaced by your UID and `<SLURM_JOBID>` is where `<UID>` is replaced by your UID and `<SLURM_JOBID>` is
replaced by the monitored job's Job ID as assigned by SLURM. replaced by the monitored job's Job ID as assigned by SLURM.
{{% notice note %}} !!! note
To find your `uid`, use the command `id -u`. Your UID never changes and is To find your `uid`, use the command `id -u`. Your UID never changes and is the same on all HCC clusters (*not* on Anvil, however!).
the same on all HCC clusters (*not* on Anvil, however!).
{{% /notice %}}
##### To view the total CPU time, in nanoseconds, consumed by the job: ##### To view the total CPU time, in nanoseconds, consumed by the job:
{{< highlight bash >}} ```bash
cgget -r cpuacct.usage /slurm/uid_<UID>/job_<SLURM_JOBID>/ cgget -r cpuacct.usage /slurm/uid_<UID>/job_<SLURM_JOBID>/
{{< /highlight >}} ```
Since the `cgroup` files are available only while the job is running, another Since the `cgroup` files are available only while the job is running, another
way of accessing the information from these files is through the submit job. way of accessing the information from these files is through the submit job.
To track for example, the maximum memory usage of a job, you can add To track for example, the maximum memory usage of a job, you can add
{{< highlight bash >}} ```bash
cgget -r memory.max_usage_in_bytes /slurm/uid_${UID}/job_${SLURM_JOBID}/ cgget -r memory.max_usage_in_bytes /slurm/uid_${UID}/job_${SLURM_JOBID}/
{{< /highlight >}} ```
at the end of your submit file. Unlike the previous examples, you do not need to at the end of your submit file. Unlike the previous examples, you do not need to
modify this command - here `UID` and `SLURM_JOBID` are variables that will be set modify this command - here `UID` and `SLURM_JOBID` are variables that will be set
when the job is submitted. when the job is submitted.
...@@ -138,11 +157,43 @@ at the end of your submit script. ...@@ -138,11 +157,43 @@ at the end of your submit script.
`mem_report` can also be run as part of an interactive job: `mem_report` can also be run as part of an interactive job:
{{< highlight bash >}} ```bash
[demo13@c0218.crane ~]$ mem_report [demo13@c0218.swan ~]$ mem_report
Current memory usage for job 25745709 is: 2.57 MBs Current memory usage for job 25745709 is: 2.57 MBs
Maximum memory usage for job 25745709 is: 3.27 MBs Maximum memory usage for job 25745709 is: 3.27 MBs
{{< /highlight >}} ```
When `cgget` and `mem_report` are used as part of the submit script, the respective output When `cgget` and `mem_report` are used as part of the submit script, the respective output
is printed in the generated SLURM log files, unless otherwise specified. is printed in the generated SLURM log files, unless otherwise specified.
### Monitoring Queued Jobs:
The queue on Swan uses fair-share scheduling, which means a job's priority depends on how long the job has been waiting in the queue, past usage of the cluster, the job's size, the memory and time requested, etc. Priority is also affected by the number of jobs waiting in the queue and how many resources are available on the cluster. The more jobs you have submitted recently, the lower the priority of your subsequent jobs will be.
You can check when your jobs will be running on the cluster using the command:
```bash
sacct -u <user_id> --format=start
```
To check the start time for a specific job, you can use the following command:
```bash
sacct -u <user_id> --job=<job_id> --format=start
```
Finally, you can check your fair-share score by running the following command:
```bash
sshare --account=<group_name> -a
```
After you run the above command you will be able to see your fair-share score.
- If your fairshare score is 1.0, your account has not run any jobs recently (unused).
- If your fairshare score is 0.5, your account has average utilization: on average, it is using exactly as much as its granted share.
- If your fairshare score is between 0 and 0.5, your account has higher than average utilization and has overused its granted share.
- Finally, if your fairshare score is 0, your account has no share left and has vastly overused its granted share. If there is no contention for resources, its jobs will still start.
!!! note "Job Wait Time"
Fairshare priority is not the only factor in how long a job takes to start. The SLURM scheduler needs to find a time when the requested resources are available.
Larger jobs or jobs requiring GPUs may take longer to start in queue while SLURM waits for resources to be available.
Another way to reduce job wait time is through [Priority Access](https://hcc.unl.edu/priority-access-pricing).
---
title: Available Partitions
summary: "Listing of partitions on Swan."
weight: 70
---
Partitions are used on Swan to distinguish different
resources. You can view the partitions with the command `sinfo`.
### Swan:
Swan has two shared public partitions available for use: the default partition, `{{ hcc.swan.partition.default }}`, and the GPU enabled partition, `{{ hcc.swan.partition.gpu }}`.
When you submit a job on Swan without specifying a partition, it will automatically use the `{{ hcc.swan.partition.default }}` partition.
| Partition Name | Notes |
|----------------------------------|--------------------------------------------------|
| {{ hcc.swan.partition.default }} | Default Partition </br></br> Does not have GPUs |
| {{ hcc.swan.partition.gpu }} | Shared partition with GPUs |
On Swan, jobs have a maximum runtime of 7 days, and each user can request up to 2000 cores and run up to 1000 jobs.
#### Worker Node Configuration
The standard configuration of a Swan worker node is:
| Configuration | Value |
|-----------------|--------|
| Cores | 56 |
| Memory | 250 GB |
| Scratch Storage | 3.5 TB |
Some Swan worker nodes are equipped with additional memory, with up to 2 TB available in select nodes.
##### GPU Enabled Worker Nodes
For GPU enabled worker nodes in the {{ hcc.swan.partition.gpu }} partition, the following GPUs are available:
{%
include-markdown "../submitting_gpu_jobs.md"
start="requirements if necessary."
end="### Specifying GPU memory (optional)"
%}
Additional GPUs are available in the `guest_gpu` partition, but jobs running on this partition are preemptable. Details on how the partition operates are available below in [Guest Partition(s)](#guest-partitions). The GPUs in this partition are listed in the Swan [partition list](swan_available_partitions/) under the [priority access partitions](#ownedpriority-access-partitions).
!!! warning "Resource requests and utilization"
Please make sure your applications and software support the resources you are requesting.
Many applications are only able to use a single worker node and may not scale well with large numbers of cores.
Please review our information on how many resources to request in our [FAQ](/FAQ/#how-many-nodesmemorytime-should-i-request)
For GPU monitoring and resource requests, please review our page on [monitoring and optimizing GPU resources](submitting_jobs/monitoring_GPU_usage/)
[A full list of partitions is available for Swan](swan_available_partitions/)
### SLURM Quality of Service
Swan has two Quality of Service (QoS) types available, which help manage how jobs get scheduled.
Overall limits on maximum job wall time, CPUs, etc. are set for
all jobs with the default QoS (when the "--qos=" option is omitted)
and for "short" jobs (described below) on Swan.
The limits are shown in the following table.
| | SLURM Specification | Max Job Run Time | Max CPUs per User | Max Jobs per User |
| ------- | -------------------- | ---------------- | ----------------- | ----------------- |
| Default | Leave blank | 7 days | 2000 | 1000 |
| Short | #SBATCH --qos=short | 6 hours | 16 | 2 |
Please also note that the memory and
local hard drive limits are subject to the physical limitations of the
nodes, described in the resources capabilities section of the
[HCC Documentation](/#resource-capabilities)
and the partition sections above.
#### Priority for short jobs
To run short jobs for testing and development work, a job can specify a
different quality of service (QoS). The *short* QoS increases a job's
priority so it will run as soon as possible.
| SLURM Specification |
|----------------------- |
| `#SBATCH --qos=short` |
!!! warning "Limits per user for 'short' QoS"
- 6 hour job run time
- 2 jobs of 16 CPUs or fewer
- No more than 256 CPUs in use for *short* jobs from all users
### Owned/Priority Access Partitions
Partitions marked as owned by a group means only specific groups are
allowed to submit jobs to that partition. Groups are manually added to
the list allowed to submit jobs to the partition. If you are unable to
submit jobs to a partition, and you feel that you should be, please
contact [hcc-support@unl.edu](mailto:hcc-support@unl.edu).
To submit jobs to an owned partition, use the SLURM `--partition` option. Jobs
can either be submitted *only* to an owned partition, or to *both* the owned
partition and the general access queue. For example, assuming a partition
named `mypartition`:
!!! note "Submit only to an owned partition"
```bash
#SBATCH --partition=mypartition
```
Submitting solely to an owned partition means jobs will start immediately until
the resources on the partition are full, then queue until prior jobs finish and
resources become available.
!!! note "Submit to both an owned partition and general queue"
```bash
#SBATCH --partition=mypartition,batch
```
Submitting to both an owned partition and `batch` means jobs will run on both the owned
partition and the general batch queue. Jobs will start immediately until the resources
on the partition are full, then queue. Pending jobs will then start either on the owned partition
or in the general queue, wherever resources become available first
(taking into account FairShare). Unless there are specific reasons to limit jobs
to owned resources, this method is recommended to maximize job throughput.
[A full list of partitions is available for Swan](swan_available_partitions/)
### Guest Partition(s)
The `guest` partition can be used by users and groups that do not own
dedicated resources on Swan. Jobs running in the `guest` partition
will run on the owned resources with Intel OPA interconnect. The jobs
are preempted when the resources are needed by the resource owners:
guest jobs will be killed and returned to the queue in a pending state
until they can be started on another node.
HCC recommends verifying job behavior will support the restart and
modifying job scripts if necessary.
To submit your job to the guest partition add the line
!!! note "Submit to guest partition"
```bash
#SBATCH --partition=guest
```
to your submit script.
Owned GPU resources may also be accessed in an opportunistic manner by
submitting to the `guest_gpu` partition. Similar to `guest`, jobs are
preempted when the GPU resources are needed by the owners. To submit
your job to the `guest_gpu` partition, add the lines
!!! note "Submit to guest_gpu partition"
```bash
#SBATCH --partition=guest_gpu
#SBATCH --gres=gpu
```
to your SLURM script.
#### Preventing job restart
By default, jobs on the `guest` partition will be restarted elsewhere when they
are preempted. To prevent preempted jobs from being restarted add the line
!!! note "Prevent job restart on guest partition"
```bash
#SBATCH --no-requeue
```
to your SLURM submit file.
---
title: Available Partitions for Swan
summary: "List of available partitions for swan.unl.edu."
---
### Swan:
{{ json_table("docs/static/json/swan_partitions.json") }}
+++ ---
title = "Submitting a Job Array" title: Submitting a Job Array
description = "How to use job arrays with the SLURM scheduler." summary: "How to use job arrays with the SLURM scheduler."
weight=30 weight: 30
+++ ---
A job array is a set of jobs that share the same submit file, but will A job array is a set of jobs that share the same submit file, but will
run multiple copies with a environment variable incremented. These are run multiple copies with a environment variable incremented. These are
...@@ -12,11 +12,11 @@ the same application multiple times. ...@@ -12,11 +12,11 @@ the same application multiple times.
### Creating a Array Submit File ### Creating a Array Submit File
An array submit file is very similar to the example submit files An array submit file is very similar to the example submit files
in [Submitting Jobs]({{< relref "/guides/submitting_jobs/_index.md" >}}). in [Submitting Jobs](/submitting_jobs/).
{{% panel theme="info" header="example.slurm" %}} !!! note "example.slurm"
{{< highlight batch >}} ```bat
#!/bin/sh #!/bin/bash
#SBATCH --array=0-31 #SBATCH --array=0-31
#SBATCH --time=03:15:00 # Run time in hh:mm:ss #SBATCH --time=03:15:00 # Run time in hh:mm:ss
#SBATCH --mem-per-cpu=1024 # Minimum memory required per CPU (in megabytes) #SBATCH --mem-per-cpu=1024 # Minimum memory required per CPU (in megabytes)
...@@ -28,8 +28,8 @@ module load example/test ...@@ -28,8 +28,8 @@ module load example/test
echo "I am task $SLURM_ARRAY_TASK_ID on node `hostname`" echo "I am task $SLURM_ARRAY_TASK_ID on node `hostname`"
sleep 60 sleep 60
{{< /highlight >}} ```
{{% /panel %}}
The submit file above will output the `$SLURM_ARRAY_TASK_ID`, which will The submit file above will output the `$SLURM_ARRAY_TASK_ID`, which will
be different for every one of the 32 (0-31) jobs, to the output files. be different for every one of the 32 (0-31) jobs, to the output files.
......
---
title: Submitting an MPI Job
summary: "How to submit an MPI job on HCC resources."
weight: 40
---
This script requests 16 cores on nodes with InfiniBand:
!!! note "mpi.submit"
```bat
#!/bin/bash
#SBATCH --ntasks=16
#SBATCH --mem-per-cpu=1024
#SBATCH --time=03:15:00
#SBATCH --error=/work/[groupname]/[username]/job.%J.err
#SBATCH --output=/work/[groupname]/[username]/job.%J.out
module load compiler/gcc/8.2 openmpi/2.1
mpirun /home/[groupname]/[username]/mpiprogram
```
The above job will allocate 16 cores on the default partition. The 16
cores could be on any of the nodes in the partition, even split between
multiple nodes.
### Advanced Submission
Some users may prefer to specify more details. This will allocate 32
tasks, 16 on each of two nodes:
!!! note "mpi.submit"
```bat
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --mem-per-cpu=1024
#SBATCH --time=03:15:00
#SBATCH --error=/work/[groupname]/[username]/job.%J.err
#SBATCH --output=/work/[groupname]/[username]/job.%J.out
module load compiler/gcc/8.2 openmpi/2.1
mpirun /home/[groupname]/[username]/mpiprogram
```
+++ ---
title = "Submitting an OpenMP Job" title: Submitting an OpenMP Job
description = "How to submit an OpenMP job on HCC resources." summary: "How to submit an OpenMP job on HCC resources."
+++ weight: 45
---
Submitting an OpenMP job is different from Submitting an OpenMP job is different from
[Submitting an MPI Job]({{< relref "submitting_an_mpi_job" >}}) [Submitting an MPI Job](../submitting_an_mpi_job/)
since you must request multiple cores from a single node. since you must request multiple cores from a single node.
{{% panel theme="info" header="OpenMP example submission" %}} !!! note "OpenMP example submission"
{{< highlight batch >}} ```bat
#!/bin/sh #!/bin/bash
#SBATCH --ntasks-per-node=16 # 16 cores #SBATCH --ntasks-per-node=16 # 16 cores
#SBATCH --nodes=1 # 1 node #SBATCH --nodes=1 # 1 node
#SBATCH --mem-per-cpu=1024 # Minimum memory required per CPU (in megabytes) #SBATCH --mem-per-cpu=1024 # Minimum memory required per CPU (in megabytes)
...@@ -19,8 +20,8 @@ since you must request multiple cores from a single node. ...@@ -19,8 +20,8 @@ since you must request multiple cores from a single node.
export OMP_NUM_THREADS=${SLURM_NTASKS_PER_NODE} export OMP_NUM_THREADS=${SLURM_NTASKS_PER_NODE}
./openmp-app.exe ./openmp-app.exe
{{< /highlight >}} ```
{{% /panel %}}
Notice that we used `ntasks-per-node` to specify the number of cores we Notice that we used `ntasks-per-node` to specify the number of cores we
want on a single node. Additionally, we specify that we only want want on a single node. Additionally, we specify that we only want
...@@ -33,8 +34,7 @@ automatically match the `ntasks-per-node` value (in this example 16). ...@@ -33,8 +34,7 @@ automatically match the `ntasks-per-node` value (in this example 16).
### Compiling ### Compiling
Directions to compile OpenMP can be found on Directions to compile OpenMP can be found on
[Compiling an OpenMP Application] [Compiling an OpenMP Application](/applications/user_software/compiling_an_openmp_application/).
({{< relref "/applications/user_software/compiling_an_openmp_application" >}}).
### Further Documentation ### Further Documentation
......
+++ ---
title = "Submitting GPU Jobs" title: Submitting GPU Jobs
description = "How to submit GPU (CUDA/OpenACC) jobs on HCC resources." summary: "How to submit GPU (CUDA/OpenACC) jobs on HCC resources."
weight=35 weight: 35
+++ ---
### Available GPUs ### Available GPUs
Crane has four types of GPUs available in the `gpu` partition. The Swan has three types of GPUs available in the `gpu` partition. The
type of GPU is configured as a SLURM feature, so you can specify a type type of GPU is configured as a SLURM feature, so you can specify a type
of GPU in your job resource requirements if necessary. of GPU in your job resource requirements if necessary.
| Description | SLURM Feature | Available Hardware | | Description | SLURM Feature | Available Hardware |
| -------------------- | ------------- | ---------------------------- | | -------------------- | ------------- | ---------------------------- |
| Tesla K20, non-IB | gpu_k20 | 3 nodes - 2 GPUs with 4 GB mem per node |
| Teska K20, with IB | gpu_k20 | 3 nodes - 3 GPUs with 4 GB mem per node |
| Tesla K40, with IB | gpu_k40 | 5 nodes - 4 K40M GPUs with 11 GB mem per node<br> 1 node - 2 K40C GPUs |
| Tesla P100, with OPA | gpu_p100 | 2 nodes - 2 GPUs with 12 GB per node |
| Tesla V100, with 10GbE | gpu_v100 | 1 node - 4 GPUs with 16 GB per node | | Tesla V100, with 10GbE | gpu_v100 | 1 node - 4 GPUs with 16 GB per node |
| Tesla V100, with OPA | gpu_v100 | 21 nodes - 2 GPUs with 32GB per node | | Tesla V100, with OPA | gpu_v100 | 21 nodes - 2 GPUs with 32GB per node |
| Tesla V100S | gpu_v100 | 4 nodes - 2 GPUs with 32GB per node |
| Tesla T4 | gpu_t4 | 12 nodes - 2 GPUs with 16GB per node |
| NVIDIA A30 | gpu_a30 | 2 nodes - 4 GPUs with 24GB per node |
### Specifying GPU memory (optional) ### Specifying GPU memory (optional)
...@@ -28,6 +28,7 @@ The available memory specifcations are: ...@@ -28,6 +28,7 @@ The available memory specifcations are:
| -------------- | ------------- | | -------------- | ------------- |
| 12 GB RAM | gpu_12gb | | 12 GB RAM | gpu_12gb |
| 16 GB RAM | gpu_16gb | | 16 GB RAM | gpu_16gb |
| 24 GB RAM | gpu_24gb |
| 32 GB RAM | gpu_32gb | | 32 GB RAM | gpu_32gb |
...@@ -36,62 +37,62 @@ The available memory specifcations are: ...@@ -36,62 +37,62 @@ The available memory specifcations are:
To run your job on the next available GPU regardless of type, add the To run your job on the next available GPU regardless of type, add the
following options to your srun or sbatch command: following options to your srun or sbatch command:
{{< highlight batch >}} ```bat
--partition=gpu --gres=gpu --partition=gpu --gres=gpu
{{< /highlight >}} ```
To run on a specific type of GPU, you can constrain your job to require To run on a specific type of GPU, you can constrain your job to require
a feature. To run on K40 GPUs for example: a feature. To run on T4 GPUs for example:
{{< highlight batch >}} ```bat
--partition=gpu --gres=gpu --constraint=gpu_k40 --partition=gpu --gres=gpu --constraint=gpu_t4
{{< /highlight >}} ```
{{% notice info %}} !!! note
You may request multiple GPUs by changing the `--gres` value to You may request multiple GPUs by changing the `--gres` value to
`--gres=gpu:2`. Note that this value is **per node**. For example, `--gres=gpu:2`. Note that this value is **per node**. For example,
`--nodes=2 --gres=gpu:2` will request 2 nodes with 2 GPUs each, for a `--nodes=2 --gres=gpu:2` will request 2 nodes with 2 GPUs each, for a
total of 4 GPUs. total of 4 GPUs.
{{% /notice %}}
The GPU memory feature may be used to specify a GPU RAM amount either The GPU memory feature may be used to specify a GPU RAM amount either
independent of architecture, or in combination with it. independent of architecture, or in combination with it.
For example, using For example, using
{{< highlight batch >}} ```bat
--partition=gpu --gres=gpu --constraint=gpu_16gb --partition=gpu --gres=gpu --constraint=gpu_16gb
{{< /highlight >}} ```
will request a GPU with 16GB of RAM, independent of the type of card will request a GPU with 16GB of RAM, independent of the type of card
(K20, K40, P100, etc.). You may also request both a GPU type _and_ (V100, T4, etc.). You may also request both a GPU type _and_
memory amount using the `&` operator (single quotes are used because memory amount using the `&` operator (single quotes are used because
`&` is a special character). `&` is a special character).
For example, For example,
{{< highlight batch >}} ```bat
--partition=gpu --gres=gpu --constraint='gpu_32gb&gpu_v100' --partition=gpu --gres=gpu --constraint='gpu_32gb&gpu_v100'
{{< /highlight >}} ```
will request a V100 GPU with 32GB RAM. will request a V100 GPU with 32GB RAM.
{{% notice warning %}} !!! warning
You must verify the GPU type and memory combination is valid based on the You must verify the GPU type and memory combination is valid based on the
[available GPU types.]({{< relref "submitting_gpu_jobs/#available-gpus" >}}). [available GPU types.](../submitting_gpu_jobs/#available-gpus).
Requesting a nonexistent combination will cause your job to be rejected with Requesting a nonexistent combination will cause your job to be rejected with
a `Requested node configuration is not available` error. a `Requested node configuration is not available` error.
{{% /notice %}}
### Compiling ### Compiling
Compilation of CUDA or OpenACC jobs must be performed on the GPU nodes. Compilation of CUDA or OpenACC jobs must be performed on the GPU nodes.
Therefore, you must run an [interactive job]({{< relref "creating_an_interactive_job" >}}) Therefore, you must run an [interactive job](../creating_an_interactive_job/)
to compile. An example command to compile in the `gpu` partition could be: to compile. An example command to compile in the `gpu` partition could be:
{{< highlight batch >}} ```bat
$ srun --partition=gpu --gres=gpu --mem=4gb --ntasks-per-node=2 --nodes=1 --pty $SHELL $ srun --partition=gpu --gres=gpu --mem=4gb --ntasks-per-node=2 --nodes=1 --pty $SHELL
{{< /highlight >}} ```
The above command will start a shell on a GPU node with 2 cores and 4GB The above command will start a shell on a GPU node with 2 cores and 4GB
of RAM in order to compile a GPU job. The above command could also be of RAM in order to compile a GPU job. The above command could also be
...@@ -101,9 +102,9 @@ useful if you want to run a test GPU job interactively. ...@@ -101,9 +102,9 @@ useful if you want to run a test GPU job interactively.
CUDA and OpenACC submissions require running on GPU nodes. CUDA and OpenACC submissions require running on GPU nodes.
{{% panel theme="info" header="cuda.submit" %}} !!! note "cuda.submit"
{{< highlight batch >}} ```bat
#!/bin/sh #!/bin/bash
#SBATCH --time=03:15:00 #SBATCH --time=03:15:00
#SBATCH --mem-per-cpu=1024 #SBATCH --mem-per-cpu=1024
#SBATCH --job-name=cuda #SBATCH --job-name=cuda
...@@ -114,16 +115,16 @@ CUDA and OpenACC submissions require running on GPU nodes. ...@@ -114,16 +115,16 @@ CUDA and OpenACC submissions require running on GPU nodes.
module load cuda module load cuda
./cuda-app.exe ./cuda-app.exe
{{< /highlight >}} ```
{{% /panel %}}
OpenACC submissions require loading the PGI compiler (which is currently OpenACC submissions require loading the PGI compiler (which is currently
required to compile as well). required to compile as well).
{{% panel theme="info" header="openacc.submit" %}} !!! note "openacc.submit"
{{< highlight batch >}} ```bat
#!/bin/sh #!/bin/bash
#SBATCH --time=03:15:00 #SBATCH --time=03:15:00
#SBATCH --mem-per-cpu=1024 #SBATCH --mem-per-cpu=1024
#SBATCH --job-name=cuda-acc #SBATCH --job-name=cuda-acc
...@@ -135,8 +136,8 @@ required to compile as well). ...@@ -135,8 +136,8 @@ required to compile as well).
module load cuda/8.0 compiler/pgi/16 module load cuda/8.0 compiler/pgi/16
./acc-app.exe ./acc-app.exe
{{< /highlight >}} ```
{{% /panel %}}
### Submitting Pre-emptable Jobs ### Submitting Pre-emptable Jobs
...@@ -147,9 +148,9 @@ limitation that they may be pre-empted (i.e. killed) at any time**. ...@@ -147,9 +148,9 @@ limitation that they may be pre-empted (i.e. killed) at any time**.
To submit jobs to these resources, add the following to your srun or sbatch command: To submit jobs to these resources, add the following to your srun or sbatch command:
{{< highlight batch >}} ```bat
--partition=guest_gpu --gres=gpu --partition=guest_gpu --gres=gpu
{{< /highlight >}} ```
**In order to properly utilize pre-emptable resources, your job must be able to support **In order to properly utilize pre-emptable resources, your job must be able to support
some type of checkpoint/resume functionality.** some type of checkpoint/resume functionality.**
<footer class=" footline" >
{{ $footer := print "_footer." .Lang }}
{{ range where .Site.Pages "File.BaseFileName" $footer }}
{{ .Content }}
{{else}}
{{ if .Site.GetPage "page" "_footer.md" }}
{{(.Site.GetPage "page" "_footer.md").Content}}
{{else}}
{{ T "create-footer-md" }}
{{end}}
{{end}}
</footer>
<link rel="stylesheet" href="/css/custom.css">
<link href="//cloud.typography.com/7717652/616662/css/fonts.css" type="text/css" rel="stylesheet">
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-36141757-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-36141757-1');
</script>
{{ if isset .Params "scripts" }}{{ range .Params.scripts }}<script src="{{ printf "%s" . }}"></script>{{ end }}{{ end }}
{{ if isset .Params "css" }}{{ range .Params.css }}<link rel="stylesheet" href="{{ printf "%s" . }}">{{ end }}{{ end }}
{{$file := .Get "file"}}
{{- if eq (.Get "markdown") "true" -}}
{{- $file | readFile | markdownify -}}
{{- else if (.Get "highlight") -}}
{{- highlight ($file | readFile) (.Get "highlight") "" -}}
{{- else -}}
{{ $file | readFile | safeHTML }}
{{- end -}}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/jquery.tablesorter.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-pager.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/js/widgets/widget-filter.min.js"></script>
<script src="/js/sort-table.js"></script>
<link rel="stylesheet" href="http://mottie.github.io/tablesorter/css/theme.default.css">
<link rel="stylesheet" href="https://mottie.github.io/tablesorter/css/theme.dropbox.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/jquery.tablesorter.pager.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.31.1/css/filter.formatter.min.css">
<div class="pager">
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/first.png" class="first"/
>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/prev.png" class="prev"/>
<!-- the "pagedisplay" can be any element, including an input -->
<span class="pagedisplay" data-pager-output-filtered="{startRow:input} &ndash; {endRow} / {
filteredRows} of {totalRows} total rows"></span>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/next.png" class="next"/>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/last.png" class="last"/>
<select class="pagesize">
<option value="5">5</option>
<option value="10">10</option>
<option value="20">20</option>
<option value="30">30</option>
<option value="40">40</option>
<option value="all">All Rows</option>
</select>
<select class="gotoPage" title="Select page number">
<option value="1">1</option>
<option value="2">2</option>
<option value="3">3</option>
<option value="4">4</option>
<option value="5">5</option>
</select>
</div>
{{ .Inner }}
<div class="pager">
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/first.png" class="first"/
>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/prev.png" class="prev"/>
<!-- the "pagedisplay" can be any element, including an input -->
<span class="pagedisplay" data-pager-output-filtered="{startRow:input} &ndash; {endRow} / {
filteredRows} of {totalRows} total rows"></span>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/next.png" class="next"/>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/last.png" class="last"/>
<select class="pagesize">
<option value="5">5</option>
<option value="10">10</option>
<option value="20">20</option>
<option value="30">30</option>
<option value="40">40</option>
<option value="all">All Rows</option>
</select>
<select class="gotoPage" title="Select page number">
<option value="1">1</option>
<option value="2">2</option>
<option value="3">3</option>
<option value="4">4</option>
<option value="5">5</option>
</select>
</div>
<script>
var table = $("table");
table.addClass("sorttable");
</script>
{{ $url := .Get "url" }}
{{ $json := getJSON $url }}
{{ if $json.table_generated }}
<p><em>last generated {{ $json.table_generated }}</em></p>
{{ end }}
<div class="pager">
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/first.png" class="first"/>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/prev.png" class="prev"/>
<!-- the "pagedisplay" can be any element, including an input -->
<span class="pagedisplay" data-pager-output-filtered="{startRow:input} &ndash; {endRow} / {filteredRows} of {totalRows} total rows"></span>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/next.png" class="next"/>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/last.png" class="last"/>
<select class="pagesize">
<option value="5">5</option>
<option value="10">10</option>
<option value="20">20</option>
<option value="30">30</option>
<option value="40">40</option>
<option value="all">All Rows</option>
</select>
<select class="gotoPage" title="Select page number">
<option value="1">1</option>
<option value="2">2</option>
<option value="3">3</option>
<option value="4">4</option>
<option value="5">5</option>
</select>
</div>
<table class="sorttable">
<thead>
<tr>
{{ range $table_header := $json.table_header }}
<th>{{ $table_header }}</th>
{{ end }}
</tr>
</thead>
<tbody>
{{ range $table_row := $json.table_data }}
<tr>
{{ range $table_data := $table_row }}
<td>{{ $table_data }}</td>
{{ end }}
</tr>
{{ end }}
</tbody>
</table>
<div class="pager">
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/first.png" class="first"/>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/prev.png" class="prev"/>
<!-- the "pagedisplay" can be any element, including an input -->
<span class="pagedisplay" data-pager-output-filtered="{startRow:input} &ndash; {endRow} / {filteredRows} of {totalRows} total rows"></span>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/next.png" class="next"/>
<img src="http://mottie.github.com/tablesorter/addons/pager/icons/last.png" class="last"/>
<select class="pagesize">
<option value="10">10</option>
<option value="20">20</option>
<option value="30">30</option>
<option value="40">40</option>
<option value="all">All Rows</option>
</select>
<select class="gotoPage" title="Select page number">
<option value="1">1</option>
<option value="2">2</option>
<option value="3">3</option>
<option value="4">4</option>
<option value="5">5</option>
</select>
</div>
import json
import requests
import markdown
import os
import glob
def define_env(env):
@env.macro
def children(path):
# Use: {{ children('path') }}
# Replace path with the directory path after docs/
# For example docs/handling_data would be a value of 'handling_data'
output = """
"""
dir_path = os.getcwd()+'/docs/'+path
for file in sorted(glob.glob(f'{dir_path}/*')):
# Handle sub-directories
if os.path.isdir(file):
file_path = file+'/index.md'
else:
file_path = file
if os.path.exists(file_path) and not file.endswith(f'{dir_path}/index.md'):
with open(file_path, 'r') as f:
title = ""
summary = ""
for line in f.readlines():
if line.strip().startswith('title:'):
title = line.split(':')[1].replace('"','')
if line.strip().startswith('summary:'):
summary = line.split(':')[1].replace('"','')
if summary:
summary = f'- Description: {summary}'
page_title = title.strip(' ').strip('\n')
# Handle sub-directories
if os.path.isdir(file):
url = file_path.split(path+'/')[-1]
else:
url = file_path.split(path+'/')[-1]
if title and not os.path.isdir(file): # If its the index page of a child dir
output += f"""### [{page_title}]({url.replace('.md','/')})
{summary}
"""
else: # If its the index page of a child dir
output += f"""### [{page_title}]({url})
{summary}
"""
### Full Content Return
return output.replace('`','')
@env.macro
def youtube(youtube_url):
# Based on https://github.com/UBCSailbot/sailbot_workspace/pull/374/files#diff-dd05ba889655ed64b86d9ffe222960b781cda2b9ec094f40d4744050eb6c0b2b
youtube_link = youtube_url if 'https' in youtube_url else f'https://www.youtube.com/embed/{youtube_url}'
return f'''<div class="video-wrapper">
<iframe width="560" height="315" src="{youtube_link}" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div>'''
@env.macro
def json_table(table_url):
if 'http' in table_url:
table_data = requests.get(table_url).json()
else:
with open(table_url, 'r') as f:
table_data = json.load(f)
#table_data = requests.get(table_url).json()
table_generated = table_data['table_generated']
table_html = '''<p><em>last generated table_generated </em></p>
<div class="pager">
<img src="/images/tablesorter/first.png" class="first"/>
<img src="/images/tablesorter/prev.png" class="prev"/>
<!-- the "pagedisplay" can be any element, including an input -->
<span class="pagedisplay" data-pager-output-filtered="{startRow:input} &ndash; {endRow} / {filteredRows} of {totalRows} total rows"></span>
<img src="/images/tablesorter/next.png" class="next"/>
<img src="/images/tablesorter/last.png" class="last"/>
<select class="pagesize">
<option value="5">5</option>
<option value="10">10</option>
<option value="20">20</option>
<option value="30">30</option>
<option value="40">40</option>
<option value="all">All Rows</option>
</select>
<select class="gotoPage" title="Select page number">
<option value="1">1</option>
<option value="2">2</option>
<option value="3">3</option>
<option value="4">4</option>
<option value="5">5</option>
</select>
</div>
<table class="sorttable">
<thead>
<tr>
'''.replace('table_generated',table_generated)
# Add Headers
for header in table_data['table_header']:
table_html += f'<th>{header}</th>'
table_html += '</tr></thead><tbody>'
# Generate Rows
for row in table_data['table_data']:
table_html += '<tr>'
for entry in row:
table_html += f'<td>{entry}</td>'
table_html += '</tr>'
table_html += '</tbody></table>'
# Add Ending HTML
table_html += '''<div class="pager">
<img src="/images/tablesorter/first.png" class="first"/>
<img src="/images/tablesorter/prev.png" class="prev"/>
<!-- the "pagedisplay" can be any element, including an input -->
<span class="pagedisplay" data-pager-output-filtered="{startRow:input} &ndash; {endRow} / {filteredRows} of {totalRows} total rows"></span>
<img src="/images/tablesorter/next.png" class="next"/>
<img src="/images/tablesorter/last.png" class="last"/>
<select class="pagesize">
<option value="10">10</option>
<option value="20">20</option>
<option value="30">30</option>
<option value="40">40</option>
<option value="all">All Rows</option>
</select>
<select class="gotoPage" title="Select page number">
<option value="1">1</option>
<option value="2">2</option>
<option value="3">3</option>
<option value="4">4</option>
<option value="5">5</option>
</select>
</div>
'''
return table_html
@env.macro
def md_table(table_url):
if 'http' in table_url:
table_data = requests.get(table_url).content.decode("utf-8")
else:
with open(table_url, 'r') as f:
table_data = f.read()
table_html = '''<div class="pager">
<img src="/images/tablesorter/first.png" class="first"/>
<img src="/images/tablesorter/prev.png" class="prev"/>
<!-- the "pagedisplay" can be any element, including an input -->
<span class="pagedisplay" data-pager-output-filtered="{startRow:input} &ndash; {endRow} / {filteredRows} of {totalRows} total rows"></span>
<img src="/images/tablesorter/next.png" class="next"/>
<img src="/images/tablesorter/last.png" class="last"/>
<select class="pagesize">
<option value="5">5</option>
<option value="10">10</option>
<option value="20">20</option>
<option value="30">30</option>
<option value="40">40</option>
<option value="all">All Rows</option>
</select>
<select class="gotoPage" title="Select page number">
<option value="1">1</option>
<option value="2">2</option>
<option value="3">3</option>
<option value="4">4</option>
<option value="5">5</option>
</select>
</div>
'''
table_html += markdown.markdown(table_data, extensions=['markdown.extensions.tables']).replace('<table>','<table class="sorttable">')
# Add Ending HTML
table_html += '''<div class="pager">
<img src="/images/tablesorter/first.png" class="first"/>
<img src="/images/tablesorter/prev.png" class="prev"/>
<!-- the "pagedisplay" can be any element, including an input -->
<span class="pagedisplay" data-pager-output-filtered="{startRow:input} &ndash; {endRow} / {filteredRows} of {totalRows} total rows"></span>
<img src="/images/tablesorter/next.png" class="next"/>
<img src="/images/tablesorter/last.png" class="last"/>
<select class="pagesize">
<option value="10">10</option>
<option value="20">20</option>
<option value="30">30</option>
<option value="40">40</option>
<option value="all">All Rows</option>
</select>
<select class="gotoPage" title="Select page number">
<option value="1">1</option>
<option value="2">2</option>
<option value="3">3</option>
<option value="4">4</option>
<option value="5">5</option>
</select>
</div>
'''
return table_html
\ No newline at end of file