diff --git a/content/quickstarts/using_nus_gitlab_instance/_index.md b/content/guides/handling_data/using_nus_gitlab_instance/_index.md
similarity index 100%
rename from content/quickstarts/using_nus_gitlab_instance/_index.md
rename to content/guides/handling_data/using_nus_gitlab_instance/_index.md
diff --git a/content/quickstarts/using_nus_gitlab_instance/setting_up_gitlab_on_hcc_clusters.md b/content/guides/handling_data/using_nus_gitlab_instance/setting_up_gitlab_on_hcc_clusters.md
similarity index 100%
rename from content/quickstarts/using_nus_gitlab_instance/setting_up_gitlab_on_hcc_clusters.md
rename to content/guides/handling_data/using_nus_gitlab_instance/setting_up_gitlab_on_hcc_clusters.md
diff --git a/content/quickstarts/fortran_c_on_hcc.md b/content/guides/running_applications/fortran_c_on_hcc.md
similarity index 96%
rename from content/quickstarts/fortran_c_on_hcc.md
rename to content/guides/running_applications/fortran_c_on_hcc.md
index d281768202c72ee9fad5fb30a1fa2180bb6fd44e..8a5ed998da1611224a75dece47144e7c3c7b2f38 100644
--- a/content/quickstarts/fortran_c_on_hcc.md
+++ b/content/guides/running_applications/fortran_c_on_hcc.md
@@ -10,8 +10,8 @@ downloaded from [serial_dir.zip](/attachments/serial_dir.zip).
 
 #### Login to a HCC Cluster (Tusker or Crane)
 
-Log in to a HCC cluster through PuTTY ([For Windows Users]({{< relref "/quickstarts/for_windows_users">}})) or Terminal ([For Mac/Linux
-Users]({{< relref "/quickstarts/for_maclinux_users">}})) and make a subdirectory called `serial_dir` under the `$WORK` directory.
+Log in to a HCC cluster through PuTTY ([For Windows Users]({{< relref "/quickstarts/connecting/for_windows_users">}})) or Terminal ([For Mac/Linux
+Users]({{< relref "/quickstarts/connecting/for_maclinux_users">}})) and make a subdirectory called `serial_dir` under the `$WORK` directory.
 
 {{< highlight bash >}}
 $ cd $WORK
diff --git a/content/quickstarts/how_to_setup_x11_forwarding.md b/content/guides/running_applications/how_to_setup_x11_forwarding.md
similarity index 100%
rename from content/quickstarts/how_to_setup_x11_forwarding.md
rename to content/guides/running_applications/how_to_setup_x11_forwarding.md
diff --git a/content/quickstarts/mpi_jobs_on_hcc.md b/content/guides/running_applications/mpi_jobs_on_hcc.md
similarity index 97%
rename from content/quickstarts/mpi_jobs_on_hcc.md
rename to content/guides/running_applications/mpi_jobs_on_hcc.md
index 7bc5c9eccf4c3960d2edfbfa1d907c13dd930799..b2aaedbd2549fea480f99ed280e5fb4cbb6a4772 100644
--- a/content/quickstarts/mpi_jobs_on_hcc.md
+++ b/content/guides/running_applications/mpi_jobs_on_hcc.md
@@ -10,8 +10,8 @@ scripts can be downloaded from [mpi_dir.zip](/attachments/mpi_dir.zip).
 
 #### Login to a HCC Cluster
 
-Log in to a HCC cluster through PuTTY ([For Windows Users]({{< relref "/quickstarts/for_windows_users">}})) or Terminal ([For Mac/Linux
-Users]({{< relref "/quickstarts/for_maclinux_users">}})) and make a subdirectory called `mpi_dir` under the `$WORK` directory.
+Log in to a HCC cluster through PuTTY ([For Windows Users]({{< relref "/quickstarts/connecting/for_windows_users">}})) or Terminal ([For Mac/Linux
+Users]({{< relref "/quickstarts/connecting/for_maclinux_users">}})) and make a subdirectory called `mpi_dir` under the `$WORK` directory.
 
 {{< highlight bash >}}
 $ cd $WORK
diff --git a/content/guides/running_applications/using_anaconda_package_manager.md b/content/guides/running_applications/using_anaconda_package_manager.md
index d5f94f51154b130029b4aefc5daf7b9a2933b783..87d45ed5aa5e6b058117e0914a8668ba4c02150b 100644
--- a/content/guides/running_applications/using_anaconda_package_manager.md
+++ b/content/guides/running_applications/using_anaconda_package_manager.md
@@ -226,7 +226,7 @@ Jupyter Notebook. To do so, follow the steps below, replacing
 ln -s $WORK/.jupyter/kernels ~/.local/share/jupyter/kernels
 {{< /highlight >}}
 
-{{% notice note %}}
+ {{% notice note %}}
 **Note**: Step 5 only needs to be done once. Any future created environments
 will automatically be accessible from SLURM notebooks once this is done.
 
diff --git a/content/quickstarts/condor_jobs_on_hcc.md b/content/guides/submitting_jobs/condor_jobs_on_hcc.md
similarity index 91%
rename from content/quickstarts/condor_jobs_on_hcc.md
rename to content/guides/submitting_jobs/condor_jobs_on_hcc.md
index 81150c1d0ef7ac146b2c44d180ac874cee4e311d..4a2ab4ea601d19fd3fc84b4861002914126f937d 100644
--- a/content/quickstarts/condor_jobs_on_hcc.md
+++ b/content/guides/submitting_jobs/condor_jobs_on_hcc.md
@@ -10,7 +10,7 @@ can be downloaded from [condor_dir.zip](/attachments/3178558.zip).
 
 #### Login to a HCC Cluster
 
-Log in to a HCC cluster through PuTTY ([For Windows Users]({{< relref "/quickstarts/for_windows_users">}})) or Terminal ([For Mac/Linux Users]({{< relref "/quickstarts/for_maclinux_users">}})) and make a subdirectory called `condor_dir` under the `$WORK` directory. In the subdirectory `condor_dir`, create job subdirectories that host the input data files. Here we create two job subdirectories, `job_0` and `job_1`, and put a data file (`data.dat`) in each subdirectory. The data file in `job_0` has a column of data listing the integers from 1 to 5. The data file in `job_1` has a integer list from 6 to 10.
+Log in to a HCC cluster through PuTTY ([For Windows Users]({{< relref "/quickstarts/connecting/for_windows_users">}})) or Terminal ([For Mac/Linux Users]({{< relref "/quickstarts/connecting/for_maclinux_users">}})) and make a subdirectory called `condor_dir` under the `$WORK` directory. In the subdirectory `condor_dir`, create job subdirectories that host the input data files. Here we create two job subdirectories, `job_0` and `job_1`, and put a data file (`data.dat`) in each subdirectory. The data file in `job_0` has a column of data listing the integers from 1 to 5. The data file in `job_1` has an integer list from 6 to 10.
 
 {{< highlight bash >}}
 $ cd $WORK
diff --git a/content/quickstarts/connecting/_index.md b/content/quickstarts/connecting/_index.md
new file mode 100644
index 0000000000000000000000000000000000000000..b0149d21ca0095c145057e8f14414d104e7132af
--- /dev/null
+++ b/content/quickstarts/connecting/_index.md
@@ -0,0 +1,20 @@
++++
+title = "How to Connect"
+description = "What is a cluster and what is HPC"
+weight = "9"
++++
+High-Performance Computing (HPC) is the use of groups of computers to solve computations that a user or group could not complete in a reasonable time frame on their own desktop or laptop. This is often achieved by splitting one large job among numerous cores, or 'workers', much as a skyscraper is built by many individuals rather than a single person. Many fields take advantage of HPC, including bioinformatics, chemistry, and materials engineering, as well as newer fields such as educational psychology and philosophy.
+{{< figure src="/images/cluster.png" height="450" >}}
+HPC clusters consist of four primary parts: the login node, the management node, the worker nodes, and a central storage array. All of these parts are bound together with a scheduler such as HTCondor or SLURM.
+<br/><br/>
+#### Login Node:
+Users automatically land on the login node when they log in to the clusters. From here you [submit jobs]({{< ref "/guides/submitting_jobs" >}}) using one of the schedulers and pull the results of your jobs. Jobs run directly on the login node will be stopped so that the login node remains available for others to submit jobs.
+<br/><br/>
+#### Management Node:
+The management node does what its name suggests: it manages the cluster and provides a central point from which to administer the rest of the systems.
+<br/><br/>
+#### Worker Nodes:
+The worker nodes run and process the jobs submitted through the schedulers. The schedulers pack as many jobs as possible onto the nodes, based on the resources each job requests, so the cluster is used efficiently. They also enforce fair use by making sure no single user or group takes over the entire cluster, leaving room for others to use it.
+<br/><br/>
+#### Central Storage Array:
+The central storage array gives all of the nodes within the cluster access to the same files without needing to transfer them around. HCC has three arrays mounted on the clusters; more details are available [here]({{< ref "/guides/handling_data" >}}).
diff --git a/content/quickstarts/basic_linux_commands.md b/content/quickstarts/connecting/basic_linux_commands.md
similarity index 100%
rename from content/quickstarts/basic_linux_commands.md
rename to content/quickstarts/connecting/basic_linux_commands.md
diff --git a/content/quickstarts/for_maclinux_users.md b/content/quickstarts/connecting/for_maclinux_users.md
similarity index 100%
rename from content/quickstarts/for_maclinux_users.md
rename to content/quickstarts/connecting/for_maclinux_users.md
diff --git a/content/quickstarts/for_windows_users.md b/content/quickstarts/connecting/for_windows_users.md
similarity index 100%
rename from content/quickstarts/for_windows_users.md
rename to content/quickstarts/connecting/for_windows_users.md
diff --git a/content/quickstarts/how_to_change_your_password.md b/content/quickstarts/connecting/how_to_change_your_password.md
similarity index 100%
rename from content/quickstarts/how_to_change_your_password.md
rename to content/quickstarts/connecting/how_to_change_your_password.md
diff --git a/content/quickstarts/running_applications.md b/content/quickstarts/running_applications.md
new file mode 100644
index 0000000000000000000000000000000000000000..2067abb76a6cc4d378847c304661ee918fcfde7a
--- /dev/null
+++ b/content/quickstarts/running_applications.md
@@ -0,0 +1,33 @@
++++
+title = "Running Applications"
+description = "How to run various applications on HCC resources."
+weight = "20"
++++
+
+
+
+# Using Installed Software
+
+HCC clusters use the Lmod module system to manage applications. You can search, view, and load installed software with the `module` command.
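+
+As a quick illustration, a typical Lmod workflow looks like the sketch below.
+The module name `python/3.6` is only a placeholder; run `module avail` to see
+what is actually installed on the cluster you are using.
+
+{{< highlight bash >}}
+$ module avail               # list the modules available on this cluster
+$ module spider python       # search the full module tree for "python"
+$ module load python/3.6     # load a specific module and version into your environment
+$ module list                # show the modules currently loaded
+$ module unload python/3.6   # remove a single module from your environment
+$ module purge               # remove all loaded modules
+{{< /highlight >}}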
+
+## Available Software
+
+## Using Modules
+
+### Searching Available Modules
+
+### Loading Modules
+
+### Unloading Modules
+
+# Installing Software
+
+## Compiling from Source Code
+
+## Using Anaconda
+
+## Request Installation
+
+
+
+{{% children %}}
diff --git a/content/quickstarts/setting_up_and_using_duo.md b/content/quickstarts/setting_up_and_using_duo.md
index 72d79d1dcf79a52e761531bc522378415dee223d..04f0847dc6979387f136f44fe301904ad152c419 100644
--- a/content/quickstarts/setting_up_and_using_duo.md
+++ b/content/quickstarts/setting_up_and_using_duo.md
@@ -1,7 +1,7 @@
 +++
 title = "Setting Up and Using Duo"
 description = "Duo Setup Instructions"
-weight = "10"
+weight = "8"
 +++
 
 ##### Use of Duo two-factor authentication (https://www.duosecurity.com) is required for access to HCC resources.
diff --git a/content/quickstarts/submitting_jobs.md b/content/quickstarts/submitting_jobs.md
new file mode 100644
index 0000000000000000000000000000000000000000..e8ed451b3ab18d8f58b653dd6f0612a6caa5089b
--- /dev/null
+++ b/content/quickstarts/submitting_jobs.md
@@ -0,0 +1,198 @@
++++
+title = "Submitting Jobs"
+description = "How to submit jobs to HCC resources"
+weight = "10"
++++
+
+Crane and Tusker are managed by
+the [SLURM](https://slurm.schedmd.com) resource manager.
+To run a job on Crane or Tusker, you
+must create a SLURM submit script that describes your processing. After
+submitting the job, SLURM will schedule your processing on an available
+worker node.
+
+Before writing a submit file, you may need to
+[compile your application]({{< relref "/guides/running_applications/compiling_source_code" >}}).
+
+- [Ensure proper working directory for job output](#ensure-proper-working-directory-for-job-output)
+- [Creating a SLURM Submit File](#creating-a-slurm-submit-file)
+- [Submitting the job](#submitting-the-job)
+- [Checking Job Status](#checking-job-status)
+  - [Checking Job Start](#checking-job-start)
+  - [Removing the Job](#removing-the-job)
+- [Next Steps](#next-steps)
+
+
+### Ensure proper working directory for job output
+
+{{% notice info %}}
+Because the /home directories are not writable from the worker nodes, all SLURM job output should be directed to your /work path.
+{{% /notice %}}
+
+{{% panel theme="info" header="Manual specification of /work path" %}}
+{{< highlight bash >}}
+$ cd /work/[groupname]/[username]
+{{< /highlight >}}
+{{% /panel %}}
+
+The environment variable `$WORK` can also be used.
+{{% panel theme="info" header="Using environment variable for /work path" %}}
+{{< highlight bash >}}
+$ cd $WORK
+$ pwd
+/work/[groupname]/[username]
+{{< /highlight >}}
+{{% /panel %}}
+
+Review how /work differs from /home [here]({{< relref "/guides/handling_data/_index.md" >}}).
+
+### Creating a SLURM Submit File
+
+{{% notice info %}}
+The example below is for a serial job. For submitting MPI jobs, please
+see the [MPI Submission Guide]({{< relref "submitting_an_mpi_job" >}}).
+{{% /notice %}}
+
+A SLURM submit file is broken into two sections: the job description and
+the processing commands. SLURM job description lines begin with `#SBATCH` in
+the submit file.
+
+**SLURM Submit File**
+
+{{< highlight batch >}}
+#!/bin/sh
+#SBATCH --time=03:15:00          # Run time in hh:mm:ss
+#SBATCH --mem-per-cpu=1024       # Maximum memory required per CPU (in megabytes)
+#SBATCH --job-name=hello-world
+#SBATCH --error=/work/[groupname]/[username]/job.%J.err
+#SBATCH --output=/work/[groupname]/[username]/job.%J.out
+
+module load example/test
+
+hostname
+sleep 60
+{{< /highlight >}}
+
+- **time**
+  Maximum walltime the job can run. After this time has expired, the
+  job will be stopped.
+- **mem-per-cpu**
+  Memory that is allocated per core for the job. If you exceed this
+  memory limit, your job will be stopped.
+- **mem**
+  Specify the real memory required per node in MegaBytes. If you
+  exceed this limit, your job will be stopped. Note that you should
+  ask for less memory than each node actually has. For
+  instance, Tusker has 1TB, 512GB and 256GB of RAM per node. You may
+  only request 1000GB of RAM for the 1TB node, 500GB of RAM for the
+  512GB nodes, and 250GB of RAM for the 256GB nodes. For Crane, the
+  max is 500GB.
+- **job-name**
+  The name of the job. Will be reported in the job listing.
+- **partition**
+  The partition the job should run in. Partitions determine the job's
+  priority and which nodes the job can run on. See the
+  [Partitions]({{< relref "/guides/submitting_jobs/partitions/_index.md" >}}) page for a list of possible partitions.
+- **error**
+  Location where the stderr will be written for the job. `[groupname]`
+  and `[username]` should be replaced with your group name and username.
+  Your username can be retrieved with the command `id -un` and your
+  group with `id -ng`.
+- **output**
+  Location where the stdout will be written for the job.
+
+More advanced submit commands can be found on the [SLURM Docs](https://slurm.schedmd.com/sbatch.html).
+You can also find an example of an MPI submission on [Submitting an MPI Job]({{< relref "submitting_an_mpi_job" >}}).
+
+### Submitting the job
+
+Submitting the SLURM job is done with the command `sbatch`. SLURM will read
+the submit file and schedule the job according to the description in
+the submit file.
+
+To submit the job described above:
+
+{{% panel theme="info" header="SLURM Submission" %}}
+{{< highlight batch >}}
+$ sbatch example.slurm
+Submitted batch job 24603
+{{< /highlight >}}
+{{% /panel %}}
+
+The job was successfully submitted.
+
+### Checking Job Status
+
+Job status is found with the command `squeue`. It will provide
+information such as:
+
+- The State of the job:
+  - **R** - Running
+  - **PD** - Pending - Job is awaiting resource allocation.
+  - Additional codes are available
+    on the [squeue](http://slurm.schedmd.com/squeue.html)
+    page.
+- Job Name
+- Run Time
+- Nodes running the job
+
+Checking the status of the job is easiest by filtering by your username,
+using the `-u` option to squeue.
+
+{{< highlight batch >}}
+$ squeue -u <username>
+  JOBID PARTITION     NAME       USER  ST       TIME  NODES NODELIST(REASON)
+  24605     batch hello-wo <username>   R       0:56      1 b01
+{{< /highlight >}}
+
+Additionally, if you want to see the status of a specific partition, for
+example if you are part of a [partition]({{< relref "/guides/submitting_jobs/partitions/_index.md" >}}),
+you can use the `-p` option to `squeue`:
+
+{{< highlight batch >}}
+$ squeue -p esquared
+  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
+  73435  esquared MyRandom tingting  R   10:35:20      1 ri19n10
+  73436  esquared MyRandom tingting  R   10:35:20      1 ri19n12
+  73735  esquared SW2_driv   hroehr  R   10:14:11      1 ri20n07
+  73736  esquared SW2_driv   hroehr  R   10:14:11      1 ri20n07
+{{< /highlight >}}
+
+#### Checking Job Start
+
+You may view the start time of your job with the
+command `squeue --start`. The output of the command will show the
+expected start time of the jobs.
+
+{{< highlight batch >}}
+$ squeue --start --user lypeng
+  JOBID PARTITION     NAME     USER ST           START_TIME  NODES NODELIST(REASON)
+   5822     batch  Starace   lypeng PD  2013-06-08T00:05:09      3 (Priority)
+   5823     batch  Starace   lypeng PD  2013-06-08T00:07:39      3 (Priority)
+   5824     batch  Starace   lypeng PD  2013-06-08T00:09:09      3 (Priority)
+   5825     batch  Starace   lypeng PD  2013-06-08T00:12:09      3 (Priority)
+   5826     batch  Starace   lypeng PD  2013-06-08T00:12:39      3 (Priority)
+   5827     batch  Starace   lypeng PD  2013-06-08T00:12:39      3 (Priority)
+   5828     batch  Starace   lypeng PD  2013-06-08T00:12:39      3 (Priority)
+   5829     batch  Starace   lypeng PD  2013-06-08T00:13:09      3 (Priority)
+   5830     batch  Starace   lypeng PD  2013-06-08T00:13:09      3 (Priority)
+   5831     batch  Starace   lypeng PD  2013-06-08T00:14:09      3 (Priority)
+   5832     batch  Starace   lypeng PD                  N/A      3 (Priority)
+{{< /highlight >}}
+
+The output shows the expected start time of the jobs, as well as the
+reason that the jobs are currently idle (in this case, low priority of
+the user due to running numerous jobs already).
+
+#### Removing the Job
+
+Removing the job is done with the `scancel` command. The only argument
+to the `scancel` command is the job id. For the job above, the command
+is:
+
+{{< highlight batch >}}
+$ scancel 24605
+{{< /highlight >}}
+
+### Next Steps
+
+{{% children %}}
diff --git a/static/images/cluster.png b/static/images/cluster.png
new file mode 100644
index 0000000000000000000000000000000000000000..844810e0d86edc1366f738db05e0087e506407d6
Binary files /dev/null and b/static/images/cluster.png differ