Add Good HCC Practices page

Closed Natasha Pavlovikj requested to merge practices into master

+++
title = "Good HCC Practices"
description = "Guidelines for good HCC practices"
weight = "95"
+++

Crane and Rhino, our two high-performance clusters, are shared among all our users.
Sometimes, one user's activities can negatively impact the clusters and other users.
To avoid this, we provide the following guidelines for good HCC practices.

## Login Node
* **Do not run jobs on the login node.** The login node is shared among all users and it
should be used only for light tasks, such as moving and editing files, compiling programs,
and submitting and monitoring jobs. If a researcher runs a computationally intensive task
on the login node, it will degrade performance for other users. For any CPU
or memory intensive operations, such as testing and running applications, use an
[interactive session]({{< relref "creating_an_interactive_job" >}}) or
[submit a job to the batch queue]({{< relref "submitting_jobs" >}}), as shown in the
sketch after this list.
* **Do not launch multiple simultaneous processes on the login node.** This may include using
many threads for compiling applications, or checking the job status multiple times a minute.
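
For example, here is a minimal sketch of requesting an interactive session through SLURM's
`srun` instead of working on the login node. The resource values below are illustrative,
not cluster-specific defaults:

```bash
# Request a one-hour interactive shell with 4 cores and 8 GB of memory.
# Adjust the values (and add a partition if required) to match your needs
# and the cluster's documentation.
srun --time=01:00:00 --ntasks=1 --cpus-per-task=4 --mem=8G --pty $SHELL
```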

## File Systems
* Some I/O intensive jobs may benefit from **copying the data to the fast, temporary /scratch
file system local to each worker node**. The */scratch* directories are unique per job, and
are deleted when the job finishes. Thus, the last step of the batch script should copy the
needed output files from */scratch* to either */work* or */common*; a sketch of this staging
pattern follows this list. Please see the
[Running BLAST Alignment]({{< relref "running_blast_alignment" >}}) page for an example.
Currently, there is no quota on the */scratch* file system.
* */work* has two quotas: one for **file count** and one for **disk space**.
Reaching these quotas puts additional stress on the file system. Therefore, please monitor
them regularly, and delete files that are no longer needed or copy them to a more permanent
location; see the quota-checking sketch after this list.
* */work* is intended as a **temporary location for storing job outputs and files**. Once a
job completes, all necessary files should either be moved to permanent storage or deleted.
* **Avoid rapidly opening and closing many files, as well as frequently reading from and
writing to disk, in your program.** This approach stresses the file system and can cause
problems for all users. Instead, consider reading and writing large blocks of data in memory
over time, or utilizing more advanced parallel I/O libraries, such as *parallel HDF5* and
*parallel NetCDF*.
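
The */scratch* staging pattern described above might look like the following minimal batch
script sketch. The job name, program, and file paths are hypothetical placeholders, `$WORK`
is assumed to point at your */work* directory, and the exact */scratch* path conventions may
differ per cluster:

```bash
#!/bin/bash
#SBATCH --job-name=scratch_demo    # hypothetical job name
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --mem=4G

# Stage the input data onto the fast, node-local /scratch file system.
cp $WORK/myproject/input.dat /scratch/

# Run the (hypothetical) application against the fast local copy.
cd /scratch
$WORK/myproject/my_program input.dat > output.dat

# /scratch is deleted when the job finishes, so copying the results back
# to /work (or /common) must be the last step of the script.
cp /scratch/output.dat $WORK/myproject/
```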
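
To monitor the */work* quotas mentioned above, generic shell commands such as these give a
rough view of disk usage and file count (if the cluster provides its own quota-reporting
tool, prefer that; `$WORK` is again an assumed shortcut for your */work* directory):

```bash
# Approximate total disk usage of your /work directory.
du -sh $WORK

# Approximate number of files under /work (counts regular files only).
find $WORK -type f | wc -l
```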

## Internal and External Networks
* **Transferring many files to/from/within the cluster can harm the file system.** If you are