Commit 4262080d

Authored 5 years ago by Natasha Pavlovikj; committed 5 years ago by John Thiltges

Add info for mem_report

Parent: f96482d5
Showing 1 changed file: content/submitting_jobs/monitoring_jobs.md (+60 additions, −23 deletions)
@@ -38,12 +38,12 @@ Additional arguments and format field information can be found in

[the SLURM documentation](https://slurm.schedmd.com/sacct.html).

### Monitoring Running Jobs:

There are two ways to monitor running jobs: the `top` command and
monitoring the `cgroup` files using the utility `cgget`. `top` is helpful
when monitoring multi-process jobs, whereas the `cgroup` files provide
information on memory usage. Both of these tools require the use of an
interactive job on the same node as the job to be monitored while the job
is running.
{{% notice warning %}}
If the job to be monitored is using all available resources for a node,
...
@@ -57,7 +57,7 @@ an interactive job on the same node using the srun command:

{{< highlight bash >}}
srun --jobid=<JOB_ID> --pty bash
{{< /highlight >}}
where `<JOB_ID>` is replaced by the job id for the monitored job as
assigned by SLURM.

Alternatively, you can request the interactive job by nodename as follows:

@@ -66,46 +66,83 @@
{{< highlight bash >}}
srun --nodelist=<NODE_ID> --pty bash
{{< /highlight >}}
where `<NODE_ID>` is replaced by the name of the node where the monitored
job is running. This information can be found by looking at the
`squeue` output under the `NODELIST` column.
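
For example (a minimal illustration; `<JOB_ID>` is the same placeholder used above), you can
list just your job with `squeue` and read the node name from the `NODELIST` column of its output:

{{< highlight bash >}}
squeue --jobs=<JOB_ID>
{{< /highlight >}}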
{{< figure src="/images/21070055.png" width="700" >}}
### Using `top` to monitor running jobs

Once the interactive job begins, you can run `top` to view the processes
on the node you are on:

{{< figure src="/images/21070056.png" height="400" >}}
Output for `top` displays each running process on the node. From the above
image, we can see the various MATLAB processes being run by user
cathrine98. To filter the list of processes, you can type `u` followed
by the username of the user who owns the processes. To exit this screen,
press `q`.
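
As a shortcut (this is standard `top` behavior, not something specific to these docs), you can
also start `top` already filtered to a single user:

{{< highlight bash >}}
top -u <username>
{{< /highlight >}}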
### Using `cgget` to monitor running jobs

During a running job, the `cgroup` folder is created on the node where the job
is running. This folder contains much of the information used by `sacct`.
However, while `sacct` reports information gathered every 30 seconds, the
`cgroup` files are updated more frequently and can detect quick spikes in
resource usage missed by `sacct`. Thus, using the `cgroup` files can give more
accurate information, especially regarding RAM usage.

One way to access the `cgroup` files with `cgget` is to start an interactive job
on the same node as the monitored job. Then, to view specific files and information,
use one of the following commands:
##### To view current memory usage:
{{< highlight bash >}}
cgget -r memory.usage_in_bytes /slurm/uid_<UID>/job_<SLURM_JOBID>/
{{< /highlight >}}
where `<UID>` is replaced by your UID and `<SLURM_JOBID>` is
replaced by the monitored job's Job ID as assigned by SLURM.

{{% notice note %}}
To find your `uid`, use the command `id -u`. Your UID never changes and is
the same on all HCC clusters (*not* on Anvil, however!).
{{% /notice %}}
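
For instance, you can splice your UID into the path directly with command substitution
(`<SLURM_JOBID>` below is still a placeholder you need to fill in):

{{< highlight bash >}}
cgget -r memory.usage_in_bytes /slurm/uid_$(id -u)/job_<SLURM_JOBID>/
{{< /highlight >}}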
##### To view the total CPU time, in nanoseconds, consumed by the job:
{{< highlight bash >}}
cgget -r cpuacct.usage /slurm/uid_<UID>/job_<SLURM_JOBID>/
{{< /highlight >}}
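
As a rough illustration (treat this as a sketch, since the exact `cgget` output layout can vary
between versions), you can convert the reported nanoseconds to seconds on the fly with `awk`:

{{< highlight bash >}}
cgget -r cpuacct.usage /slurm/uid_<UID>/job_<SLURM_JOBID>/ | awk '/cpuacct.usage/ {printf "%.2f CPU seconds\n", $2 / 1e9}'
{{< /highlight >}}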
Since the `cgroup` files are available only while the job is running, another
way of accessing the information from these files is through the submit script.
To track, for example, the maximum memory usage of a job, you can add
{{< highlight bash >}}
cgget -r memory.max_usage_in_bytes /slurm/uid_${UID}/job_${SLURM_JOBID}/
{{< /highlight >}}
at the end of your submit file. Unlike the previous examples, you do not need to
modify this command - here `UID` and `SLURM_JOBID` are variables that will be set
when the job is submitted.
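
As an illustration, a submit script that records its peak memory usage right before it
finishes might look like the sketch below (the job name, walltime, and memory request are
placeholders, not values from this page):

{{< highlight bash >}}
#!/bin/bash
#SBATCH --job-name=example_job   # placeholder job name
#SBATCH --time=00:10:00          # placeholder walltime
#SBATCH --mem=1G                 # placeholder memory request

# ... the actual work of the job goes here ...

# print the peak memory usage of this job's cgroup before the job ends
cgget -r memory.max_usage_in_bytes /slurm/uid_${UID}/job_${SLURM_JOBID}/
{{< /highlight >}}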
For information on more variables that can be used with `cgget`, please check
[here](https://reposcope.com/man/en/1/cgget).
We also provide a script, `mem_report`, that reports the current and maximum
memory usage for a job. This script is a wrapper for the `cgget` commands shown above
and generates user-friendly output. To use this script, you need to add
```
mem_report
```
at the end of your submit script.

`mem_report` can also be run as part of an interactive job:
{{< highlight bash >}}
[demo13@c0218.crane ~]$ mem_report
Current memory usage for job 25745709 is: 2.57 MBs
Maximum memory usage for job 25745709 is: 3.27 MBs
{{< /highlight >}}
When `cgget` and `mem_report` are used as part of the submit script, the respective output
is printed in the generated SLURM log files, unless otherwise specified.
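
For readers curious what such a wrapper might look like, here is a rough, illustrative sketch
built from the `cgget` calls above; it is not the actual HCC `mem_report` script, and the exact
`cgget` output format may differ slightly between versions:

{{< highlight bash >}}
#!/bin/bash
# Illustrative sketch only -- not the actual HCC mem_report script.
# Reads the job's cgroup memory counters with cgget and prints them in MBs.

CGROUP_PATH="/slurm/uid_${UID}/job_${SLURM_JOBID}/"

# cgget prints lines like "memory.usage_in_bytes: 12345"; keep only the number
current_bytes=$(cgget -r memory.usage_in_bytes "$CGROUP_PATH" | awk '/memory.usage_in_bytes/ {print $2}')
max_bytes=$(cgget -r memory.max_usage_in_bytes "$CGROUP_PATH" | awk '/memory.max_usage_in_bytes/ {print $2}')

# convert bytes to MBs with two decimal places
current_mb=$(awk -v b="$current_bytes" 'BEGIN {printf "%.2f", b / 1024 / 1024}')
max_mb=$(awk -v b="$max_bytes" 'BEGIN {printf "%.2f", b / 1024 / 1024}')

echo "Current memory usage for job ${SLURM_JOBID} is: ${current_mb} MBs"
echo "Maximum memory usage for job ${SLURM_JOBID} is: ${max_mb} MBs"
{{< /highlight >}}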