- [How many nodes/memory/time should I request?](#how-many-nodes-memory-time-should-i-request)
- [I am trying to run a job but nothing happens?](#i-am-trying-to-run-a-job-but-nothing-happens)
- [I keep getting the error "slurmstepd: error: Exceeded step memory limit at some point." What does this mean and how do I fix it?](#i-keep-getting-the-error-slurmstepd-error-exceeded-step-memory-limit-at-some-point-what-does-this-mean-and-how-do-i-fix-it)
- [I keep getting the error "Some of your processes may have been killed by the cgroup out-of-memory handler." What does this mean and how do I fix it?](#i-keep-getting-the-error-some-of-your-processes-may-have-been-killed-by-the-cgroup-out-of-memory-handler-what-does-this-mean-and-how-do-i-fix-it)
- [I keep getting the error "Job cancelled due to time limit." What does this mean and how do I fix it?](#i-keep-getting-the-error-job-cancelled-due-to-time-limit-what-does-this-mean-and-how-do-i-fix-it)
- [I want to talk to a human about my problem. Can I do that?](#i-want-to-talk-to-a-human-about-my-problem-can-i-do-that)
- [My submitted job takes long time waiting in the queue or it is not running?](#my-submitted-job-takes-long-time-waiting-in-the-queue-or-it-is-not-running)
- [What IP's do I use to allow connections to/from HCC resources?](#what-ip-s-do-i-use-to-allow-connections-to-from-hcc-resources)
...
...
with your login, the name of the cluster you are running on, and the
full path to your submit script and we will be happy to help solve the
issue.
#### I keep getting the error "slurmstepd: error: Exceeded step memory limit at some point." What does this mean and how do I fix it?
This error occurs when the job you are running uses more memory than was
requested in your submit script.
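One way to see how much memory a finished job actually used is `sacct` (a quick sketch; replace the `<job_id>` placeholder with your own job ID):

{{<highlight bash>}}
sacct -j <job_id> --format=JobID,JobName,ReqMem,MaxRSS,State
{{</highlight>}}

If the `MaxRSS` value is at or near `ReqMem`, the job needs a larger memory request.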
...
...
If you continue to run into issues, please contact us at
#### I keep getting the error "Some of your processes may have been killed by the cgroup out-of-memory handler." What does this mean and how do I fix it?
This is another error that occurs when the job you are running uses more memory than was
requested in your submit script.
If you specified `--mem` or `--mem-per-cpu` in your submit script, try
increasing this value and resubmitting your job.
If you did not specify `--mem` or `--mem-per-cpu` in your submit script,
chances are the default amount allotted is not sufficient. Add the line
{{<highlight batch>}}
#SBATCH --mem=<memory_amount>
{{</highlight>}}
to your script with a reasonable amount of memory and try running it again. If you keep
getting this error, continue to increase the requested memory amount and
resubmit the job until it finishes successfully.
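As a rough sketch (the job name, time limit, memory amount, and application name below are placeholders, not recommendations), a submit script with an explicit memory request might look like:

{{<highlight batch>}}
#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --ntasks=1
#SBATCH --time=02:00:00
#SBATCH --mem=8gb

./my_application
{{</highlight>}}

If your job runs multiple tasks, `--mem-per-cpu` can be used instead to request memory per allocated core rather than per node.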
For additional details on how to monitor usage on jobs, check out the
documentation on [Monitoring Jobs]({{<relref "monitoring_jobs">}}).
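For a quick check while a job is still running, `sstat` reports usage for the job's steps (a sketch; the `<job_id>` is a placeholder, and the `.batch` suffix targets the batch step of a standard submit script):

{{<highlight bash>}}
sstat -j <job_id>.batch --format=JobID,MaxRSS,MaxVMSize
{{</highlight>}}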
If you continue to run into issues, please contact us at