Commit 23ff55c9 authored by Natasha Pavlovikj

Update docs

parent 78540cea
1 merge request: !369 Update docs
......@@ -12,6 +12,8 @@ weight = "95"
- [How many nodes/memory/time should I request?](#how-many-nodes-memory-time-should-i-request)
- [I am trying to run a job but nothing happens?](#i-am-trying-to-run-a-job-but-nothing-happens)
- [I keep getting the error "slurmstepd: error: Exceeded step memory limit at some point." What does this mean and how do I fix it?](#i-keep-getting-the-error-slurmstepd-error-exceeded-step-memory-limit-at-some-point-what-does-this-mean-and-how-do-i-fix-it)
- [I keep getting the error "Some of your processes may have been killed by the cgroup out-of-memory handler." What does this mean and how do I fix it?](#i-keep-getting-the-error-some-of-your-processes-may-have-been-killed-by-the-cgroup-out-of-memory-handler-what-does-this-mean-and-how-do-i-fix-it)
- [I keep getting the error "Job cancelled due to time limit." What does this mean and how do I fix it?](#i-keep-getting-the-error-job-cancelled-due-to-time-limit-what-does-this-mean-and-how-do-i-fix-it)
- [I want to talk to a human about my problem. Can I do that?](#i-want-to-talk-to-a-human-about-my-problem-can-i-do-that)
- [My submitted job takes long time waiting in the queue or it is not running?](#my-submitted-job-takes-long-time-waiting-in-the-queue-or-it-is-not-running)
- [What IP's do I use to allow connections to/from HCC resources?](#what-ip-s-do-i-use-to-allow-connections-to-from-hcc-resources)
......@@ -136,7 +138,7 @@ with your login, the name of the cluster you are running on, and the
full path to your submit script and we will be happy to help solve the
issue.
##### I keep getting the error "slurmstepd: error: Exceeded step memory limit at some point." What does this mean and how do I fix it?
#### I keep getting the error "slurmstepd: error: Exceeded step memory limit at some point." What does this mean and how do I fix it?
This error occurs when the job you are running uses more memory than was
requested in your submit script.
......@@ -162,6 +164,57 @@ If you continue to run into issues, please contact us at
{{< icon name="envelope" >}}[hcc-support@unl.edu](mailto:hcc-support@unl.edu)
for additional assistance.
#### I keep getting the error "Some of your processes may have been killed by the cgroup out-of-memory handler." What does this mean and how do I fix it?
This is another error that occurs when the job you are running uses more memory than was
requested in your submit script.
If you specified `--mem` or `--mem-per-cpu` in your submit script, try
increasing this value and resubmitting your job.
If you did not specify `--mem` or `--mem-per-cpu` in your submit script,
chances are the default amount allotted is not sufficient. Add the line
{{< highlight batch >}}
#SBATCH --mem=<memory_amount>
{{< /highlight >}}
to your script with a reasonable amount of memory and try running it again. If you keep
getting this error, continue to increase the requested memory amount and
resubmit the job until it finishes successfully.
For additional details on how to monitor usage on jobs, check out the
documentation on [Monitoring Jobs]({{< relref "monitoring_jobs" >}}).
If you continue to run into issues, please contact us at
{{< icon name="envelope" >}}[hcc-support@unl.edu](mailto:hcc-support@unl.edu)
for additional assistance.
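As a minimal sketch of the advice above, a submit script that requests memory explicitly might look like the following. The job name, the runtime, the 8 GB value, and the `echo` line standing in for the application are all placeholders rather than recommendations. The commented `sacct` line shows one way to compare a finished job's peak memory (`MaxRSS`) with what was requested (`ReqMem`); replace `<job_id>` with your own job ID.

```bash
#!/bin/bash
#SBATCH --job-name=mem_example   # placeholder job name
#SBATCH --time=01:00:00          # placeholder runtime
#SBATCH --mem=8G                 # placeholder; raise this value if the job is killed for memory

# Placeholder for the real application command.
echo "Job running on $(hostname)"

# After the job ends, compare peak usage (MaxRSS) with the request (ReqMem):
#   sacct -j <job_id> --format=JobID,ReqMem,MaxRSS,State
```

If `MaxRSS` is close to `ReqMem`, the request is likely too tight and a larger value is worth trying on the next submission.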
#### I keep getting the error "Job cancelled due to time limit." What does this mean and how do I fix it?
This error occurs when the job you are running reaches the time limit that was
requested in your submit script before it finishes.
If you specified `--time` in your submit script, try
increasing this value and resubmitting your job.
If you did not specify `--time` in your submit script,
chances are the default runtime of 1 hour is not sufficient. Add the line
{{< highlight batch >}}
#SBATCH --time=<runtime>
{{< /highlight >}}
to your script with an increased runtime value and try running it again. The maximum runtime on Swan
is 7 days (168 hours).
For additional details on how to monitor usage on jobs, check out the
documentation on [Monitoring Jobs]({{< relref "monitoring_jobs" >}}).
If you continue to run into issues, please contact us at
{{< icon name="envelope" >}}[hcc-support@unl.edu](mailto:hcc-support@unl.edu)
for additional assistance.
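Slurm accepts several formats for `--time`, including `hours:minutes:seconds` and `days-hours:minutes:seconds`. The sketch below is only an illustration: `hours_to_slurm_time` is a hypothetical helper, not part of Slurm or of any HCC tooling, that turns a whole number of hours into the `days-hours:minutes:seconds` form.

```bash
#!/bin/bash
# Hypothetical helper (not part of Slurm or HCC tooling): convert a whole
# number of hours into the days-hours:minutes:seconds form accepted by --time.
hours_to_slurm_time() {
    local hours=$1
    printf '%d-%02d:00:00\n' "$((hours / 24))" "$((hours % 24))"
}

hours_to_slurm_time 36     # prints 1-12:00:00
hours_to_slurm_time 168    # prints 7-00:00:00, the Swan maximum stated above

# The result goes into the submit script, for example:
#   #SBATCH --time=1-12:00:00
#
# After a job ends, compare its runtime with the limit it was given:
#   sacct -j <job_id> --format=JobID,Elapsed,Timelimit,State
```

If `Elapsed` equals `Timelimit`, the job ran until it was cut off and needs a longer limit on the next submission.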
#### I want to talk to a human about my problem. Can I do that?
Of course! We have an open door policy and invite you to ~~stop by
......
......@@ -80,6 +80,17 @@ the memory and time requirements appropriately.
tools such as [Allinea Performance Reports]({{< relref "/applications/app_specific/allinea_profiling_and_debugging/allinea_performance_reports" >}})
and [mem_report]({{< relref "monitoring_jobs" >}}). While these tools cannot predict the needed resources, they can provide
useful information the researcher can use the next time that particular application is run.
* **Before you request a GPU in your submit script, make sure that the application and code you are
using support it.** Only code that is written to use GPUs can take advantage of GPU nodes. Please read the documentation of
the application and code you are using to see if a GPU can be used. Requesting a GPU that your code cannot use may increase
your waiting time in the queue and leave resources underused. It is very important to request GPUs only when your code and
application can efficiently utilize them.
* **Before you request multiple GPUs in your submit script, make sure that the application and code you are
using support them.** Only code that is written to use multiple GPUs can take advantage of multiple GPUs. Please read the
documentation of the application and code you are using to see if multiple GPUs can be used.
Requesting multiple GPUs that your code cannot use may increase your waiting time in the queue and leave resources underused.
It is very important to request multiple GPUs only when your code and application can efficiently utilize them.
We strongly recommend that you read and follow this guidance. If you have any concerns about your workflows or need any
assistance, please contact HCC Support at {{< icon name="envelope" >}}[hcc-support@unl.edu](mailto:hcc-support@unl.edu).
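For illustration, a submit script that requests a single GPU might look like the sketch below. The job name, the runtime, and the `echo` line standing in for the application are placeholders, and the partition name `gpu` is an assumption; check which partitions your cluster provides. The `nvidia-smi` check is one simple way to see which GPUs the job was actually given, assuming NVIDIA GPUs and tools on the node.

```bash
#!/bin/bash
#SBATCH --job-name=gpu_example   # placeholder job name
#SBATCH --time=01:00:00          # placeholder runtime
#SBATCH --partition=gpu          # assumed partition name; check your cluster
#SBATCH --gres=gpu:1             # request one GPU; increase only for multi-GPU code

# Report the GPUs visible to the job, if the NVIDIA tools are available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found; no NVIDIA GPU tools are visible on this node"
fi

# Placeholder for the real GPU-enabled application command.
echo "Job running on $(hostname)"
```

Raising the count in `--gres=gpu:1` only helps if the application is documented to use more than one GPU.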