Commit 71d996fc authored by Natasha Pavlovikj's avatar Natasha Pavlovikj
Add scratch page

### Using */scratch* storage space to improve running jobs
[Using Scratch]({{<relref "using_scratch_space" >}})
+++
title = "Using Scratch"
description = "How to use /scratch storage space to improve running jobs"
weight = "10"
+++
## What is Scratch?
*Scratch* is temporary local storage on the compute/worker node where the job is running.
This is the fastest storage available to an active running job.
The *scratch* space is temporary and accessible only while the job is running, and it is discarded after the job finishes.
Therefore, any important data from the *scratch* space should be moved to a permanent location on the cluster (such as *$WORK|$HOME|$COMMON*).
The *scratch* space is not backed up.
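To see how much *scratch* space is available on the node where your job runs, you can check from within the job. This is an illustrative command; the reported size and usage vary per cluster and per node:

```shell
# Run inside a job script or an interactive job to inspect the
# node-local scratch file system; sizes differ from node to node.
df -h /scratch
```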
## When to use Scratch?
Using *scratch* improves performance and is ideal for jobs that:
- perform many rapid input/output operations
- modify and interact with many files
- create many or large temporary files
{{% notice info %}}
When a permanent location on the cluster (such as *$WORK|$HOME|$COMMON*) is used for the workloads listed above,
the heavy input/output can degrade the shared file system and affect everyone using the cluster at the time.
{{% /notice %}}
## How to use Scratch?
*Scratch* is accessible on the compute node while the job is running, and no additional permissions or setup are needed to access it.
*Scratch* can be utilized efficiently by:
- copying all needed input data to the temporary *scratch* space at the beginning of a job to ensure fast reading
- writing job output to *scratch* using the proper output arguments from the used program
- copying needed output data/folder back to a permanent location on the cluster before the job finishes
These modifications are made in the SLURM submit script.
The *scratch* storage is accessed through the path **/scratch**.
Below is an example SLURM submit script.
This script assumes that the input data is in the current directory (please change that line if it is different),
and the final output data is copied back to $WORK. *my_program -\-output* is used only as an example,
and it should be replaced with the program/application you use and its respective output arguments.
{{% panel header="`use_scratch_example.submit`"%}}
{{< highlight bash >}}
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=01:00:00
#SBATCH --mem=5gb
#SBATCH --output=example.%J.out
#SBATCH --error=example.%J.err
# load necessary modules
# copy all needed input data to /scratch
cp -r input_data /scratch/
# if needed, change current working directory, e.g., $WORK to /scratch
# pushd /scratch
# use your program of interest and write program output to /scratch
# using the proper output arguments from the used program, e.g.,
my_program --output /scratch/output
# return the batch script shell to where it was at when pushd was called
# popd
# copy needed output to $WORK
cp -r /scratch/output $WORK
{{< /highlight >}}
{{% /panel %}}
{{% notice info %}}
If your application requires the input data to be in the current working directory (cwd) or the output to be stored in the current working directory, then make sure you change the current working directory with **pushd /scratch** before you start running your application.
{{% /notice %}}
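As a sketch of this pattern (with *my_program* as a placeholder for your actual application):

```shell
#!/bin/bash
# Sketch of the pushd/popd pattern; my_program and its arguments
# are placeholders for your actual application.
cp -r input_data /scratch/   # stage input on the node-local scratch
pushd /scratch               # make /scratch the current working directory
my_program input_data        # the program now reads/writes relative to /scratch
popd                         # return to the original working directory
cp -r /scratch/output $WORK  # save results to a permanent location
```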
Additional examples of SLURM submit scripts that use **scratch** on Swan are provided for
[BLAST](https://hcc.unl.edu/docs/applications/app_specific/bioinformatics_tools/alignment_tools/blast/running_blast_alignment/)
and [Trinity](https://hcc.unl.edu/docs/applications/app_specific/bioinformatics_tools/de_novo_assembly_tools/trinity/running_trinity_in_multiple_steps/).
{{% notice note %}}
Please note that after the job finishes (whether it completes successfully or fails), the data in *scratch* for that job is permanently deleted.
{{% /notice %}}
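Because */scratch* is wiped when the job ends, one defensive pattern (a sketch, not an official HCC recommendation) is to register a bash `trap` that copies the output back to a permanent location when the script exits, even if a command in the middle of the script fails:

```shell
#!/bin/bash
# Sketch: copy results from node-local scratch back to $WORK on exit,
# whether the script finishes normally or a command fails early.
# The /scratch/output path is an example; adjust it to your program.
save_output() {
    cp -r /scratch/output "$WORK"
}
trap save_output EXIT

# ... run your application here, writing its output to /scratch/output ...
```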
## Disadvantages of Scratch
- limited storage capacity
- shared with other jobs that are running on the same compute/worker node
- a job spanning multiple compute nodes has a separate *scratch* storage on each compute node
- data stored in *scratch* on one compute node cannot be directly accessed from a different compute node or by the processes running there
- temporary storage while the job is running
- if the job fails, no output is saved and checkpointing cannot be used
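In particular, because each node in a multi-node job has its own */scratch*, the input must be staged on every allocated node. A hedged sketch using `srun` (the option shown is illustrative; check the cluster documentation for your exact setup):

```shell
# Sketch for a multi-node SLURM job: launch the copy once per allocated
# node so every node's local /scratch receives the input data.
srun --ntasks-per-node=1 cp -r input_data /scratch/
```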
{{% notice note %}}
Using *scratch* is especially recommended for many Bioinformatics applications (such as BLAST, GATK, Trinity)
that perform many rapid input/output operations and can affect the file system on the cluster.
{{% /notice %}}