Commit b3035765 authored by Caughlin Bohn

Made overview of data handling

parent dbc09199
......@@ -7,124 +7,23 @@ weight = "30"
{{% panel theme="danger" header="**Sensitive and Protected Data**" %}}HCC currently has *no storage* that is suitable for **HIPAA** or other **PID** data sets. Users are not permitted to store such data on HCC machines.{{% /panel %}}
All HCC machines have three separate areas for every user to store data,
each intended for a different purpose. In addition, we have a transfer
service that utilizes [Globus Connect]({{< relref "data_transfer/globus_connect/" >}}).
each intended for a different purpose. The three areas are `/common`, `/work`, and `/home`. `/home` is your home directory; it has a quota limit of **20GB** and is backed up for best-effort disaster recovery purposes. `/work` is the high-performance, I/O-focused directory for running jobs. `/work` has a **50TB per group quota**, is not backed up, and is subject to a [purge policy]({{<relref "data_storage/#purge-policy" >}}) of **6 months of inactivity on a file**. `/common` works similarly to `/work` and is mounted with read and write capabilities on all HCC clusters, meaning any files on `/common` can be accessed from all HCC clusters, unlike `/home` and `/work`, which are cluster dependent. More information on the three storage areas on HCC's clusters is available on the [Data Storage]({{<relref "data_storage">}}) page.
{{< figure src="/images/35325560.png" height="500" class="img-border">}}
---
### Home Directory
HCC also offers a separate, near-line archive with space available for purchase called Attic. Attic provides reliable large data storage that is designed to be more reliable than `/work`, and larger than `/home`. More information on Attic and how to transfer data to and from Attic can be found on the [Using Attic]({{<relref "data_storage/using_attic">}}) page.
{{% notice info %}}
You can access your home directory quickly using the `$HOME` environment
variable (i.e. `cd $HOME`).
{{% /notice %}}
Your home directory (i.e. `/home/[group]/[username]`) is meant for items
that take up relatively small amounts of space. For example: source
code, program binaries, configuration files, etc. This space is
quota-limited to **20GB per user**. The home directories are backed up
for the purposes of best-effort disaster recovery. This space is not
intended as an area for I/O to active jobs. **/home** is mounted
**read-only** on cluster worker nodes to enforce this policy.
---
### Common Directory
{{% notice info %}}
You can access your common directory quickly using the `$COMMON`
environment variable (i.e. `cd $COMMON`).
{{% /notice %}}
The common directory operates similarly to work and is mounted with
**read and write capability on the worker nodes of all HCC clusters**. This
means that any files stored in common can be accessed from Crane or Rhino,
making this directory ideal for items that need to be
accessed from multiple clusters such as reference databases and shared
data files.
{{% notice warning %}}
Common is not designed for heavy I/O usage. Please continue to use your
work directory for active job output to ensure the best performance of
your jobs.
{{% /notice %}}
Quotas for common are **30 TB per group**, with larger quotas available
for purchase if needed. However, files stored here will **not be backed
up** and are **not subject to purge** at this time. Please continue to
back up your files to prevent irreparable data loss.
Additional information on using the common directories can be found in
the documentation on [Using the /common File System]({{< relref "using_the_common_file_system" >}}).
---
### High Performance Work Directory
{{% notice info %}}
You can access your work directory quickly using the `$WORK` environment
variable (i.e. `cd $WORK`).
{{% /notice %}}
{{% panel theme="danger" header="**File Loss**" %}}The `/work` directories are **not backed up**. Irreparable data loss is possible with a mis-typed command. See [Preventing File Loss]({{< relref "preventing_file_loss" >}}) for strategies to avoid this.{{% /panel %}}
Every user has a corresponding directory under /work using the same
naming convention as `/home` (i.e. `/work/[group]/[username]`). We
encourage all users to use this space for I/O to running jobs. This
directory can also be used when larger amounts of space are temporarily
needed. There is a **50TB per group quota**; space in /work is shared
among all users. It should be treated as short-term scratch space, and
**is not backed up**. **Please use the `hcc-du` command to check your
own and your group's usage, and back up and clean up your files at
reasonable intervals in $WORK.**
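
As one way to act on the backup advice above, the following is a minimal sketch that packs a finished directory into a compressed tarball before it is cleaned out of `$WORK`. The function name `archive_dir` and the paths in the usage note are hypothetical; only standard `tar`, `dirname`, and `basename` are used, and this is not an HCC-provided tool.

```shell
# Hypothetical helper: archive one directory into a gzip-compressed tarball.
# Usage: archive_dir <source_dir> <destination_tarball>
archive_dir() {
    src="$1"
    dest="$2"
    # -C switches to the parent directory first, so the archive stores
    # a relative path (e.g. "finished_run/...") instead of an absolute one.
    tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")"
}
```

For example, `archive_dir "$WORK/finished_run" /path/to/backup/finished_run.tar.gz` would write the archive to a placeholder destination of your choosing; confirm the archive with `tar -tzf` before deleting the original.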
You can also use your [UNL Box.com]({{< relref "integrating_box_with_hcc" >}}) account to download and
upload files from any of the HCC clusters.
---
### Purge Policy
For moving general data into or out of HCC resources, we recommend [scp]({{<relref "data_transfer/scp" >}}) for command line transfers on Windows 10, macOS, and Linux, or, for graphical transfers, [WinSCP]({{<relref "data_transfer/winscp" >}}) for Windows and [Cyberduck]({{<relref "data_transfer/cyberduck" >}}) for macOS and Linux.
HCC has a **purge policy on /work** for files that become dormant.
After **6 months of inactivity on a file (26 weeks)**, an automated
purge process will reclaim the used space of these dormant files. HCC
provides the **`hcc-purge`** utility to list both the summary and the
actual file paths of files that have been dormant for **24 weeks**.
This list is periodically generated; the timestamp of the last search
is included in the default summary output when calling `hcc-purge` with
no arguments. No output from `hcc-purge` indicates the last scan did
not find any dormant files. `hcc-purge -l` will use the less pager to
list the matching files for the user. The candidate list can also be
accessed at the following path: `/lustre/purge/current/${USER}.list`.
This list is updated twice a week, on Mondays and Thursdays.
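
The candidate list file can also be inspected from a script. The sketch below counts its entries; it assumes the list holds one file path per line, which the text above does not state, so treat the format as an assumption. The function name `purge_candidate_count` is hypothetical and is not part of `hcc-purge`.

```shell
# Hypothetical helper: count entries in a purge candidate list.
# Assumption: the list contains one file path per line.
purge_candidate_count() {
    # Default to the per-user list path mentioned above.
    list="${1:-/lustre/purge/current/${USER}.list}"
    if [ -r "$list" ]; then
        # grep -c . prints the number of non-empty lines;
        # "|| true" keeps the exit status at zero when that number is 0.
        grep -c . "$list" || true
    else
        # A missing or unreadable list is reported as zero candidates.
        echo 0
    fi
}
```

Calling `purge_candidate_count` with no argument reads the default path; passing a path as the first argument reads that file instead.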
For moving large amounts of data into or out of HCC resources, users are highly encouraged to consider using [Globus Connect]({{< relref "data_transfer/globus_connect/" >}}).
{{% notice warning %}}
`/work` is intended for recent job output and not long-term storage. Evidence that a user is circumventing the purge policy will result in consequences, including account lockout.
{{% /notice %}}
If you have space requirements outside what is currently provided,
If you have space requirements outside what is currently provided or any questions regarding moving data around,
please
email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a> and
we will gladly discuss alternatives.
---
### [Attic]({{< relref "using_attic" >}})
Attic is a near-line archive available for purchase at HCC. Attic
provides reliable large data storage that is designed to be more
reliable than `/work`, and larger than `/home`. Attic is accessed
through [Globus Connect]({{< relref "data_transfer/globus_connect/" >}}).
email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a>.
More details on Attic can be found on HCC's
<a href="https://hcc.unl.edu/attic" class="external-link">Attic</a>
website.
---
### [Globus Connect]({{< relref "data_transfer/globus_connect/" >}})
For moving large amounts of data into or out of HCC resources, users are
highly encouraged to consider using [Globus
Connect]({{< relref "data_transfer/globus_connect/" >}}).
---
### Using Box
You can use your [UNL
Box.com]({{< relref "integrating_box_with_hcc" >}}) account to download and
upload files from any of the HCC clusters.