+++
title = "Handling Data"
description = "How to work with and transfer data to/from HCC resources."
weight = "30"
+++

{{% panel theme="danger" header="**Sensitive and Protected Data**" %}}HCC currently has *no storage* that is suitable for **HIPAA** or other **PID** data sets.  Users are not permitted to store such data on HCC machines.{{% /panel %}}

All HCC machines have three separate areas for every user to store data,
each intended for a different purpose.  In addition, we have a transfer
service that utilizes [Globus Connect]({{< relref "globus_connect" >}}).
{{< figure src="/images/35325560.png" >}}

---
### Home Directory

{{% notice info %}}
You can access your home directory quickly using the `$HOME` environment
variable (i.e. `cd $HOME`).
{{% /notice %}}

Your home directory (i.e. `/home/[group]/[username]`) is meant for items
that take up relatively small amounts of space.  For example:  source
code, program binaries, configuration files, etc.  This space is
quota-limited to **20GB per user**.  The home directories are backed up
for the purposes of best-effort disaster recovery.  This space is not
intended as an area for I/O to active jobs.  **/home** is mounted
**read-only** on cluster worker nodes to enforce this policy.
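
For example, you can check your usage against the 20GB quota and stage
job inputs out of `/home` before submitting (a minimal sketch using
standard tools; `input.dat` is a hypothetical file name):

```bash
# Home follows the /home/[group]/[username] convention
echo $HOME

# Summarize total home usage against the 20GB quota
du -sh $HOME

# /home is read-only on worker nodes, so copy job inputs to $WORK first
cp $HOME/input.dat $WORK/
```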

---
### Common Directory

{{% notice info %}}
You can access your common directory quickly using the `$COMMON`
environment variable (i.e. `cd $COMMON`).
{{% /notice %}}

The common directory operates similarly to `/work` and is mounted with
**read and write capability to worker nodes on all HCC clusters**. This
means that any files stored in common can be accessed from Crane and
Tusker, making this directory ideal for items that need to be accessed
from multiple clusters, such as reference databases and shared data
files.
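
For instance, a reference database staged once in `$COMMON` is then
visible to jobs on every cluster (a minimal sketch; the directory and
database names are hypothetical):

```bash
# Stage a shared reference database once
mkdir -p $COMMON/ref_dbs
cp -r /path/to/blast_db $COMMON/ref_dbs/

# Jobs on any HCC cluster can then read the same copy
ls $COMMON/ref_dbs/blast_db
```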

{{% notice warning %}}
Common is not designed for heavy I/O usage. Please continue to use your
work directory for active job output to ensure the best performance of
your jobs.
{{% /notice %}}

Quotas for common are **30 TB per group**, with larger quotas available
for purchase if needed. However, files stored here will **not be backed
up** and are **not subject to purge** at this time. Please continue to
back up your files to prevent irreparable data loss.

Additional information on using the common directories can be found in
the documentation on [Using the /common File System]({{< relref "using_the_common_file_system" >}}).

---
### High Performance Work Directory

{{% notice info %}}
You can access your work directory quickly using the `$WORK` environment
variable (i.e. `cd $WORK`).
{{% /notice %}}

{{% panel theme="danger" header="**File Loss**" %}}The `/work` directories are **not backed up**. Irreparable data loss is possible with a mis-typed command. See [Preventing File Loss]({{< relref "preventing_file_loss" >}}) for strategies to avoid this.{{% /panel %}}

Every user has a corresponding directory under `/work` using the same
naming convention as `/home` (i.e. `/work/[group]/[username]`).  We
encourage all users to use this space for I/O to running jobs.  This
directory can also be used when larger amounts of space are temporarily
needed.  There is a **50TB per group quota**; space in `/work` is shared
among all users.  It should be treated as short-term scratch space and
**is not backed up**.  **Please use the `hcc-du` command to check your
own and your group's usage, and back up and clean up your files in
$WORK at reasonable intervals.**
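
A routine cleanup pass might look like the following (a minimal sketch;
`hcc-du` output formats may vary, and the project name is hypothetical):

```bash
# Check your own and your group's usage of /work
hcc-du

# Archive a finished project's results off /work before cleaning up
tar -czf $HOME/finished_project.tar.gz $WORK/finished_project
rm -rf $WORK/finished_project
```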

---
### Purge Policy

HCC has a **purge policy on /work** for files that become dormant.
After **6 months (26 weeks) of inactivity on a file**, an automated
purge process will reclaim the used space of these dormant files.  HCC
provides the **`hcc-purge`** utility to list both the summary and the
actual file paths of files that have been dormant for **24 weeks**.
This list is periodically generated; the timestamp of the last search
is included in the default summary output when calling `hcc-purge` with
no arguments.  No output from `hcc-purge` indicates the last scan did
not find any dormant files.  `hcc-purge -l` will use the `less` pager to
list the matching files for the user.  The candidate list can also be
accessed at the following path: `/lustre/purge/current/${USER}.list`.
This list is updated twice a week, on Mondays and Thursdays.
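
A typical check uses only the commands and list path named above:

```bash
# Print the dormant-file summary (no output means none were found)
hcc-purge

# Page through the full list of your candidate files
hcc-purge -l

# Or inspect the candidate list directly
less /lustre/purge/current/${USER}.list
```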

{{% notice warning %}}
`/work` is intended for recent job output and not long term storage. Evidence of circumventing the purge policy by users will result in consequences including account lockout.
{{% /notice %}}

If you have space requirements outside what is currently provided,
please email [hcc-support@unl.edu](mailto:hcc-support@unl.edu) and
we will gladly discuss alternatives.

---
### [Attic]({{< relref "using_attic" >}})

Attic is a near-line archive available for purchase at HCC.  Attic
provides reliable large data storage that is designed to be more
reliable than `/work`, and larger than `/home`.  Access to Attic is done
through [Globus Connect]({{< relref "globus_connect" >}}).

More details on Attic can be found on HCC's
[Attic](https://hcc.unl.edu/attic) website.

---
### [Globus Connect]({{< relref "globus_connect" >}})

For moving large amounts of data into or out of HCC resources, users are
highly encouraged to consider using [Globus
Connect]({{< relref "globus_connect" >}}).

---
### Using Box

You can use your [UNL
Box.com]({{< relref "integrating_box_with_hcc" >}}) account to download and
upload files from any of the HCC clusters.