+++
title = "Preventing File Loss"
description = "How to prevent file loss on HCC clusters"
weight = 40
+++

Each research group is allocated 50TB of storage in `/work` on HCC
clusters. With over 400 active groups, HCC does not have the resources
to provide regular backups of `/work` without sacrificing the
performance of the existing filesystem. No matter how careful a user
might be, there is always the risk of file loss due to user error,
natural disasters, or equipment failure.  
  
However, there are a number of solutions available for backing up your
data. By carefully considering the benefits and limitations of each,
users can select the backup methods that work best for their particular
needs. For truly robust file backups, we recommend combining multiple
methods. For example, use Git for day-to-day changes and make manual
backups to an external hard drive at regular intervals, such as monthly
or twice a year.

---
### 1. Use your local machine:

If you have sufficient hard drive space, regularly back up your `/work`
directories to your personal computer. To avoid filling up your personal
hard drive, consider using an external drive that can easily be placed
in a fireproof safe or at an off-site location for an extra level of
protection. To do this, you can either use [Globus
Connect]({{< relref "/Data_Transfer/globus_connect" >}}) or an
SCP client, such
as <a href="https://cyberduck.io/" class="external-link">Cyberduck</a> or <a href="https://winscp.net/eng/index.php" class="external-link">WinSCP</a>.
For help setting up an SCP client, check out our [Quick Start
Guides]({{< relref "/quickstarts" >}}).
  
For those worried about personal hard drive crashes, UNL
offers <a href="http://nsave.unl.edu/" class="external-link">the backup service NSave</a>.
For a small monthly fee, users can install software that will
automatically back up selected files from their personal machine.  
  
Benefits:

-   Gives you full control over what is backed up and when.
-   Doesn't require the use of third party servers (when using SCP
    clients).
-   Take advantage of our high-speed data transfers (10 Gb/s) when using
    Globus Connect, or [set up your SCP client to use our dedicated
    high-speed transfer
    servers]({{< relref "/Data_Transfer/high_speed_data_transfers" >}}).

Limitations:

-   The amount you can back up is limited by available hard drive space.
-   Manual backups of many files can be time-consuming.
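
One way to reduce the time spent transferring many small files is to bundle a directory into a single archive before copying it. The following is a minimal Python sketch of that idea, not an HCC-provided tool. The function name `archive_directory` and the paths in the usage note are hypothetical.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def archive_directory(source_dir, dest_dir):
    """Bundle source_dir into a timestamped .tar.gz file inside dest_dir.

    dest_dir should not be located inside source_dir, otherwise the
    archive would try to include itself.
    """
    source = Path(source_dir).resolve()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # A timestamp in the file name keeps older archives from being overwritten.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive_path = dest / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not the full path.
        tar.add(source, arcname=source.name)
    return archive_path
```

For example, `archive_directory("/work/mygroup/myuser/project", "/work/mygroup/myuser/backups")` (placeholder paths) would produce one `.tar.gz` file that can then be transferred with Globus Connect or an SCP client.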

---
### 2. Use Git to preserve files and revision history:

Git is a revision control system that can be run locally or paired
with a repository hosting service, such
as <a href="http://www.github.com/" class="external-link">GitHub</a>, to
provide a remote backup of your files. Git works best with smaller files
such as source code and manuscripts. Anyone with an InCommon login can
use <a href="http://git.unl.edu/" class="external-link">UNL's GitLab Instance</a>
for free.  
  
Benefits:

-   Git is naturally collaboration-friendly. It allows multiple people
    to easily work on the same project and provides built-in tools for
    controlling contributions and managing conflicting changes.
-   Create individual repositories for each project, allowing you to
    compartmentalize your work.
-   Using UNL's GitLab instance allows you to create private or internal
    (accessible by anyone within your organization) repositories.

Limitations:

-   Git is not designed to handle large files. GitHub does not allow
    files larger than 100MB unless you use
    <a href="https://help.github.com/articles/about-git-large-file-storage/" class="external-link">Git Large File Storage</a>.
    With other repository hosts, tracking files over 1GB in size can be
    time-consuming and lead to errors.
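
Before committing a project directory, it can help to check for files that exceed the size limit mentioned above. The sketch below is one possible approach, under stated assumptions: the helper name `find_large_files` is hypothetical, and the 100MB limit is interpreted as 100 × 1024² bytes.

```python
import os

# Assumption: the 100MB limit is treated as 100 * 1024 * 1024 bytes.
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root, limit_bytes=GITHUB_LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files larger than limit_bytes.

    Results are sorted from largest to smallest.
    """
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symbolic links, which may point outside the project
            size = os.path.getsize(path)
            if size > limit_bytes:
                large.append((path, size))
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Any files this reports are candidates for Git Large File Storage or for one of the other backup methods on this page.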

---
### 3. Use Attic:

HCC offers
long-term, <a href="https://en.wikipedia.org/wiki/Nearline_storage" class="external-link">near-line</a> data
storage
through [Attic]({{< relref "using_attic" >}}).
HCC users with an existing account
can <a href="http://hcc.unl.edu/attic" class="external-link">apply for an Attic account</a> for
a <a href="http://hcc.unl.edu/priority-access-pricing" class="external-link">small annual fee</a> that
is substantially less than other cloud services.  
  
Benefits:

-   Attic files are backed up regularly at both HCC locations in Omaha
    and Lincoln to help provide disaster tolerance and a second security
    layer against file loss.
-   No limits on individual or total file sizes.
-   High speed data transfers between Attic and the clusters when using
    [Globus Connect]({{< relref "/Data_Transfer/globus_connect" >}}) and [HCC's high-speed data
    servers]({{< relref "/Data_Transfer/high_speed_data_transfers" >}}).

Limitations:

-   Backups must be done manually, which can be time-consuming. Setting
    up automated scripts can help speed up this process.
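
As an illustration of what such an automated script might do, the Python sketch below selects only the files that have changed since the previous backup, so that each run transfers less data. It is a sketch under stated assumptions rather than an HCC-provided tool: the function names and the marker-file approach are hypothetical, and the transfer step itself is left out because it depends on how you connect to Attic.

```python
import os
from pathlib import Path


def files_changed_since(root, marker_file):
    """List files under root modified after marker_file was last touched.

    If marker_file does not exist yet, every file is reported.
    Keep marker_file outside root so it is never reported itself.
    """
    marker = Path(marker_file)
    last_run = marker.stat().st_mtime if marker.exists() else 0.0

    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_file():
                continue  # skip broken symbolic links and special files
            if path.stat().st_mtime > last_run:
                changed.append(path)
    return sorted(changed)


def mark_backup_done(marker_file):
    """Record the time of a successful backup by touching marker_file."""
    Path(marker_file).touch()
```

A typical run would call `files_changed_since`, transfer the returned files, and call `mark_backup_done` only after the transfer succeeds.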

---
### 4. Use a cloud-based service, such as Box:

Many of us are familiar with services such as Google Drive, Dropbox, Box
and OneDrive. These cloud-based services provide a convenient portal for
accessing your files from any computer. NU offers OneDrive and Box
services to all students, staff and faculty. But did you know that you
can link your Box account to HCC’s clusters to provide quick and easy
access to files stored there?  [Follow a few set-up
steps]({{< relref "integrating_box_with_hcc" >}}) and
you can add files to and access files stored in your Box account
directly from HCC clusters. Set up your submit scripts to automatically
upload results as they are generated, or use it interactively to store
important workflow scripts and maintain a backup of your analysis
results.  
  
Benefits:

-   <a href="http://box.unl.edu/" class="external-link">Box@UNL</a> offers
    unlimited file storage while you are associated with UNL.
-   Integrating with HCC clusters provides a quick and easy way to
    automate backups of analysis results and workflow scripts.

Limitations:

-   Box has individual file size limitations. Larger files will need to
    be backed up using an alternate method.

---
### 5. Copy important files to `/home`:

While `/work` files and directories are not backed up, files and
directories in `/home` are backed up on a daily basis. Due to the
limitations of the `/home` filesystem, we strongly recommend that only
source code and compiled programs be backed up to `/home`. If you do
use `/home` to back up datasets, please keep a working copy in your
`/work` directories to avoid negatively impacting the functionality of
the cluster.  
  
Benefits:

-   No need to make manual backups. `/home` files are automatically backed
    up daily.
-   Files in `/home` are not subject to the 6-month purge policy that
    exists on `/work`.
-   Doesn't require the use of third-party software or tools.

Limitations:

-   Home storage is limited to 20GB per user. Larger file sets will
    need to be backed up using an alternate method.
-   Home is read-only on the cluster worker nodes, so results cannot be
    directly written or altered from within a submitted job.
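
Because of the 20GB limit, it can be useful to check how large a directory is before copying it to `/home`. The following Python sketch shows one way to do that. The names `directory_size` and `fits_in_home` are hypothetical, the quota is interpreted as 20 × 1024³ bytes (an assumption), and the result is only an estimate, since the filesystem's own accounting may differ.

```python
import os

# Assumption: the 20GB per-user limit is treated as 20 * 1024**3 bytes.
HOME_QUOTA_BYTES = 20 * 1024**3


def directory_size(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # do not follow symbolic links
                total += os.path.getsize(path)
    return total


def fits_in_home(source_dir, already_used_bytes=0, quota_bytes=HOME_QUOTA_BYTES):
    """Check whether source_dir would fit within the remaining quota."""
    return directory_size(source_dir) + already_used_bytes <= quota_bytes
```

Here `already_used_bytes` is the space your existing `/home` files occupy, which you could obtain by calling `directory_size` on your home directory.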

  
If you would like more information or assistance in setting up any of
these methods, contact us
at <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a>.