Commit a06ce44e authored by eharstad

Update handling data articles

parent 51eeee2c
......@@ -4,33 +4,20 @@ description = "How to work with and transfer data to/from HCC resources."
weight = "30"
+++
<span id="title-text"> HCC-DOCS : Handling Data </span>
=======================================================
Created by <span class="author"> Derek Weitzel</span>, last modified by
<span class="editor"> Carrie Brown</span> on Sep 18, 2018
<span
class="aui-icon aui-icon-small aui-iconfont-warning confluence-information-macro-icon"></span>
HCC currently has no storage that is suitable for HIPAA or other PID
data sets.  Users are not permitted to store such data on HCC machines.
{{% panel theme="danger" header="**Sensitive and Protected Data**" %}}HCC currently has *no storage* that is suitable for **HIPAA** or other **PID** data sets. Users are not permitted to store such data on HCC machines.{{% /panel %}}
All HCC machines have three separate areas for every user to store data,
each intended for a different purpose.   In addition, we have a transfer
service that utilizes [Globus Connect](Globus-Connect_6357013.html).
<span
class="confluence-embedded-file-wrapper image-center-wrapper confluence-embedded-manual-size"><img src="assets/images/332256/35325560.png" class="confluence-embedded-image image-center" width="1000" /></span>
service that utilizes [Globus Connect]({{< relref "globus_connect" >}}).
{{< figure src="/images/35325560.png" >}}
Home Directory
--------------
<span
class="aui-icon aui-icon-small aui-iconfont-info confluence-information-macro-icon"></span>
---
### Home Directory
{{% notice info %}}
You can access your home directory quickly using the `$HOME` environment
variable (i.e. `cd $HOME`).
{{% /notice %}}
Your home directory (i.e. `/home/[group]/[username]`) is meant for items
that take up relatively small amounts of space.  For example:  source
......@@ -40,28 +27,25 @@ for the purposes of best-effort disaster recovery.  This space is not
intended as an area for I/O to active jobs.  **/home** is mounted
**read-only** on cluster worker nodes to enforce this policy.
Common Directory
----------------
<span
class="aui-icon aui-icon-small aui-iconfont-info confluence-information-macro-icon"></span>
---
### Common Directory
{{% notice info %}}
You can access your common directory quickly using the `$COMMON`
environment variable (i.e. `cd $COMMON`).
{{% /notice %}}
The common directory operates similarly to work and is mounted with
**read and write capability on the worker nodes of all HCC clusters**. This
means that any files stored in common can be accessed from Crane, Tusker
and Sandhills making this directory ideal for items that need to be
means that any files stored in common can be accessed from Crane and Tusker, making this directory ideal for items that need to be
accessed from multiple clusters such as reference databases and shared
data files.
<span
class="aui-icon aui-icon-small aui-iconfont-warning confluence-information-macro-icon"></span>
{{% notice warning %}}
Common is not designed for heavy I/O usage. Please continue to use your
work directory for active job output to ensure the best performance of
your jobs.
{{% /notice %}}
Quotas for common are **30 TB per group**, with larger quotas available
for purchase if needed. However, files stored here will **not be backed
......@@ -69,23 +53,17 @@ up** and are **not subject to purge** at this time. Please continue to
back up your files to prevent irreparable data loss.
Additional information on using the common directories can be found in
the documentation on [Using the /common File System](30444241.html)
the documentation on [Using the /common File System]({{< relref "using_the_common_file_system" >}}).
High Performance Work Directory
-------------------------------
<span
class="aui-icon aui-icon-small aui-iconfont-info confluence-information-macro-icon"></span>
---
### High Performance Work Directory
{{% notice info %}}
You can access your work directory quickly using the `$WORK` environment
variable (i.e. `cd $WORK`).
{{% /notice %}}
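As a quick check, the three storage areas described on this page can be listed from a shell. This is only a sketch: `report_area` is a hypothetical helper (not an HCC-provided command), and it assumes `$COMMON` and `$WORK` are set by the cluster login environment as the notices above describe.

```shell
# report_area is a hypothetical helper, not an HCC tool.
# $1 = label, $2 = directory path (empty when the variable is unset).
report_area() {
    if [ -n "$2" ] && [ -d "$2" ]; then
        echo "$1: $2"
    else
        echo "$1: not available"
    fi
}

# On a machine where a variable is not defined, the area is reported
# as "not available" instead of causing an error.
report_area home   "${HOME:-}"
report_area common "${COMMON:-}"
report_area work   "${WORK:-}"
```

Run outside an HCC cluster, the common and work lines would likely read "not available", since those variables are not expected to exist there.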
<span
class="aui-icon aui-icon-small aui-iconfont-error confluence-information-macro-icon"></span>
The `/work` directories are **not backed up**. Irreparable data loss is
possible with a mis-typed command. See [Preventing File
Loss](Preventing-File-Loss_29065313.html) for strategies to avoid this.
{{% panel theme="danger" header="**File Loss**" %}}The `/work` directories are **not backed up**. Irreparable data loss is possible with a mis-typed command. See [Preventing File Loss]({{< relref "preventing_file_loss" >}}) for strategies to avoid this.{{% /panel %}}
Every user has a corresponding directory under /work using the same
naming convention as `/home` (i.e. `/work/[group]/[username]`).  We
......@@ -93,11 +71,11 @@ encourage all users to use this space for I/O to running jobs.  This
directory can also be used when larger amounts of space are temporarily
needed.  There is a **50 TB per group quota**; space in /work is shared
among all users.  It should be treated as short-term scratch space, and
**is not backed up**<span style="color: rgb(255,0,0);"><span
style="color: rgb(0,0,0);">Please use the `hcc-du` command to check your
**is not backed up**. **Please use the `hcc-du` command to check your
own and your group's usage, and back up and clean up your files at
reasonable intervals in $WORK.</span></span>
reasonable intervals in $WORK.**
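One way to find candidates for cleanup is to list files that have not been modified recently. The sketch below uses the standard `find` utility; `list_stale_files` is a hypothetical helper, and the 30-day threshold is an arbitrary illustration rather than an HCC policy value.

```shell
# list_stale_files is a hypothetical helper, not an HCC command.
# $1 = directory to scan, $2 = age threshold in days.
# Prints regular files whose last modification is older than the threshold.
list_stale_files() {
    find "$1" -type f -mtime +"$2" -print
}

# Example: scan $WORK with an illustrative 30-day threshold.
# The guard keeps the snippet harmless where $WORK is not defined.
if [ -n "${WORK:-}" ] && [ -d "$WORK" ]; then
    list_stale_files "$WORK" 30
fi
```

Scanning a large directory tree can take a while, so it may be preferable to point the helper at a specific project subdirectory.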
---
### Purge Policy
HCC has a **purge policy on /work** for files that become dormant.
......@@ -113,58 +91,39 @@ list the matching files for the user.  The candidate list can also be
accessed at the following path: `/lustre/purge/current/${USER}.list`.
This list is updated twice a week, on Mondays and Thursdays.
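The candidate list can be inspected with ordinary shell tools. The following is a minimal sketch: `count_purge_candidates` is a hypothetical helper, and it assumes the list contains one entry per line, which this page does not state explicitly.

```shell
# count_purge_candidates is a hypothetical helper, not an HCC command.
# $1 = path to a candidate list (assumed to hold one entry per line).
# Prints the number of entries, or 0 if the list cannot be read.
count_purge_candidates() {
    if [ -r "$1" ]; then
        wc -l < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Path taken from the text above; falls back to `id -un` if $USER is unset.
purge_list="/lustre/purge/current/${USER:-$(id -un)}.list"
echo "Purge candidates: $(count_purge_candidates "$purge_list")"
```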
<span
class="aui-icon aui-icon-small aui-iconfont-error confluence-information-macro-icon"></span>
/work is intended for recent job output and not long term storage.
Evidence of circumventing the purge policy by users will result in
consequences including account lockout.
 
{{% notice warning %}}
`/work` is intended for recent job output, not long-term storage. Users found circumventing the purge policy will face consequences, including account lockout.
{{% /notice %}}
If you have space requirements outside what is currently provided,
please
email <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a> and
we will gladly discuss alternatives.
[Attic](Using-Attic_11635580.html)
----------------------------------
---
### [Attic]({{< relref "using_attic" >}})
Attic is a near-line archive available for purchase at HCC.  Attic
provides large data storage that is designed to be more
reliable than `/work`, and larger than `/home`. Access to Attic is done
through [Globus Connect](Globus-Connect_6357013.html).
through [Globus Connect]({{< relref "globus_connect" >}}).
More details on Attic can be found on HCC's
<a href="https://hcc.unl.edu/attic" class="external-link">Attic</a>
website.
<span style="color: rgb(0,0,0);line-height: 1.4285715;font-size: 20.0px;">[Globus Connect](Globus-Connect_6357013.html)</span>
------------------------------------------------------------------------------------------------------------------------------
---
### [Globus Connect]({{< relref "globus_connect" >}})
For moving large amounts of data into or out of HCC resources, users are
highly encouraged to consider using [Globus
Connect](Globus-Connect_6357013.html).
Connect]({{< relref "globus_connect" >}}).
Using Box
---------
---
### Using Box
You can use your [UNL
Box.com](Integrating-Box-with-HCC_8192521.html) account to download and
Box.com]({{< relref "integrating_box_with_hcc" >}}) account to download and
upload files from any of the HCC clusters.
 
+++
title = "Data for UNMC Users Only"
description= "Data storage options for UNMC users"
weight = 50
+++
<span id="title-text"> HCC-DOCS : Data for UNMC users only </span>
==================================================================
Created by <span class="author"> Mako Furukawa Furukawa</span>, last
modified on Apr 07, 2014
<span
class="aui-icon aui-icon-small aui-iconfont-warning confluence-information-macro-icon"></span>
HCC currently has no storage that is suitable for HIPAA or other PID
{{% panel theme="danger" header="Sensitive and Protected Data" %}} HCC currently has no storage that is suitable for HIPAA or other PID
data sets.  Users are not permitted to store such data on HCC machines.
Tusker and Crane have a special directory, only for UNMC users. Please
note that this filesystem is still not suitable for HIPAA or other PID
data sets.
{{% /panel %}}
Transferring files to this machine from UNMC.
---------------------------------------------
---
### Transferring files to this machine from UNMC.
You will need to email us
at <a href="mailto:hcc-support@unl.edu" class="external-link">hcc-support@unl.edu</a> to
......@@ -28,8 +20,8 @@ gain access to this machine. Once you do, you can sftp to 10.14.250.1
and upload your files. Note that sftp is your only option. You may use
different sftp utilities depending on the platform you are logging in
from. Email us if you need help with this. Once you are logged in, you
should be at /volumes/UNMC1ZFS/\[group\]/\[username\], or
/home/\[group\]/\[username\]. Both are the same location and you will be
should be at `/volumes/UNMC1ZFS/[group]/[username]`, or
`/home/[group]/[username]`. Both are the same location and you will be
allowed to write files there.
For Windows, learn more about logging in and uploading files
......@@ -38,17 +30,14 @@ For Windows, learn more about logging in and uploading files
Using your uploaded files on Tusker or Crane.
---------------------------------------------
<span style="color: rgb(51,51,51);"><span
style="font-size: 14.0px;line-height: 1.4285715;">Using your
uploaded </span><span
style="font-size: 14.0px;line-height: 20.0px;">files</span><span
style="font-size: 14.0px;line-height: 1.4285715;"> is easy. Just go to
/shared/unmc1/\[group\]/\[username\] and your files will be in the same
Using your
uploaded files is easy. Just go to
`/shared/unmc1/[group]/[username]` and your files will be in the same
place. The directory may not always be visible. This
is because the `unmc1` directory is automounted, which means that as soon as you try to
go to the directory, it will show up. Just "cd" to
/shared/unmc1/\[group\]/\[username\] and all of the files will be
there.</span></span>
go to the directory, it will show up. Just "`cd`" to
`/shared/unmc1/[group]/[username]` and all of the files will be
there.
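The automount behavior described above can be wrapped in a small shell function. This is a sketch only: `enter_automounted_dir` is a hypothetical helper, and it relies on the statement above that going to the directory makes it show up.

```shell
# enter_automounted_dir is a hypothetical helper, not an HCC command.
# $1 = full path of the directory to enter.
# Changing into the full path is what makes an automounted directory appear;
# on success the current directory is printed, otherwise an error is reported.
enter_automounted_dir() {
    if cd "$1" 2>/dev/null; then
        pwd
    else
        echo "cannot access $1" >&2
        return 1
    fi
}

# Usage (replace the bracketed parts with your own group and username):
#   enter_automounted_dir "/shared/unmc1/[group]/[username]"
```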
If you have space requirements outside what is currently provided,
please
......
<span id="title-text"> HCC-DOCS : High-Speed Data Transfers </span>
===================================================================
Created by <span class="author"> Emelie Harstad</span>, last modified by
<span class="editor"> Josh Samuelson</span> on May 17, 2018
Tusker, Crane and Sandhills each have a dedicated transfer server with
10 Gb/s connectivity (Sandhills currently limited to 1 Gb/s) that allows
+++
title = "High Speed Data Transfers"
description = "How to transfer files directly from the transfer servers"
weight = 10
+++
Crane, Tusker, and Attic each have a dedicated transfer server with
10 Gb/s connectivity that allows
for faster data transfers than the login nodes.  With [Globus
Connect](https://hcc-docs.unl.edu/display/HCCDOC/Globus+Connect), users
Connect]({{< relref "globus_connect" >}}), users
can take advantage of this connection speed when making large/cumbersome
transfers.
<span style="line-height: 1.4285715;">
</span>
<span style="line-height: 1.4285715;">Those who prefer scp, sftp or
Those who prefer scp, sftp or
rsync clients can also benefit from this high-speed connectivity by
using these dedicated servers for data transfers:</span>
For Tusker transfers, use:
For Crane transfers, use:
using these dedicated servers for data transfers:
For Sandhills Transfers, use:
`tusker-xfer.unl.edu`
`crane-xfer.unl.edu`
`sandhills-xfer.unl.edu`
<span
class="aui-icon aui-icon-small aui-iconfont-warning confluence-information-macro-icon"></span>
Cluster | Transfer server
----------|----------------------
Crane | `crane-xfer.unl.edu`
Tusker | `tusker-xfer.unl.edu`
Attic | `attic-xfer.unl.edu`
{{% notice info %}}
Because the transfer servers are login-disabled, third-party transfers
between `tusker-xfer,` `crane-xfer` and `sandhills-xfer` must be done
via [Globus
Connect](https://hcc-docs.unl.edu/display/HCCDOC/Globus+Connect).
between `crane-xfer`, `tusker-xfer`, and `attic-xfer` must be done via [Globus Connect]({{< relref "globus_connect" >}}).
{{% /notice %}}
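For users of `scp`, `sftp` or `rsync`, the hostname lookup can be sketched as a small shell function. `xfer_host` is a hypothetical helper that only encodes the transfer servers listed above; the `rsync` line is printed rather than executed, and `<username>`, `<group>`, and the destination path are placeholders.

```shell
# xfer_host is a hypothetical helper, not an HCC command.
# $1 = cluster name; prints the matching transfer server hostname.
xfer_host() {
    case "$1" in
        crane|tusker|attic) echo "$1-xfer.unl.edu" ;;
        *) echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Print an example rsync command instead of running it, so that nothing is
# transferred until the placeholders are replaced with real values.
echo "rsync -av mydata/ <username>@$(xfer_host crane):/work/<group>/<username>/mydata/"
```

Printing the command first makes it easy to confirm the hostname and destination before starting a large transfer.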
<span id="title-text"> HCC-DOCS : Integrating Box with HCC </span>
==================================================================
Created by <span class="author"> Derek Weitzel</span>, last modified by
<span class="editor"> Adam Caprez</span> on Oct 11, 2018
+++
title = "Integrating Box with HCC"
description = "How to integrate Box with HCC"
weight = 30
+++
UNL has come to an arrangement
with <a href="https://www.box.com/" class="external-link">Box.com</a> to
......@@ -17,221 +12,81 @@ results when the job has completed.  Combined with
<a href="https://sites.box.com/sync4/" class="external-link">Box Sync</a>,
the uploaded files can be sync'd to your laptop or desktop upon job
completion. The upload and download speed of Box is about 20 to 30 MB/s
in good network traffic conditions.  There are two programs that can be
used to transfer files to/from Box - `cadaver` or `lftp`.  Instructions
are provided for both options
Step-by-step guide for Lftp
---------------------------
1.  Create an external password for Box as described in steps 1 and 2
in the Cadaver instructions below.
2.  Load the `lftp` module:
**Load the lftp module**
``` syntaxhighlighter-pre
module load lftp  
```
3. Connect to Box using your full email as the username and external
password you created:
**Connect to Box**
``` syntaxhighlighter-pre
lftp -u <username>,<password> ftps://ftp.box.com
```
4. Test the connection by running the `ls` command.  You should see a
listing of your Box files.  Assuming it works, add a bookmark named
"box" to use when connecting later:
**Add lftp bookmark**
``` syntaxhighlighter-pre
lftp demo2@unl.edu@ftp.box.com:/> bookmark add box
```
5. Exit `lftp` by typing `quit`.  To reconnect later, use bookmark
name:
**Connect using bookmark name**
``` syntaxhighlighter-pre
lftp box
```
6. To upload or download files, use the `get` and `put` commands.  For
example:
**Transferring files**
``` syntaxhighlighter-pre
[demo@login.crane ~]$ lftp box
lftp demo2@unl.edu@ftp.box.com:/> put myfile.txt
lftp demo2@unl.edu@ftp.box.com:/> get my_other_file.txt
```
7. To download directories, use the `mirror` command.  To upload
directories, use the `mirror` command with the `-R` option.  For
example, to download a directory named `my_box_dir` to your current
directory:
**Download a directory from Box**
``` syntaxhighlighter-pre
[demo@login.crane ~]$ lftp box
lftp demo2@unl.edu@ftp.box.com:/> mirror my_box_dir
```
To upload a directory named `my_hcc_dir` to Box, use `mirror` with
the `-R` option:
**Upload a directory to Box**
``` syntaxhighlighter-pre
[demo@login.crane ~]$ lftp box
lftp demo2@unl.edu@ftp.box.com:/> mirror -R my_hcc_dir
```
8. Lftp also supports using scripts to transfer files.  This can be
used to automate downloading or uploading files during jobs.  For example, create a file called
"transfer.sh" with the following lines:
**transfer.sh**
``` syntaxhighlighter-pre
open box
get some_input_file.tar.gz
put my_output_file.tar.gz
```
To run this script, do:
**Run transfer.sh**
``` syntaxhighlighter-pre
module load lftp
lftp -f transfer.sh
```
Step-by-step guide for Cadaver
------------------------------
1. You need to create your UNL
<a href="http://Box.com" class="external-link">Box.com</a> account
<a href="http://box.unl.edu/" class="external-link">here</a>.
2. Since we are going to be using
<a href="https://en.wikipedia.org/wiki/WebDAV" class="external-link">webdav</a> protocol
to access your
<a href="http://Box.com" class="external-link">Box.com</a> storage,
you need to create an **External Password**.  In the
<a href="http://Box.com" class="external-link">Box.com</a>
interface, you can create it
at **<a href="https://unl.app.box.com/settings" class="external-link">Account Settings</a>** &gt; **Create
External Password.
<span
class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img src="assets/images/8192521/8126683.png" class="confluence-embedded-image" width="747" height="185" /></span>**
3. Create a
<a href="http://www.mavetju.org/unix/netrc.php" class="external-link">.netrc</a>
file in order to automatically login to your box account without
typing the password.  The file needs to be in your home directory,
ie `~/.netrc`.  You can easily create this file using the nano text
editor by using the command:
``` syntaxhighlighter-pre
nano ~/.netrc
```
The file should contain the following lines:
``` syntaxhighlighter-pre
machine dav.box.com
login <box_username>@unl.edu
password <external_password>
```
Once you have typed or pasted these lines into the file, press
CTRL-X to exit. Follow the prompts to save the file as `.netrc`.
4. Be sure to have the correct permissions on the file.  You can change
the permissions with the command:
``` syntaxhighlighter-pre
$ chmod 600 ~/.netrc
```
5. Try out the webdav client by issuing the command:
``` syntaxhighlighter-pre
$ cadaver https://dav.box.com/dav
```
It should give you a prompt like:
``` syntaxhighlighter-pre
dav:/dav/>
```
Within this prompt, you can view files and navigate through the file
system using the usual Bash commands **cd** and **ls**. To download
files from Box, use the command:
``` syntaxhighlighter-pre
get <filename>
```
Or, alternately, to upload files to your Box, use:
``` syntaxhighlighter-pre
put <filename>
```
To exit the prompt, press **ctrl-d**
6. Within a submit script, you can upload and download files by using
commands such as:
``` syntaxhighlighter-pre
#!/bin/sh
#SBATCH ...
....
cat << EOF | cadaver https://dav.box.com/dav
get inputfile.txt
EOF
 
cat << EOF | cadaver https://dav.box.com/dav
put outputfile.txt
EOF
```
7. The files should automatically appear in your Box account, and be
sync'd to your computer if you have the
<a href="https://sites.box.com/sync4/" class="external-link">sync client</a>
installed.
in good network traffic conditions.  Users can use a tool called lftp to transfer files between HCC clusters and their Box accounts.
---
### Step-by-step guide for Lftp
1. You need to create your UNL [Box.com](https://www.box.com/) account [here](https://box.unl.edu/).
2. Since we are going to be using the [WebDAV](https://en.wikipedia.org/wiki/WebDAV) protocol to access your [Box.com](https://www.box.com/) storage, you need to create an **External Password**. In the [Box.com](https://www.box.com/) interface, you can create it at **[Account Settings](https://unl.app.box.com/settings) > Create External Password**.
{{< figure src="/images/box_create_external_password.png" class="img-border" >}}
3. After logging into the cluster of your choice, load the `lftp` module by entering the command below at the prompt:
{{% panel theme="info" header="Load the lftp module" %}}
{{< highlight bash >}}
module load lftp
{{< /highlight >}}
{{% /panel %}}
4. Connect to Box using your full email as the username and external password you created:
{{% panel theme="info" header="Connect to Box" %}}
{{< highlight bash >}}
lftp -u <username>,<password> ftps://ftp.box.com
{{< /highlight >}}
{{% /panel %}}
5. Test the connection by running the `ls` command. You should see a listing of your Box files. Assuming it works, add a bookmark named "box" to use when connecting later:
{{% panel theme="info" header="Add lftp bookmark" %}}
{{< highlight bash >}}
lftp demo2@unl.edu@ftp.box.com:/> bookmark add box
{{< /highlight >}}
{{% /panel %}}
6. Exit `lftp` by typing `quit`. To reconnect later, use the bookmark name:
{{% panel theme="info" header="Connect using bookmark name" %}}
{{< highlight bash >}}
lftp box
{{< /highlight >}}
{{% /panel %}}
7. To upload or download files, use the `get` and `put` commands. For example:
{{% panel theme="info" header="Transferring files" %}}
{{< highlight bash >}}
[demo2@login.crane ~]$ lftp box
lftp demo2@unl.edu@ftp.box.com:/> put myfile.txt
lftp demo2@unl.edu@ftp.box.com:/> get my_other_file.txt
{{< /highlight >}}
{{% /panel %}}
8. To download directories, use the `mirror` command. To upload directories, use the `mirror` command with the `-R` option. For example, to download a directory named `my_box_dir` to your current directory:
{{% panel theme="info" header="Download a directory from Box" %}}
{{< highlight bash >}}
[demo2@login.crane ~]$ lftp box
lftp demo2@unl.edu@ftp.box.com:/> mirror my_box_dir
{{< /highlight >}}
{{% /panel %}}
To upload a directory named `my_hcc_dir` to Box, use `mirror` with the `-R` option: