Commit 897b8d29 authored by Adam Caprez's avatar Adam Caprez

Merge branch 'rhino-updates' into 'master'

Initial update to docs for Rhino.

See merge request !136
parents be2f4ce6 79ab9f1c
......@@ -31,17 +31,27 @@ are new to using HCC resources, Crane is the recommended cluster to use
initially.  Limitations: Crane has only 2 CPU/16 cores and 64GB RAM per
node. CraneOPA has 2 CPU/36 cores with a maximum of 512GB RAM per node.
**Rhino**: Rhino is intended for large memory (RAM) computing needs.
Rhino has 4 AMD Interlagos CPUs (64 cores) per node, with either 192GB or 256GB RAM per
node in the default partition. For extremely large RAM needs, there is also
a 'highmem' partition with 2 x 512GB and 2 x 1TB nodes.
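Jobs that need the very large memory nodes can target that partition explicitly. A minimal sketch (the partition name comes from the paragraph above; the exact `--mem` value must stay under the per-node request cap):

{{< highlight bash >}}
#SBATCH --partition=highmem   # Rhino's large-memory partition
#SBATCH --mem=500000          # e.g. up to ~500GB (in MB) on a 512GB node
{{< /highlight >}}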
User Login
----------
For Windows users, please refer to this link [For Windows Users]({{< relref "for_windows_users" >}}).
For Mac or Linux users, please refer to this link [For Mac/Linux Users]({{< relref "for_maclinux_users">}}).
**Logging into Crane or Rhino**
{{< highlight bash >}}
ssh <username>@crane.unl.edu
{{< /highlight >}}
or
{{< highlight bash >}}
ssh <username>@rhino.unl.edu
{{< /highlight >}}
Duo Security
......@@ -52,6 +62,11 @@ resources. Registration and usage of Duo security can be found in this
section: [Setting up and using Duo]({{< relref "setting_up_and_using_duo">}})
**Important Notes**
- The Crane and Rhino clusters are separate, but they are
similar enough that submission scripts written for one will work on
the other (excluding GPU resources and some combinations of
RAM/core requests).
 
- The worker nodes cannot write to the `/home` directories. You must
use your `/work` directory for processing in your job. You may
......@@ -65,6 +80,8 @@ Resources
- ##### Crane - HCC's newest machine, Crane has 7232 Intel Xeon cores in 452 nodes with 64GB RAM per node.
- ##### Rhino - HCC's AMD-based cluster, intended for large RAM computing needs.
- ##### Red - This cluster is the resource for UNL's US CMS Tier-2 site.
- [CMS](http://www.uscms.org/)
......@@ -80,13 +97,14 @@ Resource Capabilities
| Cluster | Overview | Processors | RAM | Connection | Storage
| ------- | ---------| ---------- | --- | ---------- | ------
| **Crane** | 548 node Production-mode LINUX cluster | 452 Intel Xeon E5-2670 2.60GHz 2 CPU/16 cores per node<br> <br>116 Intel Xeon E5-2697 v4 2.3GHz, 2 CPU/36 cores per node<br><br>("CraneOPA") | 452 nodes @ \*64GB<br><br>79 nodes @ \*\*\*256GB<br><br>37 nodes @ \*\*\*\*512GB | QDR Infiniband<br><br>EDR Omni-Path Architecture | ~1.8 TB local scratch per node<br><br>~4 TB local scratch per node<br><br>~1452 TB shared Lustre storage
| **Rhino** | 110 node Production-mode LINUX cluster | 110 AMD Interlagos CPUs (6272 / 6376), 4 CPU/64 cores per node | 106 nodes @ 192GB\*\*/256GB\*\*\* <br><br> 2 nodes @ 512GB\*\*\*\* <br><br> 2 nodes @ 1024GB\*\*\*\*\* | QDR Infiniband | ~1.5TB local scratch per node <br><br> ~360TB shared BeeGFS storage |
| **Red** | 344 node Production-mode LINUX cluster | Various Xeon and Opteron processors 7,280 cores maximum, actual number of job slots depends on RAM usage | 1.5-4GB RAM per job slot | 1Gb, 10Gb, and 40Gb Ethernet | ~6.67PB of raw storage space |
| **Anvil** | 76 Compute nodes (Partially used for cloud, the rest used for general computing), 12 Storage nodes, 2 Network nodes Openstack cloud | 76 Intel Xeon E5-2650 v3 2.30GHz 2 CPU/20 cores per node | 76 nodes @ 256GB | 10Gb Ethernet | 528 TB Ceph shared storage (349TB available now) |
You may only request the following amount of RAM: <br>
\*62.5GB <br>
\*\*187.5GB <br>
\*\*\*250GB <br>
\*\*\*\*500GB <br>
\*\*\*\*\*1000GB
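In a SLURM submission script these caps translate directly into the memory request. A sketch, using values from the table above (`--mem` takes megabytes by default):

{{< highlight bash >}}
#SBATCH --mem=62500     # fits a 64GB node (62.5GB cap)
# For larger nodes, request up to the corresponding cap, e.g.:
# #SBATCH --mem=250000  # 256GB node (250GB cap)
{{< /highlight >}}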
......@@ -6,13 +6,11 @@ This document details the equipment resident in the Holland Computing Center (HC
HCC has two primary locations directly interconnected by a pair of 10 Gbps fiber optic links (20 Gbps total). The 1800 sq. ft. HCC machine room at the Peter Kiewit Institute (PKI) in Omaha can provide up to 500 kVA in UPS and genset protected power, and 160 ton cooling. A 2200 sq. ft. second machine room in the Schorr Center at the University of Nebraska-Lincoln (UNL) can currently provide up to 100 ton cooling with up to 400 kVA of power. One Brocade MLXe router and two Dell Z9264F-ON core switches in each location provide both high WAN bandwidth and Software Defined Networking (SDN) capability. The Schorr machine room connects to campus and Internet2/ESnet at 100 Gbps while the PKI machine room connects at 10 Gbps. HCC uses multiple data transfer nodes as well as a FIONA (flash IO network appliance) to facilitate end-to-end performance for data intensive workflows.
HCC's resources at UNL include two distinct offerings: Rhino and Red. Rhino is a linux cluster dedicated to general campus usage with 7,040 compute cores interconnected by low-latency Mellanox QDR InfiniBand networking. 360 TB of BeeGFS storage is complemented by 50 TB of NFS storage and 1.5 TB of local scratch per node. Each compute node is a Dell R815 server with at least 192 GB RAM and 4 Opteron 6272 / 6376 (2.1 / 2.3 GHz) processors.
The largest machine on the Lincoln campus is Red, with 9,536 job slots interconnected by a mixture of 1, 10, and 40 Gbps ethernet. More importantly, Red serves up over 6.6 PB of storage using the Hadoop Distributed File System (HDFS). Red is integrated with the Open Science Grid (OSG), and serves as a major site for storage and analysis in the international high energy physics project known as CMS (Compact Muon Solenoid).
HCC's resources at PKI (Peter Kiewit Institute) in Omaha include Crane, Anvil, Attic, and Common storage.
Crane debuted at 474 on the Top500 list with an HPL benchmark of 121.8 TeraFLOPS. Intel Xeon chips (8-core, 2.6 GHz) provide the processing with 4 GB RAM available per core and a total of 12,236 cores. The cluster shares 1.5 PetaBytes of Lustre storage and contains HCC's GPU resources. We have since expanded the existing cluster: 96 nodes with new Intel Xeon E5-2697 v4 chips and 100Gb Intel Omni-Path interconnect were added to Crane. Moreover, Crane has 21 GPU nodes with 57 NVIDIA GPUs in total, which enable state-of-the-art research, from drug discovery to deep learning.
......@@ -26,17 +24,16 @@ These resources are detailed further below.
# 1. HCC at UNL Resources
## 1.1 Rhino
* 107 4-socket Opteron 6172 / 6376 (16-core, 2.1 / 2.3 GHz) with 192 or 256 GB RAM
* 2x with 512 GB RAM, 2x with 1024 GB RAM
* Mellanox QDR InfiniBand
* 1 and 10 GbE networking
* 5x Dell N3048 switches
* 50TB shared storage (NFS) -> /home
* 360TB BeeGFS storage over Infiniband -> /work
* 1.5TB local scratch
## 1.2 Red
......@@ -64,16 +61,6 @@ These resources are detailed further below.
* 1 Mercury RM216 2U Rackmount Server 2 Xeon E5-2630 (12-core, 2.6GHz)
* 10 Mercury RM445J 4U Rackmount JBOD with 45x 4TB NL SAS Hard Disks
# 2. HCC at PKI Resources
## 2.1 Crane
......
......@@ -37,7 +37,8 @@ environmental variable (i.e. '`cd $COMMON`')
The common directory operates similarly to work and is mounted with
**read and write capability to worker nodes on all HCC Clusters**. This
means that any files stored in common can be accessed from Crane or Rhino,
making this directory ideal for items that need to be
accessed from multiple clusters such as reference databases and shared
data files.
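For example, a file staged once into `$COMMON` is then visible from any cluster. A small sketch (the `refs` subdirectory and file name are illustrative; the fallback default is only so the sketch runs outside HCC, where `$COMMON` is not set by the environment):

{{< highlight bash >}}
# $COMMON is set by the cluster environment; default it only for local testing.
COMMON="${COMMON:-$HOME/common}"
mkdir -p "$COMMON/refs"
echo "example reference data" > "$COMMON/refs/example.txt"
# The same path now resolves on Crane and Rhino worker nodes alike.
ls "$COMMON/refs"
{{< /highlight >}}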
......
......@@ -6,7 +6,7 @@ weight = 50
{{% panel theme="danger" header="Sensitive and Protected Data" %}} HCC currently has no storage that is suitable for HIPAA or other PID
data sets.  Users are not permitted to store such data on HCC machines.
Crane has a special directory, only for UNMC users. Please
note that this filesystem is still not suitable for HIPAA or other PID
data sets.
{{% /panel %}}
......
......@@ -7,7 +7,7 @@ description = "Globus Connect overview"
a fast and robust file transfer service that allows users to quickly
move large amounts of data between computer clusters and even to and
from personal workstations.  This service has been made available for
Crane, Rhino, and Attic. HCC users are encouraged to use Globus
Connect for their larger data transfers as an alternative to slower and
more error-prone methods such as scp and WinSCP.
......@@ -15,7 +15,7 @@ more error-prone methods such as scp and winSCP. 
### Globus Connect Advantages
- Dedicated transfer servers on Crane, Rhino, and Attic allow
large amounts of data to be transferred quickly between sites.
- A user can install Globus Connect Personal on his or her workstation
......
......@@ -4,15 +4,15 @@ description = "How to activate HCC endpoints on Globus"
weight = 20
+++
You will not be able to transfer files to or from an HCC endpoint using Globus Connect without first activating the endpoint.  Endpoints are available for Crane (`hcc#crane`), Rhino (`hcc#rhino`), and Attic (`hcc#attic`).  Follow the instructions below to activate any of these endpoints and begin making transfers.
1. [Sign in](https://www.globus.org/SignIn) to your Globus account using your campus credentials or your Globus ID (if you have one). Then click on 'Endpoints' in the left sidebar.
{{< figure src="/images/Glogin.png" >}}
{{< figure src="/images/endpoints.png" >}}
2. Find the endpoint you want by entering '`hcc#crane`', '`hcc#rhino`', or '`hcc#attic`' in the search box and hit 'enter'.  Once you have found and selected the endpoint, click the green 'activate' icon. On the following page, click 'continue'.
{{< figure src="/images/activateEndpoint.png" >}}
{{< figure src="/images/EndpointContinue.png" >}}
3. You will be redirected to the HCC Globus Endpoint Activation page.  Enter your *HCC* username and password (the password you usually use to log into the HCC clusters).
{{< figure src="/images/hccEndpoint.png" >}}
......
......@@ -5,7 +5,7 @@ weight = 50
+++
If you would like another colleague or researcher to have access to your
data, you may create a shared endpoint on Crane, Rhino, or Attic. You can personally manage access to this endpoint and
give access to anybody with a Globus account (whether or not
they have an HCC account).  *Please use this feature responsibly by
sharing only what is necessary and granting access only to trusted
......
......@@ -7,7 +7,7 @@ weight = 30
To transfer files between HCC clusters, you will first need to
[activate]({{< relref "activating_hcc_cluster_endpoints" >}}) the
two endpoints you would like to use (the available endpoints
are: `hcc#crane`, `hcc#rhino`, and `hcc#attic`).  Once
that has been completed, follow the steps below to begin transferring
files.  (Note: You can also transfer files between an HCC endpoint and
any other Globus endpoint for which you have authorized access.  That
......
......@@ -28,7 +28,7 @@ endpoints.
 From your Globus account, select the 'File Manager' tab
from the left sidebar and enter the name of your new endpoint in the 'Collection' text box. Press 'Enter' and then
navigate to the appropriate directory. Select "Transfer or Sync to.." from the right sidebar (or select the "two panels"
icon from the top right corner) and enter the second endpoint (for example: `hcc#crane`, `hcc#rhino`, or `hcc#attic`),
type or navigate to the desired directory, and initiate the file transfer by clicking on the blue
arrow button.
{{< figure src="/images/PersonalTransfer.png" >}}
......
......@@ -4,7 +4,7 @@ description = "How to transfer files directly from the transfer servers"
weight = 10
+++
Crane, Rhino, and Attic each have a dedicated transfer server with
10 Gb/s connectivity that allows
for faster data transfers than the login nodes.  With [Globus
Connect]({{< relref "globus_connect" >}}), users
......@@ -18,6 +18,7 @@ using these dedicated servers for data transfers:
Cluster | Transfer server
----------|----------------------
Crane | `crane-xfer.unl.edu`
Rhino | `rhino-xfer.unl.edu`
Attic | `attic-xfer.unl.edu`
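Command-line tools can use these hostnames directly. A sketch using scp (the paths and `<username>`/`<group>` are placeholders, following the login examples earlier in these docs):

{{< highlight bash >}}
scp ./inputdata.tar.gz <username>@crane-xfer.unl.edu:/work/<group>/<username>/
{{< /highlight >}}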
{{% notice info %}}
......
......@@ -33,7 +33,7 @@ cost, please see the
The easiest and fastest way to access Attic is via Globus. You can
transfer files between your computer, our clusters ($HOME, $WORK, and $COMMON on
Crane or Rhino), and Attic. Here is a detailed tutorial on
how to set up and use [Globus Connect]({{< relref "globus_connect" >}}). For
Attic, use the Globus Endpoint **hcc\#attic**.  Your Attic files are
located at `~`, which is a shortcut
......
......@@ -4,9 +4,9 @@ description = "How to submit jobs to HCC resources"
weight = "10"
+++
Crane and Rhino are managed by
the [SLURM](https://slurm.schedmd.com) resource manager.  
In order to run processing on Crane or Rhino, you
must create a SLURM script that will run your processing. After
submitting the job, SLURM will schedule your processing on an available
worker node.
......@@ -82,10 +82,10 @@ sleep 60
Specify the real memory required per node in MegaBytes. If you
exceed this limit, your job will be stopped. Note that you
should ask for less memory than each node actually has. For
instance, Rhino has 1TB, 512GB, 256GB, and 192GB of RAM per node. You may
only request 1000GB of RAM for the 1TB node, 500GB of RAM for the
512GB nodes, and 250GB of RAM for the 256GB nodes. For Crane, the
max is 500GB.
512GB nodes, 250GB of RAM for the 256GB nodes, and 187.5GB for the 192GB nodes.
For Crane, the max is 500GB.
- **job-name**
The name of the job.  Will be reported in the job listing.
- **partition**
......
......@@ -20,7 +20,7 @@ the following instructions to work.**
- [Tutorial Video](#tutorial-video)
Every HCC user has a password that is the same on all HCC machines
(Crane, Rhino, Anvil). This password needs to satisfy the HCC
password requirements.
### HCC password requirements
......
......@@ -4,7 +4,7 @@ description = "How to submit jobs to HCC resources"
weight = "10"
+++
Crane and Rhino are managed by
the [SLURM](https://slurm.schedmd.com) resource manager.  
In order to run processing on Crane or Rhino, you
must create a SLURM script that will run your processing. After
......