diff --git a/.DS_Store b/.DS_Store
new file mode 100644
index 0000000000000000000000000000000000000000..2a43b86b965c6735a338394c4d17e6a7a5a71078
Binary files /dev/null and b/.DS_Store differ
diff --git a/content/NRP/_index.md b/content/NRP/_index.md
new file mode 100644
index 0000000000000000000000000000000000000000..03a6431b2b0bb96c3eea98d4eb2c99d1e0b6bd22
--- /dev/null
+++ b/content/NRP/_index.md
@@ -0,0 +1,31 @@
++++
+title = "The National Research Platform"
+description = "How to utilize the National Research Platform (NRP)."
+weight = "90"
++++
+
+### What is the National Research Platform (NRP)?
+
+The [National Research Platform](https://nationalresearchplatform.org) is a partnership of more than 50 institutions, led by researchers at UC San Diego, the University of Nebraska-Lincoln, and UC Berkeley, and including the National Science Foundation, the Department of Energy, and multiple research universities in the US and around the world.
+
+The major resource of the NRP is a heterogeneous, globally distributed, open system that features a variety of CPUs, GPUs, and storage, arranged into a Kubernetes cluster called [Nautilus](https://docs.pacificresearchplatform.org).
+
+The map below shows the National Research Platform resources located across the world.
+
+<iframe
+  src="https://elastic-igrok.nrp-nautilus.io/app/dashboards/?auth_provider_hint=anonymous1#/view/76b9b030-81d5-11eb-ad7c-1f5ec373b923?embed=true&_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-1d,to:now))&hide-filter-bar=true"
+  style="width:100%; height:600px;"
+></iframe>
+
+This help document covers these topics:
+
+- [Quick Start]({{< relref "quick_start">}})
+- [Basic Kubernetes]({{< relref "basic_kubernetes">}})
+- [GPU Pods]({{< relref "gpu_pods">}})
+- [Batch Jobs]({{< relref "batch_jobs">}})
+- [Deployments]({{< relref "deployments">}})
+- [Storage]({{< relref "storage">}})
+- [JupyterHub Services]({{< relref "jupyterhub">}})
+
+The full documentation of the NRP Nautilus Cluster can be found at https://docs.pacificresearchplatform.org.
+To get help with using the NRP Nautilus Cluster, please refer to the [Contact page](https://docs.pacificresearchplatform.org/userdocs/start/contact/).
diff --git a/content/NRP/basic_kubernetes.md b/content/NRP/basic_kubernetes.md
new file mode 100644
index 0000000000000000000000000000000000000000..5d7b2e7006762e05bf1245c8caf77d56e161c0aa
--- /dev/null
+++ b/content/NRP/basic_kubernetes.md
@@ -0,0 +1,306 @@
++++
+title = "Basic Kubernetes"
+description = "Basic Kubernetes"
+weight=20
++++
+
+### Setup
+
+This section assumes you've completed the [Quick Start]({{< ref "quick_start.md">}}) section.
+
+If you are in multiple namespaces, you need to be aware of which namespace you're working in, and either set it with `kubectl config set-context nautilus --namespace=the_namespace` or specify it in each `kubectl` command by adding `-n namespace`.
+
+### Explore the system
+
+To get the list of cluster nodes (although you may not have access to all of them), type:
+
+```
+kubectl get nodes
+```
+
+Right now you probably don't have anything running in the namespace, and the commands below will return `No resources found in ... namespace.`. There are three categories we will examine: pods, deployments, and services. Later these commands will be useful to see what's running.
+
+List all the pods in your namespace:
+
+```
+kubectl get pods
+```
+
+List all the deployments in your namespace:
+
+```
+kubectl get deployments
+```
+
+List all the services in your namespace:
+
+```
+kubectl get services
+```
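+
+If you belong to more than one namespace, it can also help to double-check which namespace your current context points at before creating anything. A minimal sketch using standard `kubectl` commands:
+
+```
+kubectl config current-context
+kubectl config view --minify --output 'jsonpath={..namespace}'
+```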
+
+### Launch a simple pod
+
+Let's create a simple generic pod and log into it.
+
+You can copy-and-paste the lines below. Create the `pod1.yaml` file with the following content:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: test-pod
+spec:
+  containers:
+  - name: mypod
+    image: ubuntu
+    resources:
+      limits:
+        memory: 100Mi
+        cpu: 100m
+      requests:
+        memory: 100Mi
+        cpu: 100m
+    command: ["sh", "-c", "echo 'Im a new pod' && sleep infinity"]
+```
+
+Reminder: indentation is important in YAML, just like in Python.
+
+*If you don't want to create the file and are using Mac or Linux, you can create YAMLs dynamically like this:*
+
+```
+kubectl create -f - << EOF
+<contents you want to deploy>
+EOF
+```
+
+Now let's start the pod:
+
+```
+kubectl create -f pod1.yaml
+```
+
+See if you can find it:
+
+```
+kubectl get pods
+```
+
+Note: You may see other pods too.
+
+If it is not yet in Running state, you can check what is going on with:
+
+```
+kubectl get events --sort-by=.metadata.creationTimestamp
+```
+
+Events and other useful information about the pod can be seen with `describe`:
+
+```
+kubectl describe pod test-pod
+```
+
+If the pod is in Running state, we can check its logs:
+
+```
+kubectl logs test-pod
+```
+
+Let's log into it:
+
+```
+kubectl exec -it test-pod -- /bin/bash
+```
+
+You are now inside the (container in the) pod!
+
+Does it feel any different from a regular, dedicated node?
+
+Try to create some directories and some files with content.
+
+(Hello world will do, but feel free to be creative.)
+
+We will want to check the status of the networking. But `ifconfig` is not available in the image we are using, so let's install it.
+
+First, let's make sure our installation tools are updated:
+
+```
+apt update
+```
+
+Now we can use apt to install the necessary network tools:
+
+```
+apt install net-tools
+```
+
+Now check the networking:
+
+```
+ifconfig -a
+```
+
+Get out of the pod (with either Control-D or `exit`).
+
+You should see the same IP displayed with kubectl:
+
+```
+kubectl get pod -o wide test-pod
+```
+
+We can now destroy the pod:
+
+```
+kubectl delete -f pod1.yaml
+```
+
+Check that it is actually gone:
+
+```
+kubectl get pods
+```
+
+Now, let's create it again:
+
+```
+kubectl create -f pod1.yaml
+```
+
+Does it have the same IP?
+
+```
+kubectl get pod -o wide test-pod
+```
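+
+If you only want the IP address itself rather than the full wide output, a jsonpath query also works. A small sketch using standard `kubectl` output options:
+
+```
+kubectl get pod test-pod -o jsonpath='{.status.podIP}'
+```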
+
+Log back into the pod:
+
+```
+kubectl exec -it test-pod -- /bin/bash
+```
+
+What does the network look like now?
+
+What is the status of the files you created?
+
+Finally, let's delete the pod explicitly:
+
+```
+kubectl delete pod test-pod
+```
+
+### Let's make it a deployment
+
+You saw that when a pod was terminated, it was gone.
+
+While we deleted it ourselves above, the result would have been the same if a node had died or been restarted.
+
+To gain higher availability, the use of Deployments is recommended. So, that's what we will do next.
+
+You can copy-and-paste the lines below.
+
+###### dep1.yaml:
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: test-dep
+  labels:
+    k8s-app: test-dep
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      k8s-app: test-dep
+  template:
+    metadata:
+      labels:
+        k8s-app: test-dep
+    spec:
+      containers:
+      - name: mypod
+        image: ubuntu
+        resources:
+          limits:
+            memory: 500Mi
+            cpu: 500m
+          requests:
+            memory: 100Mi
+            cpu: 50m
+        command: ["sh", "-c", "sleep infinity"]
+```
+
+Now let's start the deployment:
+
+```
+kubectl create -f dep1.yaml
+```
+
+See if you can find it:
+
+```
+kubectl get deployments
+```
+
+The Deployment is just a conceptual object, though; the work happens in its pod.
+
+See if you can find the associated pod:
+
+```
+kubectl get pods
+```
+
+Once you have found its name, let's log into it:
+
+```
+kubectl get pod -o wide test-dep-<hash>
+kubectl exec -it test-dep-<hash> -- /bin/bash
+```
+
+You are now inside the (container in the) pod!
+
+Create directories and files as before.
+
+Try various commands as before.
+
+Let's now delete the pod!
+
+```
+kubectl delete pod test-dep-<hash>
+```
+
+Is it really gone?
+
+```
+kubectl get pods
+```
+
+What happened to the deployment?
+
+```
+kubectl get deployments
+```
+
+Get into the new pod:
+
+```
+kubectl get pod -o wide test-dep-<hash>
+kubectl exec -it test-dep-<hash> -- /bin/bash
+```
+
+Was anything preserved?
+
+Let's now delete the deployment:
+
+```
+kubectl delete -f dep1.yaml
+```
+
+Verify everything is gone:
+
+```
+kubectl get deployments
+kubectl get pods
+```
+
+### More tutorials are available at [Nautilus Documentation - Tutorials](https://docs.pacificresearchplatform.org)
+
diff --git a/content/NRP/batch_jobs.md b/content/NRP/batch_jobs.md
new file mode 100644
index 0000000000000000000000000000000000000000..803d5735f290dfd32e4688f067315cf9d9183baf
--- /dev/null
+++ b/content/NRP/batch_jobs.md
@@ -0,0 +1,139 @@
++++
+title = "Batch Jobs"
+description = "Batch Jobs"
+weight=40
++++
+
+### Running batch jobs
+
+#### Basic example
+
+Kubernetes has support for running batch jobs. A Job is a controller that watches your pod and makes sure it exited with exit status 0. If it did not, for any reason, the pod will be restarted up to `backoffLimit` times.
+
+Since jobs in Nautilus are not limited in runtime, you can only run jobs with a meaningful `command` field. Running in manual mode (a `sleep infinity` `command` and a manual start of the computation) is prohibited.
+
+Let's run a simple job and get its result.
+
+Create a `job.yaml` file and submit it:
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: pi
+spec:
+  template:
+    spec:
+      containers:
+      - name: pi
+        image: perl
+        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
+        resources:
+          limits:
+            memory: 200Mi
+            cpu: 1
+          requests:
+            memory: 50Mi
+            cpu: 50m
+      restartPolicy: Never
+  backoffLimit: 4
+```
+
+Explore what's running:
+
+```
+kubectl get jobs
+kubectl get pods
+```
+
+When the job is finished, your pod will stay in Completed state, and the Job will show a COMPLETIONS field of 1/1. For long jobs, the pods can pass through Error, Evicted, and other states until they finish properly or `backoffLimit` is exhausted.
+
+This example job did not use any storage and wrote its result to STDOUT, which can be seen in the pod logs:
+
+```
+kubectl logs pi-<hash>
+```
+
+The pod and job will remain for you to come back and look at for `ttlSecondsAfterFinished=604800` seconds (1 week) by default, and you can adjust this value in your job definition if desired.
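+
+For example, to have the job and its pod cleaned up one day after completion instead, you could set that field in the job spec. A minimal sketch based on the `pi` job above; only the `ttlSecondsAfterFinished` line is new:
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: pi
+spec:
+  ttlSecondsAfterFinished: 86400  # delete the Job and its pod one day after it finishes
+  template:
+    spec:
+      containers:
+      - name: pi
+        image: perl
+        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
+        resources:
+          limits:
+            memory: 200Mi
+            cpu: 1
+          requests:
+            memory: 50Mi
+            cpu: 50m
+      restartPolicy: Never
+  backoffLimit: 4
+```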
+
+**Please make sure you do not leave any pods and jobs behind.** To delete the job, run:
+
+```
+kubectl delete job pi
+```
+
+#### Running several bash commands
+
+You can group several commands, and use pipes, like this:
+
+```
+  command:
+    - sh
+    - -c
+    - "cd /home/user/my_folder && apt-get install -y wget && wget some_file && do_something_else"
+```
+
+#### Logs
+
+All stdout and stderr output from the script will be preserved and accessible by running:
+
+```
+kubectl logs pod_name
+```
+
+Output from an initContainer can be seen with:
+
+```
+kubectl logs pod_name -c init-clone-repo
+```
+
+To see logs in real time, do:
+
+```
+kubectl logs -f pod_name
+```
+
+The pod will remain in Completed state until you delete it or the timeout has passed.
+
+#### Retries
+
+The `backoffLimit` field specifies how many times your pod will be restarted if the exit status of your script is not 0, or if the pod was terminated for a different reason (for example, a node was rebooted). It's a good idea to set it to more than 0.
+
+#### Fair queueing
+
+There is no fair queue implemented on Nautilus. If you submit 1000 jobs, you block **all** other users from submitting in the cluster.
+
+To limit your submission to a fair portion of the cluster, refer to [this guide](https://kubernetes.io/docs/tasks/job/fine-parallel-processing-work-queue/). Make sure to use a deployment and persistent storage for the Redis pod. Here's [our example](https://gitlab.nrp-nautilus.io/prp/job-queue/-/blob/master/redis.yaml).
+
+#### CPU-only jobs
+
+Nautilus is primarily used for GPU jobs. While it's possible to run large CPU-only jobs, you have to take certain measures to avoid taking over all cluster resources.
+
+You can run the jobs with lower priority and allow other jobs to preempt yours. This way you don't need to worry about the size of your jobs, and you can use the maximum number of resources in the cluster. To do that, add the `opportunistic` priority class to your pods:
+
+```yaml
+  spec:
+    priorityClassName: opportunistic
+```
+
+Another option is to avoid the GPU nodes. This way you can be sure you're only using CPU-only nodes and your jobs are not preventing any GPU usage. To do this, add a node antiaffinity for the GPU device to your pod:
+
+```yaml
+  spec:
+    affinity:
+      nodeAffinity:
+        requiredDuringSchedulingIgnoredDuringExecution:
+          nodeSelectorTerms:
+          - matchExpressions:
+            - key: feature.node.kubernetes.io/pci-10de.present
+              operator: NotIn
+              values:
+              - "true"
+```
+
+You can use either method, or a combination of the two.
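+
+For reference, combining both measures in a single pod template could look like this. A minimal sketch that simply stitches together the two snippets above:
+
+```yaml
+  spec:
+    priorityClassName: opportunistic
+    affinity:
+      nodeAffinity:
+        requiredDuringSchedulingIgnoredDuringExecution:
+          nodeSelectorTerms:
+          - matchExpressions:
+            - key: feature.node.kubernetes.io/pci-10de.present
+              operator: NotIn
+              values:
+              - "true"
+```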
+
diff --git a/content/NRP/deployments.md b/content/NRP/deployments.md
new file mode 100644
index 0000000000000000000000000000000000000000..5d055d9ebc768ab06f0fa9949701825cf1a8e72b
--- /dev/null
+++ b/content/NRP/deployments.md
@@ -0,0 +1,80 @@
++++
+title = "Deployments"
+description = "Deployments"
+weight=50
++++
+
+## Running an idle deployment
+
+If you need an idle pod in the cluster that might occasionally do some computations, you have to run it as a [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/). Deployments in Nautilus are limited to 2 weeks (unless the namespace is added to the exceptions list and runs a permanent service). This ensures your pod will not run in the cluster forever after you no longer need it and have moved on to other projects.
+
+Please don't run such pods as Jobs, since those are not purged by the cleaning daemon and will stay in the cluster forever if you forget to remove them.
+
+Such a deployment **cannot request a GPU**.
+
+You can use
+
+```
+command:
+  - sleep
+  - "100000000"
+```
+
+as the command if you just want a pure shell, and `busybox`, `centos`, `ubuntu`, or any other general image you like.
+
+Follow the [guide for creating deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) and add minimal requests and limits that make sense, for example:
+
+```
+resources:
+  limits:
+    cpu: "1"
+    memory: 10Gi
+  requests:
+    cpu: "10m"
+    memory: 100Mi
+```
+
+Example of running an nginx deployment:
+
+```
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: nginx-deployment
+  labels:
+    k8s-app: nginx
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      k8s-app: nginx
+  template:
+    metadata:
+      labels:
+        k8s-app: nginx
+    spec:
+      containers:
+      - image: nginx
+        name: nginx-pod
+        resources:
+          limits:
+            cpu: 1
+            memory: 4Gi
+          requests:
+            cpu: 100m
+            memory: 500Mi
+```
+
+## Quickly stopping and starting the pod
+
+If you need a simple way to start and stop your pod without redeploying every time, you can scale down the deployment. This leaves the definition in place but deletes the pod.
+
+To stop the pod, scale down:
+
+```
+kubectl scale deployment deployment-name --replicas=0
+```
+
+To start the pod, scale up:
+
+```
+kubectl scale deployment deployment-name --replicas=1
+```
diff --git a/content/NRP/gpu_pods.md b/content/NRP/gpu_pods.md
new file mode 100644
index 0000000000000000000000000000000000000000..86050d34334d453d48b21f90cbd917ea2bcba33d
--- /dev/null
+++ b/content/NRP/gpu_pods.md
@@ -0,0 +1,161 @@
++++
+title = "GPU Pods"
+description = "GPU Pods"
+weight=20
++++
+
+The Nautilus Cluster provides over 200 GPU nodes. In this section you will request GPUs. Make sure you don't waste them, and delete your pods when you are not using the GPUs.
+
+Use this definition to create your own pod and deploy it to Kubernetes \(refer to [Basic Kubernetes]({{< ref "basic_kubernetes.md">}})\):
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: gpu-pod-example
+spec:
+  containers:
+  - name: gpu-container
+    image: gitlab-registry.nrp-nautilus.io/prp/jupyter-stack/prp:latest
+    command: ["sleep", "infinity"]
+    resources:
+      limits:
+        nvidia.com/gpu: 1
+```
+
+This example requests 1 GPU device. You can have up to 2 per pod. If you request GPU devices in your pod,
+Kubernetes will automatically schedule your pod to an appropriate node. There's no need to specify the location manually.
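+
+Once the pod is running, you can quickly verify that the GPU is visible from inside the container. A small sketch; it assumes `nvidia-smi` is available in the image, which is true for typical CUDA/Jupyter images but should be treated as an assumption for other images:
+
+```
+kubectl exec -it gpu-pod-example -- nvidia-smi
+```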
+
+**You should always delete your pod** when your computation is done to let other users use the GPUs.
+Consider using [Jobs](/userdocs/running/jobs/) **with an actual script instead of `sleep`** whenever possible to ensure your pod is not wasting GPU time.
+If you have never used Kubernetes before, see the [tutorial](/userdocs/tutorial/intro).
+
+#### Requesting high-demand GPUs
+
+Certain kinds of GPUs have much higher specs than the others, and to avoid wasting them on regular jobs, your pods will only be scheduled on those GPUs if you request the type explicitly.
+
+Currently those include:
+
+* NVIDIA-TITAN-RTX
+* NVIDIA-RTX-A5000
+* Quadro-RTX-6000
+* Tesla-V100-SXM2-32GB
+* NVIDIA-A40
+* NVIDIA-RTX-A6000
+* Quadro-RTX-8000
+* NVIDIA-A100-SXM4-80GB*
+
+*An A100 running in [MIG mode](#mig-mode) is not considered a high-demand GPU.
+
+#### Requesting many GPUs
+
+Since 1- and 2-GPU jobs can block nodes from receiving 4- and 8-GPU jobs, some nodes are reserved for the larger jobs. Once you submit a job requesting 4 or 8 GPUs, a controller will automatically add a toleration which allows you to use a node reserved for more GPUs. You don't need to do anything manually for that.
+
+#### Choosing GPU type
+
+We have a variety of GPU flavors attached to Nautilus. You can get a list of GPU models from the actual cluster information (e.g. `kubectl get nodes -L nvidia.com/gpu.product`).
+
+<div id="observablehq-chart-35acf314"></div>
+<p>Credit: <a href="https://observablehq.com/d/7c0f46855b4212e0">GPU types by NRP Nautilus</a></p>
+
+<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@observablehq/inspector@5/dist/inspector.css">
+
+<script type="module">
+import {Runtime, Inspector} from "https://cdn.jsdelivr.net/npm/@observablehq/runtime@5/dist/runtime.js";
+import define from "https://api.observablehq.com/d/7c0f46855b4212e0.js?v=4";
+new Runtime().module(define, name => {
+  if (name === "chart") return new Inspector(document.querySelector("#observablehq-chart-35acf314"));
+});
+</script>
+
+If you need more graphics memory, use the official specs to choose the type. The table below is an example of the GPU types in the Nautilus Cluster and their memory sizes:
+
+GPU Type | Memory size (GB)
+---|---
+NVIDIA-GeForce-GTX-1070 | 8G
+NVIDIA-GeForce-GTX-1080 | 8G
+Quadro-M4000 | 8G
+NVIDIA-A100-PCIE-40GB-MIG-2g.10gb | 10G
+NVIDIA-GeForce-GTX-1080-Ti | 12G
+NVIDIA-GeForce-RTX-2080-Ti | 12G
+NVIDIA-TITAN-Xp | 12G
+Tesla-T4 | 16G
+NVIDIA-A10 | 24G
+NVIDIA-GeForce-RTX-3090 | 24G
+NVIDIA-TITAN-RTX | 24G
+NVIDIA-RTX-A5000 | 24G
+Quadro-RTX-6000 | 24G
+Tesla-V100-SXM2-32GB | 32G
+NVIDIA-A40 | 48G
+NVIDIA-RTX-A6000 | 48G
+Quadro-RTX-8000 | 48G
+
+**NOTE**: [Not all nodes are available to all users](https://docs.pacificresearchplatform.org/userdocs/running/special/). You can ask about your available resources in [Matrix](https://docs.pacificresearchplatform.org/userdocs/start/contact) and check the [resources page](https://portal.nrp-nautilus.io/resources).
+Labs connecting their hardware to our cluster have preferential access to all our resources.
+
+To use a **specific type of GPU**, add the affinity definition to your pod yaml
+file. The example below specifies a *1080 Ti* GPU:
+
+```yaml
+spec:
+  affinity:
+    nodeAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        nodeSelectorTerms:
+        - matchExpressions:
+          - key: nvidia.com/gpu.product
+            operator: In
+            values:
+            - NVIDIA-GeForce-GTX-1080-Ti
+```
+
+**To make sure you did everything correctly**, after you've submitted the job, look at the corresponding pod yaml (`kubectl get pod ... -o yaml`) and check that the resulting nodeAffinity is as expected.
+
+#### Selecting CUDA version
+
+In general, a node with a newer driver supports container images built for the same or a lower CUDA version. The nodes are labelled with the major and minor CUDA and driver versions. You can check those at the [resources page](https://portal.nrp-nautilus.io/resources) or list them with this command (it will also select only GPU nodes):
+
+```bash
+kubectl get nodes -L nvidia.com/cuda.driver.major,nvidia.com/cuda.driver.minor,nvidia.com/cuda.runtime.major,nvidia.com/cuda.runtime.minor -l nvidia.com/gpu.product
+```
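+
+If you already know which CUDA runtime you need, a label selector narrows the listing to matching nodes. A small sketch using the same labels; the version values here are just an example:
+
+```bash
+kubectl get nodes -l nvidia.com/cuda.runtime.major=12,nvidia.com/cuda.runtime.minor=2 -L nvidia.com/gpu.product
+```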
+
+If you're using a container image with a higher CUDA version, you have to pick nodes that support it. Example:
+
+```yaml
+spec:
+  affinity:
+    nodeAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        nodeSelectorTerms:
+        - matchExpressions:
+          - key: nvidia.com/cuda.runtime.major
+            operator: In
+            values:
+            - "12"
+          - key: nvidia.com/cuda.runtime.minor
+            operator: In
+            values:
+            - "2"
+```
+
+You can also select nodes by driver version if you know which one you need (this example picks drivers **above** 535):
+
+```yaml
+spec:
+  affinity:
+    nodeAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        nodeSelectorTerms:
+        - matchExpressions:
+          - key: nvidia.com/cuda.driver.major
+            operator: Gt
+            values:
+            - "535"
+```
+
+#### MIG mode
+
+A100 GPUs can be sliced into several logical GPUs ([MIG mode](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#a100-profiles)). This mode is enabled in our cluster. Things can change, but currently we plan to slice them in halves. The current MIG mode can be obtained from the nodes via the `nvidia.com/gpu.product` label: `NVIDIA-A100-PCIE-40GB-MIG-2g.10gb` means 2 compute instances (out of 7 total) and 10GB of memory per virtual GPU.
+
diff --git a/content/NRP/jupyterhub.md b/content/NRP/jupyterhub.md
new file mode 100644
index 0000000000000000000000000000000000000000..698e6bfca2a31212414b9c2794c4d839d3921975
--- /dev/null
+++ b/content/NRP/jupyterhub.md
@@ -0,0 +1,25 @@
++++
+title = "JupyterHub Service"
+description = "JupyterHub Service"
+weight=70
++++
+
+### [JupyterHub](https://jupyterhub-west.nrp-nautilus.io) on Nautilus
+
+A [JupyterHub](https://jupyterhub-west.nrp-nautilus.io) service is provided on the Nautilus Cluster, which is great
+if you need to quickly run your workflow and do not want to learn any
+Kubernetes. Simply follow the link to [https://jupyterhub-west.nrp-nautilus.io](https://jupyterhub-west.nrp-nautilus.io), click the **Sign in With CILogon** button, and use your institutional credentials to log in using CILogon. After authentication, choose the hardware specs to spawn your instance. An example of the specs selection is shown below:
+
+{{<figure src="/images/nrp-jupyterhub-options.png">}}
+
+Your persistent home folder is initially limited to 5GB. If you need more, you can request that it be extended.
+You can also request [CephFS storage](https://docs.pacificresearchplatform.org/userdocs/storage/ceph/) mounted as a shared disk space. All these requests can be made by **contacting NRP admins through [Matrix](https://docs.pacificresearchplatform.org/userdocs/start/contact/)**.
+Please use this storage for all the data, code, and results that you need for long experiments.
+
+**NOTE:** Your Jupyter container will shut down one hour after your browser disconnects from it. If you need your job to keep running, don't close the browser window.
+You could either use a desktop with a persistent Internet connection or use this service only for testing your code.
+
+**NOTE:** Available images are described in the [scientific images section](https://docs.pacificresearchplatform.org/userdocs/running/sci-img/).
+
+If you need to use an image that is not provided by NRP, proceed to [Step by Step Tensorflow with Jupyter](https://docs.pacificresearchplatform.org/userdocs/jupyter/jupyter-pod). If you prefer a customized JupyterHub, follow the guide to [Deploy JupyterHub](https://docs.pacificresearchplatform.org/userdocs/jupyter/jupyterhub/) to deploy your own JupyterHub instance on the Nautilus Cluster.
diff --git a/content/NRP/quick_start.md b/content/NRP/quick_start.md
new file mode 100644
index 0000000000000000000000000000000000000000..c4338052ca99d4e241350265e9b69c818dad6bd3
--- /dev/null
+++ b/content/NRP/quick_start.md
@@ -0,0 +1,63 @@
++++
+title = "Quick Start"
+description = "Quick Start"
+weight=10
++++
+
+The Nautilus Cluster is a globally distributed [Kubernetes](https://kubernetes.io) cluster.
+
+The general guide for getting access to the Nautilus Cluster can be found [here](https://docs.pacificresearchplatform.org/userdocs/start/get-access/). The guidance on this page is tailored to NU users:
+
+### Get access to the Nautilus cluster
+
+1. Point your browser to the [Nautilus Portal](https://portal.nrp-nautilus.io)
+2. On the portal page, click the "Login" button at the top right corner
+   {{< figure src="/images/nautilus-portal-login.png" height="50" >}}
+3. You will be redirected to the "CILogon" page
+4. On this page, select "University of Nebraska-Lincoln" as the Identity Provider from the menu and click the "Log On" button to log in with your UNL credentials. For users from other NU campuses, select the institution of the NU system that you are affiliated with.
+   {{< figure src="/images/cilogon-unl.png">}}
+5. After successful authentication you will be logged in to the portal.
+
+   _On first login you become a **guest**. Any admin user can
+   validate your guest account, promote you to **user**, and add your account to their **namespace**. You need to be assigned to at least one namespace (usually a group project, but it can be your own new namespace)._
+
+   - To get access to a namespace, please contact its owner (usually by email). Once you are granted the user role in the cluster and are added to the namespace, you will get access to all namespace resources.
+
+   - If you're starting a new project and would like to have your own namespace, either for yourself or for your group, you can request to be promoted to admin by **contacting NRP admins through [Matrix](https://docs.pacificresearchplatform.org/userdocs/start/contact/)**.
+
+     This will give you permission to create any number of namespaces and invite other users to your namespace(s). Please note, **you'll be the one responsible for all activity happening in your namespaces**.
+
+6. Once you are made either a user or an admin of a namespace, you'll need to accept the **Acceptable Use Policy (AUP)** on the portal page \(as shown in the screenshot below\) in order to get access to the cluster.
+   {{< figure src="/images/nrp-aup.png" height="50" >}}
+
+7. Please review the [Policies](https://docs.pacificresearchplatform.org/userdocs/start/policies/) before starting any work on the Nautilus Cluster.
+
+### Configure a client to use the Nautilus Cluster
+
+Now you have been given access to the Nautilus Cluster. To interact with the cluster, you need to configure a client with the `kubectl` command-line tool. A client can be your desktop or laptop computer, a virtual machine, or a terminal environment.
+
+1. [Install][1] the kubectl tool
+
+2. Log in to the [NRP Nautilus portal][2]
+   {{< figure src="/images/nautilus-portal-login.png" height="50" >}}
+3. Click the **Get Config** link at the top right corner of the page to get your configuration file.
+   {{< figure src="/images/nrp-get-config.png" height="50" >}}
+
+4. Save the file as **config** and put it in your \<home\>/.kube folder.
+   This folder may not exist on your machine; to create it, execute from a terminal:
+
+   ```
+   mkdir ~/.kube
+   ```
+5. Test that kubectl can connect to the cluster:
+
+   ```
+   kubectl get pods -n your_namespace
+   ```
+
+   It's possible there are no pods in your namespace yet. If you get `No resources found.`, your namespace is empty and you can start running workloads in it.
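+
+Optionally, you can set a default namespace on the context from the downloaded config, so you don't have to add `-n your_namespace` to every command. A small sketch; `nautilus` is the context name used elsewhere in these docs, and `your_namespace` is a placeholder:
+
+```
+kubectl config set-context nautilus --namespace=your_namespace
+```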
+
+[1]: https://kubernetes.io/docs/tasks/tools/install-kubectl/
+[2]: https://portal.nrp-nautilus.io
diff --git a/content/NRP/storage.md b/content/NRP/storage.md
new file mode 100644
index 0000000000000000000000000000000000000000..7fae1eb6d1a7ba2ece6ab84153c21e59f8cdbb86
--- /dev/null
+++ b/content/NRP/storage.md
@@ -0,0 +1,147 @@
++++
+title = "Storage"
+description = "Storage"
+weight=60
++++
+
+### Using Storage
+
+Different Kubernetes clusters will have different storage options available.
+Let's explore the most basic one: emptyDir. It allocates a local scratch volume, which will be gone once the pod is destroyed.
+
+You can copy-and-paste the lines below.
+
+###### strg1.yaml:
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: test-storage
+  labels:
+    k8s-app: test-storage
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      k8s-app: test-storage
+  template:
+    metadata:
+      labels:
+        k8s-app: test-storage
+    spec:
+      containers:
+      - name: mypod
+        image: alpine
+        resources:
+          limits:
+            memory: 100Mi
+            cpu: 100m
+          requests:
+            memory: 100Mi
+            cpu: 100m
+        command: ["sh", "-c", "apk add dumb-init && dumb-init -- sleep 100000"]
+        volumeMounts:
+        - name: mydata
+          mountPath: /mnt/myscratch
+      volumes:
+      - name: mydata
+        emptyDir: {}
+```
+
+Now let's start the deployment:
+
+```
+kubectl create -f strg1.yaml
+```
+
+Now log into the created pod and create a directory:
+
+```
+mkdir /mnt/myscratch/username
+```
+
+then store some files in it.
+
+Also put some files in some other (unrelated) directories.
+
+Now kill the container with `kill 1`, wait for a new one to be created, then log back in.
+
+What happened to the files?
+
+You can now delete the deployment.
+
+### Using outer persistent storage
+
+Our cluster has Ceph storage attached, which can be used for real data persistence.
+
+To get storage, we need to create an abstraction called a PersistentVolumeClaim. By doing that we "claim" some storage space - a "PersistentVolume". A PersistentVolume will actually be created, but it's a cluster-wide resource which you cannot see.
+
+Create the file:
+
+###### pvc.yaml:
+
+```yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: test-vol
+spec:
+  storageClassName: rook-ceph-block
+  accessModes:
+  - ReadWriteOnce
+  resources:
+    requests:
+      storage: 1Gi
+```
+
+We're creating a 1GB volume and formatting it with XFS.
+
+Look at its status with `kubectl get pvc test-vol`. The `STATUS` field should be equal to `Bound` - this indicates successful allocation.
+
+Now we can attach it to our pod. Create one:
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: test-pod
+spec:
+  containers:
+  - name: mypod
+    image: centos:centos7
+    command: ["sh", "-c", "sleep infinity"]
+    resources:
+      limits:
+        memory: 100Mi
+        cpu: 100m
+      requests:
+        memory: 100Mi
+        cpu: 100m
+    volumeMounts:
+    - mountPath: /examplevol
+      name: examplevol
+  volumes:
+  - name: examplevol
+    persistentVolumeClaim:
+      claimName: test-vol
+```
+
+In the `volumes` section we attach the requested persistent volume to the pod (by its name!), and in `volumeMounts` we mount the attached volume into the container at the specified folder.
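+
+Once the pod is running, you can confirm the volume is mounted where you expect. A minimal sketch; the pod name and mount path match the example above:
+
+```
+kubectl exec -it test-pod -- df -h /examplevol
+```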
+
+### Exploring storageClasses
+
+Attaching persistent storage is usually done based on a storage class.
+Different clusters will have different storageClasses, and you have to read the [documentation](https://docs.pacificresearchplatform.org/userdocs/storage/intro) to see which one to use. Some are restricted, and you need to contact the admins to ask for permission to use those.
+
+Note that the one we used is the default - it will be used if you define none.
+
+### Cleaning up
+
+After you've deleted all the pods and deployments, delete the volume claim:
+
+```
+kubectl delete pvc test-vol
+```
+
+Please make sure you did not leave any running pods, deployments, or volumes.
+
diff --git a/static/.DS_Store b/static/.DS_Store
new file mode 100644
index 0000000000000000000000000000000000000000..a44044837fb71f1a57ad392a620983c0279602af
Binary files /dev/null and b/static/.DS_Store differ
diff --git a/static/images/cilogon-unl.png b/static/images/cilogon-unl.png
new file mode 100644
index 0000000000000000000000000000000000000000..9fc649704af92bed62e3518c214af6e557cb97fb
Binary files /dev/null and b/static/images/cilogon-unl.png differ
diff --git a/static/images/nautilus-portal-login.png b/static/images/nautilus-portal-login.png
new file mode 100644
index 0000000000000000000000000000000000000000..7fa5ce3289acd0242d731f24591d349444c23b94
Binary files /dev/null and b/static/images/nautilus-portal-login.png differ
diff --git a/static/images/nrp-aup.png b/static/images/nrp-aup.png
new file mode 100644
index 0000000000000000000000000000000000000000..d014fa2fb171ab1fc61bdf1de12ba4dad3202f30
Binary files /dev/null and b/static/images/nrp-aup.png differ
diff --git a/static/images/nrp-get-config.png b/static/images/nrp-get-config.png
new file mode 100644
index 0000000000000000000000000000000000000000..4319ddf50def3d35f7624bf972e9cadd8b49ba75
Binary files /dev/null and b/static/images/nrp-get-config.png differ
diff --git a/static/images/nrp-jupyterhub-options.png b/static/images/nrp-jupyterhub-options.png
new file mode 100644
index 0000000000000000000000000000000000000000..97d7727dd5ff7bffb40dd6e6ffb74103ff433251
Binary files /dev/null and b/static/images/nrp-jupyterhub-options.png differ