+++
title = "Running PostgreSQL"
description = "How to run a PostgreSQL server within a SLURM job"
+++
This page describes how to run a PostgreSQL server instance within a SLURM job
on HCC resources. Many software packages require an SQL database
as part of their workflows. The database will be available as long as
the SLURM job containing it is running, and other jobs may then be submitted to
connect to and use it. The database files are stored on the clusters' filesystem
(here `$COMMON` is used), so the data persists even after the containing SLURM job ends.
That is, you can submit a subsequent identical PostgreSQL server job
and data that was previously imported into the database will still be there.
{{% notice warning %}}
Only **one** instance of the database server job can run at a time. Submitting multiple
server jobs simultaneously will result in undefined behavior and database corruption.
{{% /notice %}}
### Initial setup steps
A few initial setup commands must be run first. These commands only need to be run once for
each database instance you wish to run, and should be run from the _login node_.
First, choose a location to hold the database and configuration files. Here, we use a
folder named `postgres` in the `$COMMON` directory. Change the value of `POSTGRES_HOME`
if you wish to use another location. Run the following commands to create the needed directory
structure and generate a random password for the database, stored in `$POSTGRES_HOME/config/postgres-password`.
{{< highlight bash >}}
$ export POSTGRES_HOME=$COMMON/postgres
$ mkdir -p $POSTGRES_HOME/{config,db/data,run}
$ uuidgen > $POSTGRES_HOME/config/postgres-password
$ chmod 600 $POSTGRES_HOME/config/postgres-password
{{< /highlight >}}
### Start the PostgreSQL SLURM job
Use the following submit script to start the job for the database:
{{% panel theme="info" header="postgres.submit" %}}
{{< highlight bash >}}
#!/bin/bash
#SBATCH --time=168:00:00
#SBATCH --mem=8gb
#SBATCH --job-name=postgres_server
#SBATCH --error=postgres_server.err
#SBATCH --output=postgres_server.out
#SBATCH --licenses=common
#SBATCH --dependency=singleton
#SBATCH --signal=B:SIGINT@60
export POSTGRES_HOME=$COMMON/postgres
export POSTGRES_PASSWORD_FILE=$POSTGRES_HOME/config/postgres-password
export POSTGRES_USER=$USER
export POSTGRES_DB=mydb
export PGDATA=$POSTGRES_HOME/db/data
export POSTGRES_HOST_AUTH_METHOD=md5
export POSTGRES_INITDB_ARGS="--data-checksums"
export POSTGRES_PORT=$(shuf -i 2000-65000 -n 1)
echo "Postgres server running on $(hostname) on port $POSTGRES_PORT"
echo "This job started at $(date +%Y-%m-%dT%T)"
echo "This job will end at $(squeue --noheader -j $SLURM_JOBID -o %e) (in $(squeue --noheader -j $SLURM_JOBID -o %L))"
module load singularity
exec singularity run -B $POSTGRES_HOME/db:/var/lib/postgresql -B $POSTGRES_HOME/run:/var/run/postgresql docker://postgres:11 -c "port=$POSTGRES_PORT"
{{< /highlight >}}
{{% /panel %}}
This script starts a PostgreSQL server instance with the following properties:
- The superuser username is set to your HCC username and the password is the random one generated earlier.
- The server is started on a random port to avoid collisions.
- The database name is `mydb`. This can be changed to whatever name you would like (some applications may require a specific name).
- Checksums on data pages are enabled to help detect corruption.
- Password authentication is required for security.
Additionally, the job is run with `--dependency=singleton` to ensure that only one instance (based on job name) is running
at a time. Duplicate jobs submitted afterwards will queue until an earlier job exits. The `--signal=B:SIGINT@60` option
instructs SLURM to send a shutdown signal to the PostgreSQL server 60 seconds before the time limit of the job. This
will help to avoid corruption by allowing the server to perform a graceful shutdown.
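To launch the server, submit the script with `sbatch`. As a quick sanity check, you can
filter `squeue` by the job name to confirm the job's status and that, thanks to the
`singleton` dependency, at most one server job is running at a time. A minimal sketch,
assuming the script was saved as `postgres.submit`:
{{< highlight bash >}}
# Submit the PostgreSQL server job (script saved as postgres.submit)
$ sbatch postgres.submit

# List your jobs named postgres_server; with --dependency=singleton,
# at most one of them should be in the RUNNING state at any time
$ squeue -u $USER --name=postgres_server
{{< /highlight >}}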
Once the job starts, check the `postgres_server.out` file for information on which host and port the server is listening on. For example,
{{< highlight bash >}}
Postgres server running on c1725.crane.hcc.unl.edu on port 10332
This job started at 2020-06-19T10:20:58
This job will end at 2020-06-19T10:50:57 (in 29:59)
{{< /highlight >}}
Here, the server is running on host `c1725.crane.hcc.unl.edu` on port 10332.
The output also contains information on when the job will end. This can be useful when submitting
the companion analysis job(s) that will use the database. It is recommended to adjust the requested walltime of
the analysis job(s) to ensure they will end _before_ the database job does.
### Accessing the PostgreSQL instance
The server instance can be accessed using the hostname and port from the job output, as well as your HCC username
and the random password set initially. The exact method will depend on your application. Take care to treat the
password in a secure manner.
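As one example, you can open an interactive session with the `psql` client shipped in the same
Singularity image used to run the server. The sketch below assumes the host and port from the
example output above and the default `$COMMON/postgres` location; substitute the values from your
own `postgres_server.out`. Reading the password into the `PGPASSWORD` environment variable avoids
typing it at the prompt.
{{< highlight bash >}}
# Values reported in postgres_server.out -- replace with your own
export POSTGRES_HOST=c1725.crane.hcc.unl.edu
export POSTGRES_PORT=10332

# Supply the randomly generated password to psql via PGPASSWORD
export PGPASSWORD=$(cat $COMMON/postgres/config/postgres-password)

# Open an interactive psql session against the mydb database
module load singularity
singularity exec docker://postgres:11 psql -h $POSTGRES_HOST -p $POSTGRES_PORT -U $USER mydb
{{< /highlight >}}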
### Restarting the PostgreSQL instance
To restart the server, simply resubmit the same SLURM job as above. The first time the job is run, PostgreSQL
will create the database from scratch. Subsequent runs will detect an existing database and will not
overwrite it. Data entered into the database from previous runs will be available.
{{% notice info %}}
Each new instance of the server will run on a different host and port. You will need to update these
values before submitting subsequent analysis jobs.
{{% /notice %}}
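Rather than copying the new host and port by hand, one convenient approach (a sketch based on the
output format shown above) is to parse them out of `postgres_server.out`:
{{< highlight bash >}}
# Parse "Postgres server running on <host> on port <port>" from the job output
POSTGRES_HOST=$(awk '/Postgres server running/ {print $5}' postgres_server.out)
POSTGRES_PORT=$(awk '/Postgres server running/ {print $8}' postgres_server.out)
echo "Database is at $POSTGRES_HOST:$POSTGRES_PORT"
{{< /highlight >}}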
### Submitting jobs that require PostgreSQL
The simplest way to manage jobs that need the database is to manually submit them after the PostgreSQL SLURM job
has started. However, this is not terribly convenient. A better way is to use the dependency feature of SLURM.
Submit the PostgreSQL job first and make a note of the job id. In the submit script(s) of the analysis jobs, add the line
{{< highlight bash >}}
#SBATCH --dependency=after:<job id>
{{< /highlight >}}
replacing `<job id>` with the numeric job id noted before. This will instruct SLURM to only begin running
the analysis job(s) once the database job has begun.
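Putting this together, a hypothetical analysis submit script might look like the following sketch.
The job name, requested resources, and the final analysis command (`my_analysis_program` and its
options) are placeholders for your own workflow; fill in `<job id>` with the id printed by `sbatch`
when the database job was submitted, and keep the requested walltime shorter than the database job's.
{{< highlight bash >}}
#!/bin/bash
#SBATCH --time=24:00:00
#SBATCH --mem=4gb
#SBATCH --job-name=analysis
#SBATCH --licenses=common
#SBATCH --dependency=after:<job id>

# Pick up the host and port the database job reported in its output file
POSTGRES_HOST=$(awk '/Postgres server running/ {print $5}' postgres_server.out)
POSTGRES_PORT=$(awk '/Postgres server running/ {print $8}' postgres_server.out)
export PGPASSWORD=$(cat $COMMON/postgres/config/postgres-password)

# Placeholder: replace with the actual analysis command that connects to the database
my_analysis_program --db-host $POSTGRES_HOST --db-port $POSTGRES_PORT --db-user $USER --db-name mydb
{{< /highlight >}}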