+++
title = "Job Dependencies"
description =  "How to use job dependencies with the SLURM scheduler."
weight=55
+++

The job dependency feature of SLURM is useful when you need to run
multiple jobs in a particular order.  A standard example of this is a
workflow in which the output from one job is used as the input to the
next.  Rather than repeatedly checking whether one job has ended and
then manually submitting the next, you can submit all the jobs in the
workflow at once.  SLURM will then run them in the proper order based
on the dependency conditions you supply.

### Syntax

The basic syntax is to include the `-d` (or `--dependency`) option with
the `sbatch` command for a new submission to indicate that it depends on
another job.  You must also supply the condition and the ID of the job
on which it depends.  SLURM supports several possible conditions; see
the `sbatch` [man page](http://slurm.schedmd.com/sbatch.html) for all
the options.  The example here uses `afterok`, which instructs SLURM to
run the submitted job only after the dependency job has terminated
without error (exit code 0).

This example is commonly referred to as a "diamond" workflow.  It has
four jobs, labeled A through D.  Job A runs first.  Jobs B and C both
depend on Job A completing before they can run.  Job D then depends on
Jobs B and C completing.

{{< figure src="/images/4980738.png" width="400" >}}

The SLURM submit files for each step are below.
{{% panel theme="info" header="JobA.submit" %}}
{{< highlight batch >}}
#!/bin/bash
#SBATCH --job-name=JobA
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobA.stdout
#SBATCH --error=JobA.stderr
echo "I'm job A"
echo "Sample job A output" > jobA.out
sleep 120
{{< /highlight >}}
{{% /panel %}}


{{% panel theme="info" header="JobB.submit" %}}
{{< highlight batch >}}
#!/bin/bash
#SBATCH --job-name=JobB
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobB.stdout
#SBATCH --error=JobB.stderr
echo "I'm job B"
echo "I'm using output from job A"
cat jobA.out >> jobB.out
echo "" >> jobB.out
echo "Sample job B output" >> jobB.out
sleep 120
{{< /highlight >}}
{{% /panel %}}

{{% panel theme="info" header="JobC.submit" %}}
{{< highlight batch >}}
#!/bin/bash
#SBATCH --job-name=JobC
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobC.stdout
#SBATCH --error=JobC.stderr
echo "I'm job C"
echo "I'm using output from job A"
cat jobA.out >> jobC.out
echo "" >> jobC.out
echo "Sample job C output" >> jobC.out
sleep 120
{{< /highlight >}}
{{% /panel %}}

{{% panel theme="info" header="JobD.submit" %}}
{{< highlight batch >}}
#!/bin/bash
#SBATCH --job-name=JobD
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobD.stdout
#SBATCH --error=JobD.stderr
echo "I'm job D"
echo "I'm using output from jobs B and C"
cat jobB.out >> jobD.out
echo "" >> jobD.out
cat jobC.out >> jobD.out
echo "" >> jobD.out
echo "Sample job D output" >> jobD.out
sleep 120
{{< /highlight >}}
{{% /panel %}}

To start the workflow, submit Job A first:

{{% panel theme="info" header="Submit Job A" %}}
{{< highlight batch >}}
[demo01@login.crane demo01]$ sbatch JobA.submit
Submitted batch job 666898 
{{< /highlight >}}
{{% /panel %}}
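
In a script, the job ID can be captured instead of copied by hand.  The
sketch below assumes your SLURM version supports `sbatch --parsable`,
which prints only the job ID (or `jobid;cluster` on multi-cluster
systems) instead of the "Submitted batch job" message.  `submit_job` is
a hypothetical helper name, not a SLURM command.

```bash
# Hypothetical helper (not part of SLURM): submit a job script and print
# only its job ID. Any extra arguments (e.g. -d afterok:ID) go to sbatch.
submit_job() {
    local out
    # --parsable prints "jobid" or "jobid;cluster" instead of the usual message
    out=$(sbatch --parsable "$@") || return 1
    # Strip the optional ";cluster" suffix, leaving just the numeric job ID
    echo "${out%%;*}"
}
```

For example, `jobA=$(submit_job JobA.submit)` stores the ID so that a
later `sbatch -d afterok:${jobA} JobB.submit` can reference it.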

Now submit Jobs B and C, using the job ID from Job A to indicate the
dependency:

{{% panel theme="info" header="Submit Jobs B and C" %}}
{{< highlight batch >}}
[demo01@login.crane demo01]$ sbatch -d afterok:666898 JobB.submit
Submitted batch job 666899
[demo01@login.crane demo01]$ sbatch -d afterok:666898 JobC.submit
Submitted batch job 666900
{{< /highlight >}}
{{% /panel %}}

Finally, submit Job D with a dependency on both Jobs B and C by listing
both job IDs after the condition, separated by colons:

{{% panel theme="info" header="Submit Job D" %}}
{{< highlight batch >}}
[demo01@login.crane demo01]$ sbatch -d afterok:666899:666900 JobD.submit
Submitted batch job 666901
{{< /highlight >}}
{{% /panel %}}
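
When a job depends on many others, the colon-separated list can be
assembled in a script rather than typed.  A minimal bash sketch;
`dep_string` is a hypothetical helper name, not part of SLURM.

```bash
# Hypothetical helper (not part of SLURM): join a dependency condition and
# one or more job IDs into the form used with -d, e.g. afterok:666899:666900
dep_string() {
    local result=$1 id   # result starts as the condition, e.g. afterok
    shift
    for id in "$@"; do
        result="${result}:${id}"
    done
    echo "$result"
}
```

Job D from this example could then be submitted with
`sbatch -d "$(dep_string afterok 666899 666900)" JobD.submit`.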

Running `squeue` will now show all four jobs.  Job A is running (state
`R`), while Jobs B, C, and D are pending (state `PD`) with the reason
`(Dependency)`.

{{% panel theme="info" header="Squeue Output" %}}
{{< highlight batch >}}
[demo01@login.crane demo01]$ squeue -u demo01
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            666899     batch     JobB   demo01 PD       0:00      1 (Dependency)
            666900     batch     JobC   demo01 PD       0:00      1 (Dependency)
            666901     batch     JobD   demo01 PD       0:00      1 (Dependency)
            666898     batch     JobA   demo01  R       0:52      1 c2409
{{< /highlight >}}
{{% /panel %}}
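
To list just the jobs that are still held back by a dependency, the
default `squeue` output can be filtered on its last column.  This sketch
assumes the default column layout shown above, with `NODELIST(REASON)`
as the final column; `pending_on_dependency` is a hypothetical helper
name.

```bash
# Hypothetical helper (not part of SLURM): read default-format squeue output
# on stdin and print the JOBID of each job whose reason is (Dependency).
pending_on_dependency() {
    # NR > 1 skips the header line; $NF is the NODELIST(REASON) column
    awk 'NR > 1 && $NF == "(Dependency)" { print $1 }'
}
```

For the output above, `squeue -u demo01 | pending_on_dependency` would
print 666899, 666900, and 666901, one per line.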

As each job completes successfully, SLURM will run the next job(s) in
the workflow as resources become available.
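
The manual steps above can be collected into one script that submits the
whole diamond workflow.  A minimal sketch, assuming `sbatch --parsable`
is available and the four submit files are in the current directory;
`submit_diamond` is a hypothetical function name.

```bash
# Hypothetical helper (not part of SLURM): submit the diamond workflow
# A -> (B, C) -> D in one go and print the four job IDs in submission order.
submit_diamond() {
    local a b c d
    # --parsable prints "jobid" or "jobid;cluster"; ${var%%;*} keeps the ID
    a=$(sbatch --parsable JobA.submit) || return 1
    a=${a%%;*}
    # Jobs B and C each wait for Job A to exit with code 0
    b=$(sbatch --parsable -d "afterok:${a}" JobB.submit) || return 1
    b=${b%%;*}
    c=$(sbatch --parsable -d "afterok:${a}" JobC.submit) || return 1
    c=${c%%;*}
    # Job D waits for both B and C to exit with code 0
    d=$(sbatch --parsable -d "afterok:${b}:${c}" JobD.submit) || return 1
    d=${d%%;*}
    echo "$a $b $c $d"
}
```

Because each dependency is built from the IDs SLURM actually returns,
nothing is hard-coded and the script can be rerun as-is.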