+++
title = "Submitting R Jobs"
description =  "How to submit R jobs on HCC resources."
+++

Submitting an R job is very similar to submitting the serial job shown
on [Submitting Jobs]({{< relref "/guides/submitting_jobs/_index.md" >}}).

- [Running R scripts in batch](#running-r-scripts-in-batch)
  - [Running R scripts using `R CMD BATCH`](#running-r-scripts-using-r-cmd-batch)
  - [Running R scripts using `Rscript`](#running-r-scripts-using-rscript)
- [Multicore (parallel) R submission](#multicore-parallel-r-submission)
- [Multinode R submission with Rmpi](#multinode-r-submission-with-rmpi)
- [Adding packages](#adding-packages)
  - [Installing packages interactively](#installing-packages-interactively)
  - [Installing packages using `R CMD INSTALL`](#installing-packages-using-r-cmd-install)

  
Adam Caprez's avatar
Adam Caprez committed
19
### Running R scripts in batch

There are two primary commands to use when submitting R scripts: `Rscript`
and `R CMD BATCH`. Both commands will execute the passed script but
differ in the way they process output.

#### Running R scripts using `R CMD BATCH`

When utilizing `R CMD BATCH`, all output will be directed to an `.Rout`
file named after your script unless otherwise specified. For example:

{{% panel theme="info" header="serial_R.submit" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob

module load R/3.5
R CMD BATCH Rcode.R
{{< /highlight >}}
{{% /panel %}}

In the above example, output for the job will be found in the file
`Rcode.Rout`. Notice that we did not specify output and error files in
our SLURM directives. They are not strictly needed, because everything R
prints, including errors, goes into the `.Rout` file (anything printed
outside R still goes to SLURM's default `slurm-<jobid>.out` file). To
direct output to a specific location, follow your `R CMD BATCH` command
with the name of the file where the output should go, as follows:

{{% panel theme="info" header="serial_R.submit" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob

module load R/3.5
R CMD BATCH Rcode.R Rcodeoutput.txt
{{< /highlight >}}
{{% /panel %}}

In this example, output from running the script `Rcode.R` will be placed
in the file `Rcodeoutput.txt`.
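When no output file is given, `R CMD BATCH --help` describes the default name as the input file name with a trailing `.R` stripped and `.Rout` appended. The helper below is a hypothetical sketch (not part of R, SLURM, or HCC's tooling) that encodes that rule, which can be handy at the end of a submit script that needs to know where the output landed. It assumes the file is written to the job's working directory, so any directory part of the script path is dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the default output file name used by
# R CMD BATCH when no output file is given.
# Rule (from `R CMD BATCH --help`): strip a trailing ".R", append ".Rout".
# Assumption: the file is written to the current working directory,
# so any leading directories in the script path are dropped.
default_rout() {
  local name
  name=$(basename "$1")
  printf '%s\n' "${name%.R}.Rout"
}

# Example use at the end of serial_R.submit:
#   R CMD BATCH Rcode.R
#   tail -n 20 "$(default_rout Rcode.R)"
```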

To pass arguments to the script, place them after `R CMD BATCH` but
before the name of the script to be executed, wrapped in quotes and
preceded by `--args` so that R does not try to interpret them as its
own options, as follows:

{{% panel theme="info" header="serial_R.submit" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob

module load R/3.5
R CMD BATCH "--args argument1 argument2 argument3" Rcode.R Rcodeoutput.txt
{{< /highlight >}}
{{% /panel %}}

#### Running R scripts using `Rscript`

Using `Rscript` to execute R scripts differs from `R CMD BATCH` in that
all output and errors from the script are directed to STDOUT and STDERR
in a manner similar to other programs. This gives the user greater
control over where to direct the output. For example, to run our script
using `Rscript` the submit script could look like the following:

{{% panel theme="info" header="serial_R.submit" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout

module load R/3.5
Rscript Rcode.R
{{< /highlight >}}
{{% /panel %}}

In the above example, STDOUT will be directed to the output file
`TestJob.%J.stdout` and STDERR to `TestJob.%J.stderr`. You
will notice that the example is very similar to the
[serial example]({{< relref "/guides/submitting_jobs/_index.md" >}}).
The important line is the `module load` command.
That tells the cluster to load the R framework into the environment so jobs may use it.
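Because `Rscript` writes to STDOUT and STDERR like any other program, ordinary shell redirection also works, for example to keep a separate pair of log files per script and job. The function below is a hypothetical sketch, not an HCC or R tool: `run_logged` and `RSCRIPT_BIN` are illustrative names, and `RSCRIPT_BIN` defaults to `Rscript` so the function can be tried without R. It uses SLURM's `SLURM_JOB_ID` environment variable when present and falls back to `local` outside a job.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an R script with Rscript and send its STDOUT and
# STDERR to separate files named after the script and the SLURM job ID.
# RSCRIPT_BIN is an illustrative override (defaults to Rscript).
run_logged() {   # usage: run_logged SCRIPT [ARG ...]
  local script=$1
  shift
  local base id
  base=$(basename "$script" .R)       # Rcode.R -> Rcode
  id=${SLURM_JOB_ID:-local}           # set by SLURM inside a job
  "${RSCRIPT_BIN:-Rscript}" "$script" "$@" \
    >"${base}.${id}.out" 2>"${base}.${id}.err"
}

# Example use in serial_R.submit (after module load R/3.5):
#   run_logged Rcode.R argument1 argument2
```

The function returns the exit status of `Rscript`, so a submit script can still check whether the R code failed.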

To pass arguments to the script when using `Rscript`, place them after
the script name, as in the example below:

{{% panel theme="info" header="serial_R.submit" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout

module load R/3.5
Rscript Rcode.R argument1 argument2 argument3
{{< /highlight >}}
{{% /panel %}}
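The two commands differ only in where the arguments go: `R CMD BATCH` needs them bundled into a single quoted word that starts with `--args` and comes before the script name, while `Rscript` takes them after the script name. In either case the R script can read them with `commandArgs(trailingOnly = TRUE)`, which returns them as a character vector. The wrappers below are a hypothetical sketch, not HCC tools; `R_BIN` and `RSCRIPT_BIN` are illustrative variables that default to `R` and `Rscript`, so the resulting command lines can be inspected without running R.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers showing where script arguments go for each command.
# R_BIN / RSCRIPT_BIN are illustrative overrides (default: R / Rscript).

batch_with_args() {    # usage: batch_with_args SCRIPT OUTFILE [ARG ...]
  local script=$1 out=$2
  shift 2
  # R CMD BATCH: bundle all arguments into ONE word starting with --args,
  # placed before the script name. Omit it entirely if there are none.
  if [ "$#" -gt 0 ]; then
    set -- "--args $*"
  fi
  "${R_BIN:-R}" CMD BATCH "$@" "$script" "$out"
}

rscript_with_args() {  # usage: rscript_with_args SCRIPT [ARG ...]
  local script=$1
  shift
  # Rscript: each argument simply follows the script name.
  "${RSCRIPT_BIN:-Rscript}" "$script" "$@"
}

# Equivalent to the serial_R.submit examples above:
#   batch_with_args Rcode.R Rcodeoutput.txt argument1 argument2 argument3
#   rscript_with_args Rcode.R argument1 argument2 argument3
```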

---

### Multicore (parallel) R submission

Submitting a multicore R job to SLURM is very similar to
[Submitting an OpenMP Job]({{< relref "submitting_an_openmp_job" >}}),
since both are running multicore jobs on a single node. Below is an example:

{{% panel theme="info" header="parallel_R.submit" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --ntasks-per-node=16
#SBATCH --nodes=1
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout

module load R/3.5
R CMD BATCH Rcode.R
{{< /highlight >}}
{{% /panel %}}

The above example will submit a single job which can use up to 16 cores.

Be sure to limit your R code to 16 cores, or your performance will
suffer. For example, when using the `mclapply` function from the
[parallel](http://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf)
package, set `mc.cores=16`:

{{% panel theme="info" header="parallel.R" %}}
{{< highlight R >}}
library("parallel")
# ... the rest of your code ...
# mc.cores=16 matches --ntasks-per-node=16 in parallel_R.submit
mclapply(rep(4, 5), rnorm, mc.cores=16)
{{< /highlight >}}
{{% /panel %}}
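Rather than hard-coding 16 in both the submit script and the R code, the core count can be taken from the environment SLURM sets up for the job. The helper below is a hypothetical sketch: `SLURM_NTASKS_PER_NODE` is only set when `--ntasks-per-node` is requested, `SLURM_CPUS_ON_NODE` holds the number of CPUs allocated to the batch step, and the fallback of 1 applies outside a job. On the R side, something like `mc.cores = as.integer(commandArgs(trailingOnly = TRUE)[1])` could then pick the value up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many cores this job may use on its node.
# Prefers SLURM_NTASKS_PER_NODE (set only with --ntasks-per-node), then
# SLURM_CPUS_ON_NODE, and falls back to 1 outside a SLURM job.
job_cores() {
  printf '%s\n' "${SLURM_NTASKS_PER_NODE:-${SLURM_CPUS_ON_NODE:-1}}"
}

# Example use in parallel_R.submit (after module load R/3.5), passing the
# count to the R script as its first argument:
#   R CMD BATCH "--args $(job_cores)" Rcode.R
```

This keeps a single source of truth: changing `--ntasks-per-node` in the submit script changes how many cores the R code asks for.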

---

### Multinode R submission with Rmpi

Submitting a multinode MPI R job to SLURM is very similar to
[Submitting an MPI Job]({{< relref "submitting_an_mpi_job" >}}),
since both run parallel jobs across multiple nodes.
Below is an example of running Rmpi on Crane on 2 nodes and 32 cores:

{{% panel theme="info" header="Rmpi.submit" %}}
{{< highlight batch >}}
#!/bin/sh
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=1024
#SBATCH --job-name=TestJob
#SBATCH --error=TestJob.%J.stderr
#SBATCH --output=TestJob.%J.stdout

module load compiler/gcc/4.9 openmpi/1.10 R/3.5
export OMPI_MCA_mtl=^psm
mpirun -n 1 R CMD BATCH Rmpi.R
{{< /highlight >}}
{{% /panel %}}

When you run an Rmpi job on Crane, please include the line
`export OMPI_MCA_mtl=^psm` in your submit script. Regardless of how many
cores your job uses, the Rmpi package should always be run with
`mpirun -n 1` because it spawns additional processes dynamically.

Below is an example Rmpi R script provided by
[The University of Chicago Research Computing Center](https://rcc.uchicago.edu/docs/software/environments/R/index.html#rmpi):

{{% panel theme="info" header="Rmpi.R" %}}
{{< highlight R >}}
library(Rmpi)

# initialize an Rmpi environment
ns <- mpi.universe.size()
mpi.spawn.Rslaves(nslaves=ns)

# send these commands to the slaves
mpi.bcast.cmd( id <- mpi.comm.rank() )
mpi.bcast.cmd( ns <- mpi.comm.size() )
mpi.bcast.cmd( host <- mpi.get.processor.name() )

# all slaves execute this command
mpi.remote.exec(paste("I am", id, "of", ns, "running on", host))

# close down the Rmpi environment
mpi.close.Rslaves(dellog = FALSE)
mpi.exit()
{{< /highlight >}}
{{% /panel %}}

---

### Adding packages

There are two options to install packages. The first is to run R
interactively on the login node and install packages from within R.
The second is to use the `R CMD INSTALL` command.

{{% notice info %}}
All R packages must be installed from the login node. R libraries are
stored in users' home directories, which are not writable from the worker
nodes.
{{% /notice %}}

#### Installing packages interactively

1.  Load the R module with the command `module load R`
    -  Note that each version of R uses its own user libraries. To
       install packages under a specific version of R, specify which
       version by using the module load command followed by the version
       number. For example, to load R version 3.5, you would use the
       command `module load R/3.5`
2.  Run R interactively using the command `R`
3.  From within R, use the `install.packages()` command to install
    desired packages. For example, to install the package `ggplot2`
    use the command `install.packages("ggplot2")`

Some R packages require external compilers or additional libraries. If
you see an error when installing your package, you might need to load
additional modules to make these compilers or libraries available. For
more information about this, refer to the package documentation.
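The same installation can also be run without opening an interactive R session, which is convenient when installing several packages at once. This is a hypothetical sketch, not an HCC tool: it calls `install.packages()` through `Rscript -e` and passes an explicit `repos` value so R does not prompt for a mirror (the CRAN cloud mirror is an assumption; any CRAN mirror works). `RSCRIPT_BIN` is an illustrative override that defaults to `Rscript`. As with the interactive steps, load the desired R module first and run it on the login node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install one or more CRAN packages non-interactively.
# Run on the login node after e.g. `module load R/3.5`.
# RSCRIPT_BIN is an illustrative override (defaults to Rscript).
install_r_packages() {   # usage: install_r_packages PKG [PKG ...]
  local quoted=() pkg
  for pkg in "$@"; do
    quoted+=("\"$pkg\"")            # wrap each name in R string quotes
  done
  local IFS=,                       # join the quoted names with commas
  "${RSCRIPT_BIN:-Rscript}" -e \
    "install.packages(c(${quoted[*]}), repos = \"https://cloud.r-project.org\")"
}

# Example:
#   install_r_packages ggplot2 dplyr
```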

#### Installing packages using R CMD INSTALL

To install packages using `R CMD INSTALL`, the package source tarball
(`.tar.gz`) must already be downloaded to the cluster. You can download
the package source using `wget`. Then point the `R CMD INSTALL` command
to the path of the source tar file. Note that `R CMD INSTALL` does not
download or install dependencies; any packages the new package depends on
must already be installed. For example, to install ggplot2 the following
commands are used:

{{< highlight bash >}}
# Download the package source (older releases are kept in CRAN's Archive):
wget https://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_2.2.1.tar.gz

# Install the package:
R CMD INSTALL ./ggplot2_2.2.1.tar.gz
{{< /highlight >}}

Additional information on using the `R CMD INSTALL` command can be
found in the R documentation which can be seen by typing `?INSTALL`
within the R console.
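Since CRAN source tarballs follow a `<package>_<version>.tar.gz` naming convention, the download and install steps can be wrapped in a small function when a specific version is needed. This is a hypothetical sketch, not an HCC tool. It assumes CRAN's layout, in which the current release sits in `src/contrib/` and older releases are moved to `src/contrib/Archive/<package>/`, so it always uses the Archive path and is intended for pinned, older versions. `CRAN_URL` is an illustrative variable that defaults to the main CRAN site.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for installing a specific (older) CRAN package version.
# Assumption: older releases live under src/contrib/Archive/<package>/.
CRAN_URL=${CRAN_URL:-https://cran.r-project.org}

cran_archive_url() {   # usage: cran_archive_url PACKAGE VERSION
  printf '%s/src/contrib/Archive/%s/%s_%s.tar.gz\n' \
    "$CRAN_URL" "$1" "$1" "$2"
}

install_cran_version() {   # usage: install_cran_version PACKAGE VERSION
  local tarball="${1}_${2}.tar.gz"
  # Download first; stop (and clean up) if the download fails rather than
  # handing a missing or empty file to R CMD INSTALL.
  if ! wget -q -O "$tarball" "$(cran_archive_url "$1" "$2")"; then
    rm -f "$tarball"
    return 1
  fi
  R CMD INSTALL "./$tarball"
}

# Example (on the login node, after loading an R module):
#   install_cran_version ggplot2 2.2.1
```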