+++
title = "Condor Jobs on HCC"
description = "How to run jobs using Condor on HCC machines"
weight = "54"
+++
This quick start demonstrates how to run multiple copies of a Fortran/C program using Condor on HCC supercomputers. The sample codes and submit scripts can be downloaded from condor_dir.zip.
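If you download the sample archive to the cluster, it can be extracted in place; a minimal sketch (assuming the zip file sits in the current directory):

{{< highlight bash >}}
$ unzip condor_dir.zip
{{< /highlight >}}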
## Login to an HCC Cluster
Log in to an HCC cluster through PuTTY (for Windows users) or Terminal (for Mac/Linux users) and make a subdirectory called `condor_dir` under the `$WORK` directory. In the subdirectory `condor_dir`, create job subdirectories that host the input data files. Here we create two job subdirectories, `job_0` and `job_1`, and put a data file (`data.dat`) in each subdirectory. The data file in `job_0` has a column of data listing the integers from 1 to 5. The data file in `job_1` has an integer list from 6 to 10.
{{< highlight bash >}}
$ cd $WORK
$ mkdir condor_dir
$ cd condor_dir
$ mkdir job_0
$ mkdir job_1
{{< /highlight >}}
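The small data files can be created with any text editor; one possible shortcut from the shell (a sketch using the standard `seq` utility) is:

{{< highlight bash >}}
$ seq 1 5 > job_0/data.dat
$ seq 6 10 > job_1/data.dat
{{< /highlight >}}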
In the subdirectory `condor_dir`, save all the relevant codes. Here we include two demo programs, `demo_f_condor.f90` and `demo_c_condor.c`, that compute the sum of the data stored in each job subdirectory (`job_0` and `job_1`). The parallelization scheme works as follows. First, the master node sends out copies of the executable from the `condor_dir` subdirectory, along with a copy of the data file from each job subdirectory. The number of executable copies is specified in the submit script (`queue`), and it usually matches the number of job subdirectories. Next, the workload is distributed among a pool of worker nodes. At any given time, the number of available worker nodes may vary. Each worker node executes its job independently of the other worker nodes, and the output files are stored separately in each job subdirectory. No additional coding is needed to turn the serial code "parallel"; parallelization here is achieved entirely through the submit script, as the demo programs below illustrate.
{{%expand "demo_condor.f90" %}} {{< highlight fortran >}} Program demo_f_condor implicit none integer, parameter :: N = 5 real8 w integer i common/sol/ x real8 x real8, dimension(N) :: y_local real8, dimension(N) :: input_data
open(10, file='data.dat')
do i = 1,N
read(10,*) input_data(i)
enddo
do i = 1,N
w = input_data(i)*1d0
call proc(w)
y_local(i) = x
write(6,*) 'i,x = ', i, y_local(i)
enddo
write(6,*) 'sum(y) =',sum(y_local)
Stop End Program Subroutine proc(w) real8, intent(in) :: w common/sol/ x real8 x
x = w
Return End Subroutine {{< /highlight >}} {{% /expand %}}
{{%expand "demo_c_condor.c" %}} {{< highlight c >}} //demo_c_condor #include <stdio.h>
double proc(double w){
double x;
x = w;
return x;
}
int main(int argc, char* argv[]){ int N=5; double w; int i; double x; double y_local[N]; double sum; double input_data[N]; FILE *fp; fp = fopen("data.dat","r"); for (i = 1; i<= N; i++){ fscanf(fp, "%lf", &input_data[i-1]); }
for (i = 1; i <= N; i++){
w = input_data[i-1]*1e0;
x = proc(w);
y_local[i-1] = x;
printf("i,x= %d %lf\n", i, y_local[i-1]) ;
}
sum = 0e0;
for (i = 1; i<= N; i++){
sum = sum + y_local[i-1];
}
printf("sum(y)= %lf\n", sum);
return 0; } {{< /highlight >}} {{% /expand %}}
## Compiling the Code
The compiled executable needs to match the "standard" environment of the worker node. The easiest way is to use the compilers installed on the HCC supercomputer directly, without loading extra modules. The standard compiler on the HCC supercomputers is the GNU Compiler Collection; its version can be checked with the commands `gcc -v` or `gfortran -v`.
{{< highlight bash >}}
$ gfortran demo_f_condor.f90 -o demo_f_condor.x
$ gcc demo_c_condor.c -o demo_c_condor.x
{{< /highlight >}}
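Before submitting, it can be worth verifying that the executables run locally against one of the data files (here using the `job_0` directory created above):

{{< highlight bash >}}
$ cd job_0
$ ../demo_f_condor.x
$ ../demo_c_condor.x
$ cd ..
{{< /highlight >}}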
## Creating a Submit Script
Create a submit script to request 2 jobs (`queue 2`). The name of the job subdirectory is specified in the line `initialdir`. The `$(process)` macro appends an integer to the job subdirectory prefix `job_`; the numbers run from `0` to `queue-1`. The name of the input data file is specified in the line `transfer_input_files`.
{{% panel header="submit_f.condor
"%}}
{{< highlight bash >}}
universe = grid
grid_resource = pbs
batch_queue = guest
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = demo_f_condor.x
output = Fortran_$(process).out
error = Fortran_$(process).err
initialdir = job_$(process)
transfer_input_files = data.dat
queue 2
{{< /highlight >}}
{{% /panel %}}
{{% panel header="submit_c.condor
"%}}
{{< highlight bash >}}
universe = grid
grid_resource = pbs
batch_queue = guest
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = demo_c_condor.x
output = C_$(process).out
error = C_$(process).err
initialdir = job_$(process)
transfer_input_files = data.dat
queue 2
{{< /highlight >}}
{{% /panel %}}
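The same template scales to more jobs: create additional job subdirectories and raise the `queue` count to match, so that `$(process)` covers every subdirectory. A sketch (the number of extra jobs here is illustrative):

{{< highlight bash >}}
$ cd $WORK/condor_dir
$ mkdir job_2 job_3 job_4    # then set "queue 5" in the submit script
{{< /highlight >}}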
## Submit the Job
The jobs can be submitted with the command `condor_submit`. The job status can be monitored by entering `condor_q` followed by your username.
{{< highlight bash >}}
$ condor_submit submit_f.condor
$ condor_submit submit_c.condor
$ condor_q <username>
{{< /highlight >}}
Replace `<username>` with your HCC username.
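If a submitted job needs to be cancelled, HTCondor's `condor_rm` can remove it by the cluster ID reported by `condor_q`; a sketch (`<cluster_id>` is a placeholder):

{{< highlight bash >}}
$ condor_q <username>
$ condor_rm <cluster_id>
{{< /highlight >}}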
## Sample Output
In the job subdirectory `job_0`, the sum from 1 to 5 is computed and printed to the `.out` file. In the job subdirectory `job_1`, the sum from 6 to 10 is computed and printed to the `.out` file.
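Once the jobs complete, the output files reside in their job subdirectories and can be inspected directly, for example:

{{< highlight bash >}}
$ cat job_0/Fortran_0.out
$ cat job_1/Fortran_1.out
{{< /highlight >}}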
{{%expand "Fortran_0.out" %}}
{{< highlight batchfile >}}
i,x = 1 1.0000000000000000
i,x = 2 2.0000000000000000
i,x = 3 3.0000000000000000
i,x = 4 4.0000000000000000
i,x = 5 5.0000000000000000
sum(y) = 15.000000000000000
{{< /highlight >}}
{{% /expand %}}
{{%expand "Fortran_1.out" %}}
{{< highlight batchfile >}}
i,x = 1 6.0000000000000000
i,x = 2 7.0000000000000000
i,x = 3 8.0000000000000000
i,x = 4 9.0000000000000000
i,x = 5 10.000000000000000
sum(y) = 40.000000000000000
{{< /highlight >}}
{{% /expand %}}