+++
title = "Condor Jobs on HCC"
description = "How to run jobs using Condor on HCC machines"
weight = "54"
+++
This quick start demonstrates how to run multiple copies of a Fortran/C program
using Condor on HCC supercomputers. The sample codes and submit scripts
can be downloaded from [condor_dir.zip](/attachments/3178558.zip).
Log in to an HCC cluster through PuTTY ([For Windows Users]({{< relref "/quickstarts/connecting/for_windows_users">}})) or Terminal ([For Mac/Linux Users]({{< relref "/quickstarts/connecting/for_maclinux_users">}})) and make a subdirectory called `condor_dir` under the `$WORK` directory. In the subdirectory `condor_dir`, create job subdirectories that host the input data files. Here we create two job subdirectories, `job_0` and `job_1`, and put a data file (`data.dat`) in each subdirectory. The data file in `job_0` has a single column listing the integers from 1 to 5. The data file in `job_1` has a single column listing the integers from 6 to 10.
{{< highlight bash >}}
$ cd $WORK
$ mkdir condor_dir
$ cd condor_dir
$ mkdir job_0
$ mkdir job_1
{{< /highlight >}}
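The two input files described above can also be generated from the command line. The following is a minimal sketch, assuming it is run from inside `condor_dir` and that the `seq` utility (GNU coreutils) is available on the login node; the files in the downloadable archive may be formatted differently.

```shell
# Run from inside condor_dir.
# mkdir -p does nothing if the job subdirectories already exist.
mkdir -p job_0 job_1

# One integer per line, matching the single-column layout the demo programs read.
seq 1 5  > job_0/data.dat   # integers 1 to 5
seq 6 10 > job_1/data.dat   # integers 6 to 10
```

Each file ends up with exactly five lines, which matches the fixed `N = 5` used by both demo programs.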
In the subdirectory `condor_dir`, save all the relevant codes. Here we
include two demo programs, `demo_f_condor.f90` and `demo_c_condor.c`,
that compute the sum of the data stored in each job subdirectory
(`job_0` and `job_1`). The parallelization scheme is as follows.
First, the master computer node sends out copies of the
executable from the `condor_dir` subdirectory, each with a copy of the data
file from one job subdirectory. The number of executable copies is
specified in the submit script (`queue`), and it usually matches
the number of job subdirectories. Next, the workload is distributed
among a pool of worker computer nodes. At any given time, the number of
available worker nodes may vary. Each worker node executes its jobs
independently of the other worker nodes. The output files are stored
separately in each job subdirectory. No additional coding is needed to
make the serial code run in "parallel". Parallelization here is achieved
through the submit script.
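The scheme above amounts to running the same serial program once per job subdirectory. The sketch below imitates that locally and serially, as a way to check the directory layout before submitting. It is not how Condor itself dispatches work: `sum_data` is a hypothetical stand-in for the compiled executable, and the `local_*.out` file names are chosen here for illustration only.

```shell
# Hypothetical stand-in for the compiled demo executable:
# sums the first column of data.dat in the current directory.
sum_data() {
    awk '{ s += $1 } END { print "sum(y) =", s }' data.dat
}

# Demo inputs, so the sketch runs on its own.
mkdir -p job_0 job_1
seq 1 5  > job_0/data.dat
seq 6 10 > job_1/data.dat

# Serial equivalent of two queued jobs: one run per job subdirectory,
# each reading its own data.dat and writing its own output file.
for process in 0 1; do
    ( cd "job_${process}" && sum_data > "local_${process}.out" )
done
```

Because each run touches only the files in its own subdirectory, the runs do not depend on each other, which is what lets a pool of worker nodes execute them in any order.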
{{%expand "demo_f_condor.f90" %}}
{{< highlight fortran >}}
Program demo_f_condor
    implicit none
    integer, parameter :: N = 5
    real*8 w
    integer i
    common/sol/ x
    real*8 x
    real*8, dimension(N) :: y_local
    real*8, dimension(N) :: input_data

    ! Read N values from data.dat in the job subdirectory
    open(10, file='data.dat')
    do i = 1,N
        read(10,*) input_data(i)
    enddo
    close(10)

    ! Process each value; proc returns its result through the common block /sol/
    do i = 1,N
        w = input_data(i)*1d0
        call proc(w)
        y_local(i) = x
        write(6,*) 'i,x = ', i, y_local(i)
    enddo
    write(6,*) 'sum(y) =',sum(y_local)
Stop
End Program

Subroutine proc(w)
    real*8, intent(in) :: w
    common/sol/ x
    real*8 x
    x = w
Return
End Subroutine
{{< /highlight >}}
{{% /expand %}}

{{%expand "demo_c_condor.c" %}}
{{< highlight c >}}
//demo_c_condor
#include <stdio.h>

double proc(double w){
    double x;
    x = w;
    return x;
}

int main(int argc, char* argv[]){
    int N=5;
    double w;
    int i;
    double x;
    double y_local[N];
    double sum;
    double input_data[N];
    FILE *fp;

    /* Read N values from data.dat in the job subdirectory */
    fp = fopen("data.dat","r");
    if (fp == NULL){
        perror("data.dat");
        return 1;
    }
    for (i = 1; i<= N; i++){
        fscanf(fp, "%lf", &input_data[i-1]);
    }
    fclose(fp);

    /* Process each value and print it */
    for (i = 1; i <= N; i++){
        w = input_data[i-1]*1e0;
        x = proc(w);
        y_local[i-1] = x;
        printf("i,x= %d %lf\n", i, y_local[i-1]) ;
    }

    /* Accumulate and print the sum */
    sum = 0e0;
    for (i = 1; i<= N; i++){
        sum = sum + y_local[i-1];
    }
    printf("sum(y)= %lf\n", sum);
    return 0;
}
{{< /highlight >}}
{{% /expand %}}
The compiled executable needs to match the "standard" environment of the
worker node. The easiest way is to use the compilers installed
on the HCC supercomputer directly, without loading extra modules. The standard
compiler on the HCC supercomputer is the GNU Compiler Collection. The version
can be checked with the commands `gcc -v` or `gfortran -v`.
{{< highlight bash >}}
$ gfortran demo_f_condor.f90 -o demo_f_condor.x
$ gcc demo_c_condor.c -o demo_c_condor.x
{{< /highlight >}}
Create a submit script to request 2 jobs (`queue`). The name of each job
subdirectory is specified in the line `initialdir`. The
`$(process)` macro appends an integer to the job subdirectory
prefix `job_`. The integers run from `0` to `queue-1`. The name of the input
data file is specified in the line `transfer_input_files`.
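To make the `$(process)` substitution concrete, the sketch below prints the names that the Fortran submit script would resolve to for each queued job. It only imitates the substitution in plain shell; `expand_names` is a hypothetical helper written for this illustration and is not a Condor command.

```shell
# expand_names QUEUE: print the per-job names for process = 0 .. QUEUE-1.
# Illustrative helper only; Condor performs this substitution itself.
expand_names() {
    queue=$1
    process=0
    while [ "$process" -lt "$queue" ]; do
        echo "initialdir=job_${process} output=Fortran_${process}.out error=Fortran_${process}.err"
        process=$((process + 1))
    done
}

# Same count as the "queue 2" line in the submit script.
expand_names 2
```

With a queue of 2, this lists `job_0` and `job_1`, so a job subdirectory must exist for every integer from `0` to `queue-1`.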
{{% panel header="`submit_f.condor`"%}}
{{< highlight bash >}}
universe = grid
grid_resource = pbs
batch_queue = guest
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = demo_f_condor.x
output = Fortran_$(process).out
error = Fortran_$(process).err
initialdir = job_$(process)
transfer_input_files = data.dat
queue 2
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`submit_c.condor`"%}}
{{< highlight bash >}}
universe = grid
grid_resource = pbs
batch_queue = guest
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = demo_c_condor.x
output = C_$(process).out
error = C_$(process).err
initialdir = job_$(process)
transfer_input_files = data.dat
queue 2
{{< /highlight >}}
{{% /panel %}}
The job can be submitted through the command `condor_submit`. The job
status can be monitored by entering `condor_q` followed by the
username.
{{< highlight bash >}}
$ condor_submit submit_f.condor
$ condor_submit submit_c.condor
$ condor_q <username>
{{< /highlight >}}
Replace `<username>` with your HCC username.
Sample Output
-------------
In the job subdirectory `job_0`, the sum from 1 to 5 is computed and
printed to the `.out` file. In the job subdirectory `job_1`, the sum
from 6 to 10 is computed and printed to the `.out` file.
{{%expand "Fortran_0.out" %}}
{{< highlight batchfile>}}
i,x = 1 1.0000000000000000
i,x = 2 2.0000000000000000
i,x = 3 3.0000000000000000
i,x = 4 4.0000000000000000
i,x = 5 5.0000000000000000
sum(y) = 15.000000000000000
{{< /highlight >}}
{{% /expand %}}
{{%expand "Fortran_1.out" %}}
{{< highlight batchfile>}}
i,x = 1 6.0000000000000000
i,x = 2 7.0000000000000000
i,x = 3 8.0000000000000000
i,x = 4 9.0000000000000000
i,x = 5 10.000000000000000
sum(y) = 40.000000000000000
{{< /highlight >}}
{{% /expand %}}