+++ title = "Condor Jobs on HCC" description = "How to run jobs using Condor on HCC machines" weight = "54" +++ This quick start demonstrates how to run multiple copies of Fortran/C program using Condor on HCC supercomputers. The sample codes and submit scripts can be downloaded from [condor_dir.zip](/attachments/3178558.zip). #### Login to a HCC Cluster Log in to a HCC cluster through PuTTY ([For Windows Users]({{< relref "/quickstarts/for_windows_users">}})) or Terminal ([For Mac/Linux Users]({{< relref "/quickstarts/for_maclinux_users">}})) and make a subdirectory called `condor_dir` under the `$WORK` directory. In the subdirectory `condor_dir`, create job subdirectories that host the input data files. Here we create two job subdirectories, `job_0` and `job_1`, and put a data file (`data.dat`) in each subdirectory. The data file in `job_0` has a column of data listing the integers from 1 to 5. The data file in `job_1` has a integer list from 6 to 10. {{< highlight bash >}} $ cd $WORK $ mkdir condor_dir $ cd condor_dir $ mkdir job_0 $ mkdir job_1 {{< /highlight >}} In the subdirectory condor`_dir`, save all the relevant codes. Here we include two demo programs, `demo_f_condor.f90` and `demo_c_condor.c`, that compute the sum of the data stored in each job subdirectory (`job_0` and `job_1`). The parallelization scheme here is as the following. First, the master computer node send out many copies of the executable from the `condor_dir` subdirectory and a copy of the data file in each job subdirectories. The number of executable copies is specified in the submit script (`queue`), and it usually matches with the number of job subdirectories. Next, the workload is distributed among a pool of worker computer nodes. At any given time, the number of available worker nodes may vary. Each worker node executes the jobs independent of other worker nodes. The output files are separately stored in the job subdirectory. No additional coding are needed to make the serial code turned "parallel". Parallelization here is achieved through the submit script. {{%expand "demo_condor.f90" %}} {{< highlight fortran >}} Program demo_f_condor implicit none integer, parameter :: N = 5 real*8 w integer i common/sol/ x real*8 x real*8, dimension(N) :: y_local real*8, dimension(N) :: input_data open(10, file='data.dat') do i = 1,N read(10,*) input_data(i) enddo do i = 1,N w = input_data(i)*1d0 call proc(w) y_local(i) = x write(6,*) 'i,x = ', i, y_local(i) enddo write(6,*) 'sum(y) =',sum(y_local) Stop End Program Subroutine proc(w) real*8, intent(in) :: w common/sol/ x real*8 x x = w Return End Subroutine {{< /highlight >}} {{% /expand %}} {{%expand "demo_c_condor.c" %}} {{< highlight c >}} //demo_c_condor #include <stdio.h> double proc(double w){ double x; x = w; return x; } int main(int argc, char* argv[]){ int N=5; double w; int i; double x; double y_local[N]; double sum; double input_data[N]; FILE *fp; fp = fopen("data.dat","r"); for (i = 1; i<= N; i++){ fscanf(fp, "%lf", &input_data[i-1]); } for (i = 1; i <= N; i++){ w = input_data[i-1]*1e0; x = proc(w); y_local[i-1] = x; printf("i,x= %d %lf\n", i, y_local[i-1]) ; } sum = 0e0; for (i = 1; i<= N; i++){ sum = sum + y_local[i-1]; } printf("sum(y)= %lf\n", sum); return 0; } {{< /highlight >}} {{% /expand %}} --- #### Compiling the Code The compiled executable needs to match the "standard" environment of the worker node. The easies way is to directly use the compilers installed on the HCC supercomputer without loading extra modules. 
The standard compiler of the HCC supercomputer is the GNU Compiler Collection. The installed version can be checked with `gcc -v` or `gfortran -v`.

{{< highlight bash >}}
$ gfortran demo_f_condor.f90 -o demo_f_condor.x
$ gcc demo_c_condor.c -o demo_c_condor.x
{{< /highlight >}}

#### Creating a Submit Script

Create a submit script to request 2 jobs (`queue 2`). The name of each job subdirectory is specified in the line `initialdir`. The `$(process)` macro appends an integer to the job subdirectory prefix `job_`; the numbers run from `0` to `queue-1`. The name of the input data file is specified in the line `transfer_input_files`.

{{% panel header="`submit_f.condor`"%}}
{{< highlight bash >}}
universe = grid
grid_resource = pbs
batch_queue = guest
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = demo_f_condor.x
output = Fortran_$(process).out
error = Fortran_$(process).err
initialdir = job_$(process)
transfer_input_files = data.dat
queue 2
{{< /highlight >}}
{{% /panel %}}

{{% panel header="`submit_c.condor`"%}}
{{< highlight bash >}}
universe = grid
grid_resource = pbs
batch_queue = guest
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = demo_c_condor.x
output = C_$(process).out
error = C_$(process).err
initialdir = job_$(process)
transfer_input_files = data.dat
queue 2
{{< /highlight >}}
{{% /panel %}}

#### Submit the Job

The job can be submitted with the command `condor_submit`. The job status can be monitored by entering `condor_q` followed by the username.

{{< highlight bash >}}
$ condor_submit submit_f.condor
$ condor_submit submit_c.condor
$ condor_q <username>
{{< /highlight >}}

Replace `<username>` with your HCC username.

Sample Output
-------------

In the job subdirectory `job_0`, the sum from 1 to 5 is computed and printed to the `.out` file. In the job subdirectory `job_1`, the sum from 6 to 10 is computed and printed to the `.out` file.

{{%expand "Fortran_0.out" %}}
{{< highlight batchfile>}}
i,x = 1 1.0000000000000000
i,x = 2 2.0000000000000000
i,x = 3 3.0000000000000000
i,x = 4 4.0000000000000000
i,x = 5 5.0000000000000000
sum(y) = 15.000000000000000
{{< /highlight >}}
{{% /expand %}}

{{%expand "Fortran_1.out" %}}
{{< highlight batchfile>}}
i,x = 1 6.0000000000000000
i,x = 2 7.0000000000000000
i,x = 3 8.0000000000000000
i,x = 4 9.0000000000000000
i,x = 5 10.000000000000000
sum(y) = 40.000000000000000
{{< /highlight >}}
{{% /expand %}}
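After the jobs finish, the per-job results can be gathered directly from the command line. For example, with standard shell tools (the file names follow the `output` pattern set in the submit scripts above):

{{< highlight bash >}}
$ grep "sum(y)" job_*/Fortran_*.out   # collect the final sums from the Fortran runs
$ grep "sum(y)" job_*/C_*.out         # collect the final sums from the C runs
{{< /highlight >}}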