+++
title = "Condor Jobs on HCC"
description = "How to run jobs using Condor on HCC machines"
weight = "54"
+++
This quick start demonstrates how to run multiple copies of a Fortran/C program
using Condor on HCC supercomputers. The sample codes and submit scripts
can be downloaded from [condor_dir.zip](/attachments/3178558.zip).
#### Login to an HCC Cluster
Log in to an HCC cluster through PuTTY ([For Windows Users]({{< relref "/quickstarts/connecting/for_windows_users">}})) or Terminal ([For Mac/Linux Users]({{< relref "/quickstarts/connecting/for_maclinux_users">}})) and make a subdirectory called `condor_dir` under the `$WORK` directory. In the subdirectory `condor_dir`, create job subdirectories that host the input data files. Here we create two job subdirectories, `job_0` and `job_1`, and put a data file (`data.dat`) in each subdirectory. The data file in `job_0` lists the integers from 1 to 5; the data file in `job_1` lists the integers from 6 to 10.
{{< highlight bash >}}
$ cd $WORK
$ mkdir condor_dir
$ cd condor_dir
$ mkdir job_0
$ mkdir job_1
{{< /highlight >}}
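
The data files themselves can be generated with `seq` (a sketch; creating them in any editor works equally well):

```shell
# Fill each job subdirectory with its data.dat file:
# integers 1-5 for job_0 and 6-10 for job_1.
# (mkdir -p is a no-op if the directories already exist.)
mkdir -p job_0 job_1
seq 1 5  > job_0/data.dat
seq 6 10 > job_1/data.dat
```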

In the subdirectory `condor_dir`, save all the relevant code. Here we
include two demo programs, `demo_f_condor.f90` and `demo_c_condor.c`,
that compute the sum of the data stored in each job subdirectory
(`job_0` and `job_1`). The parallelization scheme works as follows.
First, the master node sends out copies of the executable from the
`condor_dir` subdirectory, along with a copy of the data file in each
job subdirectory. The number of executable copies is specified in the
submit script (`queue`), and it usually matches the number of job
subdirectories. Next, the workload is distributed among a pool of
worker nodes. At any given time, the number of available worker nodes
may vary. Each worker node executes its job independently of the other
worker nodes, and the output files are stored separately in the job
subdirectories. No additional coding is needed to turn the serial code
"parallel"; parallelization here is achieved entirely through the
submit script.

{{%expand "demo_f_condor.f90" %}}
{{< highlight fortran >}}
Program demo_f_condor
    implicit none
    integer, parameter :: N = 5
    real*8 w
    integer i
    common/sol/ x
    real*8 x
    real*8, dimension(N) :: y_local
    real*8, dimension(N) :: input_data
    
    open(10, file='data.dat')
    
    do i = 1,N
        read(10,*) input_data(i)
    enddo
    
    do i = 1,N
        w = input_data(i)*1d0
        call proc(w)
        y_local(i) = x      
        write(6,*) 'i,x = ', i, y_local(i)
    enddo
    write(6,*) 'sum(y) =',sum(y_local)
Stop
End Program
Subroutine proc(w)
    real*8, intent(in) :: w
    common/sol/ x
    real*8 x
    
    x = w
    
Return
End Subroutine
{{< /highlight >}}
{{% /expand %}}
{{%expand "demo_c_condor.c" %}}
{{< highlight c >}}
//demo_c_condor
#include <stdio.h>

double proc(double w){
        double x;       
        x = w;  
        return x;
}

int main(int argc, char* argv[]){
    int N=5;
    double w;
    int i;
    double x;
    double y_local[N];
    double sum; 
    double input_data[N];
    FILE *fp;
    fp = fopen("data.dat","r");
    for (i = 1; i <= N; i++){
        fscanf(fp, "%lf", &input_data[i-1]);
    }
    fclose(fp);
    
    for (i = 1; i <= N; i++){        
        w = input_data[i-1]*1e0;
        x = proc(w);
        y_local[i-1] = x;
        printf("i,x= %d %lf\n", i, y_local[i-1]) ;
    }
    
    sum = 0e0;
    for (i = 1; i<= N; i++){
        sum = sum + y_local[i-1];   
    }
    
    printf("sum(y)= %lf\n", sum);    
return 0;
}
{{< /highlight >}}
{{% /expand %}}

---
#### Compiling the Code

The compiled executable needs to match the "standard" environment of the
worker nodes. The easiest way is to use the compilers installed on the
HCC supercomputer directly, without loading extra modules. The standard
compiler on the HCC supercomputers is the GNU Compiler Collection; its
version can be checked with the commands `gcc -v` or `gfortran -v`.

{{< highlight bash >}}
$ gfortran demo_f_condor.f90 -o demo_f_condor.x
$ gcc demo_c_condor.c -o demo_c_condor.x
{{< /highlight >}}
#### Creating a Submit Script

Create a submit script to request two jobs (`queue 2`). The names of the
job subdirectories are specified in the `initialdir` line. The
`$(process)` macro appends integer numbers to the job subdirectory
prefix `job_`; the numbers run from `0` to `queue-1`. The name of the
input data file is specified in the `transfer_input_files` line.
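
As an illustration (this is plain shell, not Condor syntax), `queue 2` makes Condor evaluate the submit description once per job, with `$(process)` taking each value from `0` to `queue-1`, roughly as this loop sketches:

```shell
# Sketch of how $(process) is substituted for "queue 2":
# one set of values per job, numbered 0 to queue-1.
for process in 0 1; do
    echo "initialdir = job_${process}, output = Fortran_${process}.out"
done
```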

{{% panel header="`submit_f.condor`"%}}
{{< highlight bash >}}
universe = grid
grid_resource = pbs
batch_queue = guest
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = demo_f_condor.x
output = Fortran_$(process).out
error = Fortran_$(process).err
initialdir = job_$(process)
transfer_input_files = data.dat
queue 2
{{< /highlight >}}
{{% /panel %}}
{{% panel header="`submit_c.condor`"%}}
{{< highlight bash >}}
universe = grid
grid_resource = pbs
batch_queue = guest
should_transfer_files = yes
when_to_transfer_output = on_exit
executable = demo_c_condor.x
output = C_$(process).out
error = C_$(process).err
initialdir = job_$(process)
transfer_input_files = data.dat
queue 2
{{< /highlight >}}
{{% /panel %}}
#### Submit the Job

The jobs can be submitted with the command `condor_submit`. Job
status can be monitored by entering `condor_q` followed by the
username.

{{< highlight bash >}}
$ condor_submit submit_f.condor
$ condor_submit submit_c.condor
$ condor_q <username>
{{< /highlight >}}

Replace `<username>` with your HCC username.

#### Sample Output

In the job subdirectory `job_0`, the sum from 1 to 5 is computed and
printed to the `.out` file. In the job subdirectory `job_1`, the sum
from 6 to 10 is computed and printed to the `.out` file.

{{%expand "Fortran_0.out" %}}
{{< highlight batchfile>}}
 i,x =            1   1.0000000000000000     
 i,x =            2   2.0000000000000000     
 i,x =            3   3.0000000000000000     
 i,x =            4   4.0000000000000000     
 i,x =            5   5.0000000000000000     
 sum(y) =   15.000000000000000     
{{< /highlight >}}
{{% /expand %}}
{{%expand "Fortran_1.out" %}}
{{< highlight batchfile>}}
 i,x =            1   6.0000000000000000     
 i,x =            2   7.0000000000000000     
 i,x =            3   8.0000000000000000     
 i,x =            4   9.0000000000000000     
 i,x =            5   10.000000000000000     
 sum(y) =   40.000000000000000     
{{< /highlight >}}
{{% /expand %}}