Examples
==========

Generating a workflow requires three files: a config file, a run file, and a
param file.  The config file specifies system information, such as paths to
required software and environment variables for Pegasus.  The run file
specifies the files to process and the software tools to use on them.  The
param file overrides default parameters for the jobs in the workflow.

**Config**

.. code-block:: yaml

    notify:
      pegasus_home: "/usr/share/pegasus/"
      email: "avi@kurtknecht.com"
    profile:
      pegasus:
        style: "glite"
      condor:
        grid_resource: "pbs"
        universe: "vanilla"
        batch_queue: "batch"
      env:
        PYTHONPATH: "/home/swanson/aknecht/.conda/envs/ih_env/lib/python2.7/site-packages/"
        PATH: "/home/swanson/aknecht/.conda/envs/ih_env/bin:/bin/:/usr/bin/:/usr/local/bin/"
        PEGASUS_HOME: "/usr/"

**Run**

.. code-block:: yaml

    genomes:
      mm9:
        bowtie2: /work/ladunga/SHARED/mouse/mm9/mm9.genome.fa
        bwa: /work/ladunga/SHARED/mouse/mm9/mm9.genome.fa
        chrom.sizes: /work/ladunga/SHARED/mouse/mm9/mm9.chrom.sizes
    runs:
    - align: bwa
      assembly: mm9
      controls: &id001
      - ENCFF001NIM
      file_type: fastq
      idr: &id002
      - ENCFF001NIP
      - ENCFF001NIS
      peak: spp
      peak_type: narrow
      signals: &id003
      - ENCFF001NIP
      - ENCFF001NIS
    - align: bowtie2
      assembly: mm9
      controls: *id001
      file_type: fastq
      idr: *id002
      peak: spp
      peak_type: narrow
      signals: *id003
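
The run file above relies on YAML anchors (``&id001``) and aliases
(``*id001``) so that both runs share the same control, IDR, and signal lists.
As a rough illustration, the sketch below writes out the ``runs`` list as plain
Python data, the way a YAML loader such as PyYAML would typically return it.
The variable names are chosen here for illustration and are not part of the
workflow generator.

.. code-block:: python

    # Lists that the run file defines once with an anchor.
    controls = ["ENCFF001NIM"]                # &id001
    idr = ["ENCFF001NIP", "ENCFF001NIS"]      # &id002
    signals = ["ENCFF001NIP", "ENCFF001NIS"]  # &id003

    # Settings common to both runs; aliases point at the same list objects.
    shared = {
        "assembly": "mm9",
        "controls": controls,
        "file_type": "fastq",
        "idr": idr,
        "peak": "spp",
        "peak_type": "narrow",
        "signals": signals,
    }

    # The two runs in the example differ only in the alignment tool.
    runs = [dict(shared, align="bwa"), dict(shared, align="bowtie2")]

    # Collect the keys whose values differ between the two runs.
    differing = sorted(k for k in runs[0] if runs[0][k] != runs[1][k])
    print(differing)

Written this way, it is easy to see that the example compares ``bwa`` against
``bowtie2`` on identical inputs.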

**Param**

.. code-block:: yaml

    macs2_callpeak:
      arguments:
        "-g": "mm"
    bwa_align_single:
      arguments:
        "-q": 5
        "-l": 32
        "-k": 2
        "-t": 1
    bwa_align_paired:
      arguments:
        "-t": 1
    samtools_sam_to_bam:
      walltime: 60
      memory: 16000
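
To make the idea of overriding defaults concrete, here is a minimal sketch of
a recursive merge.  It is not the generator's actual implementation: the
function name ``apply_overrides`` and every value in ``defaults`` (including
the ``nodes`` key) are hypothetical and exist only to show how settings from a
param file could replace matching defaults while leaving the rest alone.

.. code-block:: python

    import copy

    def apply_overrides(defaults, overrides):
        """Return a copy of defaults with overrides merged in recursively."""
        merged = copy.deepcopy(defaults)
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are mappings: merge them key by key.
                merged[key] = apply_overrides(merged[key], value)
            else:
                # Otherwise the override replaces the default outright.
                merged[key] = copy.deepcopy(value)
        return merged

    # Hypothetical defaults, invented for this illustration only.
    defaults = {
        "samtools_sam_to_bam": {"walltime": 30, "memory": 8000, "nodes": 1},
        "bwa_align_paired": {"arguments": {"-t": 4}},
    }

    # Overrides copied from the param file above.
    overrides = {
        "samtools_sam_to_bam": {"walltime": 60, "memory": 16000},
        "bwa_align_paired": {"arguments": {"-t": 1}},
    }

    merged = apply_overrides(defaults, overrides)
    print(merged["samtools_sam_to_bam"])

Only the keys named in the param file change; anything not mentioned keeps its
default value.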

To generate the workflow, pass these input files into the :ref:`chip-gen`
script, like so:

.. code-block:: bash

    chip-gen \
      --dir DIRECTORY_NAME \
      --host DB_HOST \
      --username USERNAME \
      --password PASSWORD \
      --param param.yaml \
      --conf config.yaml \
      --run run.yaml

This generates all files necessary to run the workflow in a date-time stamped
folder under the specified directory.  The structure looks like this:

.. code-block:: text

    directory_name/
      date-timestamp/
        input/
          chipathlon.dax
          conf.rc
          db_meta/
          notify.sh
          sites.xml
          submit.sh
        output/
        work/
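
Before submitting, it can be handy to confirm that the generated folder
matches this layout.  The sketch below is one possible check, assuming the
entries listed above are the complete set; the helper ``missing_entries`` is a
hypothetical name, not part of the generator, and the demo runs against a
temporary directory rather than a real workflow.

.. code-block:: python

    import tempfile
    from pathlib import Path

    # Entries taken from the layout shown above.
    RUN_DIRS = ["input", "output", "work"]
    INPUT_ENTRIES = [
        "chipathlon.dax",
        "conf.rc",
        "db_meta",
        "notify.sh",
        "sites.xml",
        "submit.sh",
    ]

    def missing_entries(run_dir):
        """Return the expected relative paths that are absent from run_dir."""
        run_dir = Path(run_dir)
        expected = [Path(name) for name in RUN_DIRS]
        expected += [Path("input") / name for name in INPUT_ENTRIES]
        return [p.as_posix() for p in expected if not (run_dir / p).exists()]

    # Demo: build a throwaway layout that deliberately lacks sites.xml.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in RUN_DIRS:
            (root / name).mkdir()
        (root / "input" / "db_meta").mkdir()
        for name in ["chipathlon.dax", "conf.rc", "notify.sh", "submit.sh"]:
            (root / "input" / name).touch()
        demo_missing = missing_entries(root)
        print(demo_missing)

An empty list means every expected entry is present in the date-time stamped
folder passed to the helper.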

From here, run the submit.sh script to submit the workflow.  submit.sh creates
status.sh and remove.sh, which check the status of the workflow and remove the
workflow, respectively.  When the workflow completes, the notify.sh script
emails the address specified in your configuration.