diff --git a/doc/source/examples.rst b/doc/source/examples.rst index 6ca1d13809cddb3e9bfd1b932844abc0588c20ea..a46a89699af9de708fe685e9b575ead8b0651759 100644 --- a/doc/source/examples.rst +++ b/doc/source/examples.rst @@ -1,14 +1,51 @@ Examples ========== -Whenever generating a workflow, there are three required files. A config file, -a run file, and a param file. The config file is used to specify system -information -- paths to required software, environment variables for pegasus -and so on. The run file is used to specify the actual files to process and -what software tools to use on them. Finally, the param file is used to -override any default params for the jobs in the workflow. In each of the -examples below, all three of these files will be talked about, and download -links to each will be provided. +Whenever generating a workflow, there are five total required files you +will need to create: + +* **Config File** + A few pieces of info need to be defined in here, specifically the bin path + to the chipathlon environment, the bin path to the idr environment, and + the email address to message when the workflow is complete. +* **Param File** + Allows the user to overwrite options for many of the software tools being + used. Most numeric arguments have defaults that can be changed by the + end-user. +* **Run File** + Describes the actual files to process and what alignment / peak calling + tools should be used on them, and whether or not to run idr. +* **Properties File** + One of the files required by pegasus. For more information see their + `properties documentation <https://pegasus.isi.edu/documentation/properties.php>`_ +* **Sites File** + One of the files required by pegasus. For more information see their + `sites catalog documentation <https://pegasus.isi.edu/documentation/site.php>`_ + +The information located in the properties file will be highly specific to +the environment that you're submitting on. 
Additionally, genomic information +is expected to be downloaded & built for the target genome you're interested +in, as well as a chromosome sizes file. + +Supported Tools +^^^^^^^^^^^^^^^^ + +Alignment: + +* `bwa <http://bio-bwa.sourceforge.net>`_ +* `bowtie2 <http://bowtie-bio.sourceforge.net/bowtie2/index.shtml>`_ + +Peak Calling: + +* `spp <https://github.com/hms-dbmi/spp>`_ (narrow, broad) +* `zerone <https://omictools.com/zerone-tool>`_ (broad) +* `macs2 <https://github.com/taoliu/MACS>`_ (narrow, broad) +* `gem <http://groups.csail.mit.edu/cgs/gem/>`_ (narrow) +* `peakranger <http://ranger.sourceforge.net/manual1.18.html>`_ (narrow) +* `ccat <http://ranger.sourceforge.net/manual1.18.html>`_ (broad) +* `music <https://github.com/gersteinlab/MUSIC>`_ (narrow, punctate, broad) +* `pepr <https://github.com/shawnzhangyx/PePr>`_ (narrow) +* `hiddendomains <http://hiddendomains.sourceforge.net/>`_ (broad) Getting Started ^^^^^^^^^^^^^^^^ @@ -18,34 +55,28 @@ Getting Started :download:`Param <examples/small_test_param.yaml>` +:download:`Properties <examples/small_test_properties.txt>` + +:download:`Sites <examples/small_test_sites.xml>` + **Config** -.. code-block:: yaml +.. code-block:: text + + chipathlon_bin: /home/swanson/aknecht/.conda/envs/chip/bin + idr_bin: /home/swanson/aknecht/.conda/envs/idr/bin + pegasus_home: /usr/share/pegasus/ + email: YOUREMAIL@DOMAIN.com - notify: - pegasus_home: "/usr/share/pegasus/" - email: "avi@kurtknecht.com" - profile: - pegasus: - style: "glite" - condor: - grid_resource: "pbs" - universe: "vanilla" - batch_queue: "batch" - env: - PYTHONPATH: "/home/swanson/aknecht/.conda/envs/ih_env/lib/python2.7/site-packages/" - PATH: "/home/swanson/aknecht/.conda/envs/ih_env/bin:/bin/:/usr/bin/:/usr/local/bin/" - PEGASUS_HOME: "/usr/" - -Specifying an email in the config file will send an email to the target -address once the workflow is complete. The pegasus_home definition corresponds -to the pegasus install location. 
This is necessary so the pegasus email -script (in pegasus/notification/email) can be found and executed successfully. -The config file profile information is passed through to the pegasus -`sites catalog <https://pegasus.isi.edu/documentation/site.php>`_. This allows -any pegasus `profile <https://pegasus.isi.edu/documentation/profiles.php>`_ -information to be passed. The required information will be dependent on the -system you are submitting to. +The top two lines define the bin paths to the chipathlon and idr environments. +The paths will depend on where you created your environments, but if you +followed the installation instructions they will be in your home directory in +the .conda folder. These two paths are required to find all the necessary +software to execute. Specifying an email in the config file will send an email +to the target address once the workflow is complete. The pegasus_home +definition corresponds to the pegasus install location. This is necessary so +the pegasus email script (in pegasus/notification/email) can be found and +executed successfully. **Run** @@ -97,8 +128,8 @@ necessary for processing: Defines the type of files that processing initially begins with. Should be either fastq or bam. * peak - The tool used for peak calling. Should be one of [spp, gem, macs2, - peakranger, ccat, zerone, music]. + The tool used for peak calling. Above in the supported tools section there + is a list defining all peak calling tools, and their supported peak types. * peak_type The type of peak calling to perform. The peak type is tool dependent, as tools support different peak calling types. Usually peak_type is narrow @@ -115,8 +146,10 @@ When creating runs, often times you'll want to investigate the same files with multiple different peak calling and alignment tools. In the case above, the two runs defined are identical except for the alignment tool -- one uses bwa and the other uses bowtie2. 
To avoid retyping a lot of information, lists -can be marked with ids using the & symbol. Later on in the file, the list can -be referenced using the * symbol. +can be marked with ids using the & symbol and a unique identifier. Later on in +the file, the list can be referenced using the * symbol. Since we are only +changing the alignment tool there's no need to type out all the samples a +second time. **Param** @@ -125,6 +158,10 @@ be referenced using the * symbol. macs2_callpeak: arguments: "-g": "mm" + bwa_align_single: + arguments: + "-l": 20 + "-q": 6 music_punctate: arguments: "--mapp": "/work/ladunga/SHARED/workflows/mm9_50bp" @@ -144,6 +181,46 @@ specify the "-g": "mm" for macs2 peak calling. The music peak caller requires additional information to run successfully (even though we are not using it). Finally, we specify not to remove duplicates. +**Properties** + +.. code-block:: text + + pegasus.catalog.site = XML + pegasus.catalog.site.file = small_test_sites.xml + + pegasus.condor.logs.symlink = false + pegasus.transfer.links = true + pegasus.data.configuration = sharedfs + +Again, for more information on the properties file consult the pegasus +`properties documentation <https://pegasus.isi.edu/documentation/properties.php>`_ + +**Sites** + +.. 
code-block:: xml + + <?xml version="1.0" ?> + <sitecatalog version="4.0" xmlns="http://pegasus.isi.edu/schema/sitecatalog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"> + <site arch="x86_64" handle="local" os="LINUX"> + <directory path="/lustre/work/ladunga/SHARED/workflows/new_tests/full_test/work" type="shared-scratch"> + <file-server operation="all" url="file:///lustre/work/ladunga/SHARED/workflows/new_tests/full_test/work"/> + </directory> + <directory path="/lustre/work/ladunga/SHARED/workflows/new_tests/full_test/output" type="local-storage"> + <file-server operation="all" url="file:///lustre/work/ladunga/SHARED/workflows/new_tests/full_test/output"/> + </directory> + + <profile key="change.dir" namespace="pegasus">true</profile> + <profile key="transfer.threads" namespace="pegasus">4</profile> + <profile key="universe" namespace="condor">vanilla</profile> + <profile key="grid_resource" namespace="condor">pbs</profile> + <profile key="batch_queue" namespace="condor">batch</profile> + <profile key="style" namespace="pegasus">glite</profile> + </site> + </sitecatalog> + +Again, for more information on the sites file consult the pegasus +`sites catalog documentation <https://pegasus.isi.edu/documentation/site.php>`_ + **Generation** To generate the workflow, pass these input files into the :ref:`chip-gen` @@ -154,11 +231,12 @@ script, like so: chip-gen \ --dir DIRECTORY_NAME \ --host DB_HOST \ - --username USERNAME \ - --password PASSWORD \ --param param.yaml \ --conf config.yaml \ - --run run.yaml + --run run.yaml \ + --properties properties.txt \ + --execute-site local \ + --output-site local This will generate all files necessary to run the workflow in the specified directory under a date-time stamped folder. The structure will look like this: @@ -169,10 +247,8 @@ directory under a date-time stamped folder. 
The structure will look like this: date-timestamp/ input/ chipathlon.dax - conf.rc db_meta/ notify.sh - sites.xml submit.sh output/ work/ diff --git a/doc/source/examples/small_test_config.yaml b/doc/source/examples/small_test_config.yaml index 55eb7f7baed9426a33ac469a831af4f11b4b3411..75e486743e25c0f7ae17ffe7cb3e6726f05f14b9 100644 --- a/doc/source/examples/small_test_config.yaml +++ b/doc/source/examples/small_test_config.yaml @@ -1,14 +1,4 @@ -notify: - pegasus_home: "/usr/share/pegasus/" - email: "avi@kurtknecht.com" -profile: - pegasus: - style: "glite" - condor: - grid_resource: "pbs" - universe: "vanilla" - batch_queue: "batch" - env: - PYTHONPATH: "/home/swanson/aknecht/.conda/envs/ih_env/lib/python2.7/site-packages/" - PATH: "/home/swanson/aknecht/.conda/envs/ih_env/bin:/bin/:/usr/bin/:/usr/local/bin/" - PEGASUS_HOME: "/usr/" +chipathlon_bin: /home/swanson/aknecht/.conda/envs/chip/bin +idr_bin: /home/swanson/aknecht/.conda/envs/idr/bin +pegasus_home: /usr/share/pegasus/ +email: YOUREMAIL@DOMAIN.com diff --git a/doc/source/examples/small_test_param.yaml b/doc/source/examples/small_test_param.yaml index 8264fa5cc7d8a43425f1053a238434aeae7f0dcc..fee17c972cc699d27d4d200472f4c344226c68c9 100644 --- a/doc/source/examples/small_test_param.yaml +++ b/doc/source/examples/small_test_param.yaml @@ -1,6 +1,10 @@ macs2_callpeak: arguments: "-g": "mm" +bwa_align_single: + arguments: + "-l": 20 + "-q": 6 music_punctate: arguments: "--mapp": "/work/ladunga/SHARED/workflows/mm9_50bp" diff --git a/doc/source/examples/small_test_properties.txt b/doc/source/examples/small_test_properties.txt new file mode 100644 index 0000000000000000000000000000000000000000..2bb50e0f5e84d4a00966312ab5a9a870677c065b --- /dev/null +++ b/doc/source/examples/small_test_properties.txt @@ -0,0 +1,6 @@ +pegasus.catalog.site = XML +pegasus.catalog.site.file = small_test_sites.xml + +pegasus.condor.logs.symlink = false +pegasus.transfer.links = true +pegasus.data.configuration = sharedfs diff --git 
a/doc/source/examples/small_test_sites.xml b/doc/source/examples/small_test_sites.xml new file mode 100644 index 0000000000000000000000000000000000000000..000d22a5dd33961318e3ecb0a39650ff234080b9 --- /dev/null +++ b/doc/source/examples/small_test_sites.xml @@ -0,0 +1,18 @@ +<?xml version="1.0" ?> +<sitecatalog version="4.0" xmlns="http://pegasus.isi.edu/schema/sitecatalog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"> + <site arch="x86_64" handle="local" os="LINUX"> + <directory path="/lustre/work/ladunga/SHARED/workflows/new_tests/full_test/work" type="shared-scratch"> + <file-server operation="all" url="file:///lustre/work/ladunga/SHARED/workflows/new_tests/full_test/work"/> + </directory> + <directory path="/lustre/work/ladunga/SHARED/workflows/new_tests/full_test/output" type="local-storage"> + <file-server operation="all" url="file:///lustre/work/ladunga/SHARED/workflows/new_tests/full_test/output"/> + </directory> + + <profile key="change.dir" namespace="pegasus">true</profile> + <profile key="transfer.threads" namespace="pegasus">4</profile> + <profile key="universe" namespace="condor">vanilla</profile> + <profile key="grid_resource" namespace="condor">pbs</profile> + <profile key="batch_queue" namespace="condor">batch</profile> + <profile key="style" namespace="pegasus">glite</profile> + </site> +</sitecatalog>
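Reviewer note: the new config format in this change is a flat list of `key: value` pairs, so a quick pre-flight check before calling chip-gen is easy to sketch. The snippet below is only an illustration and is not part of chipathlon: the helper names `parse_flat_config` and `missing_keys` are hypothetical, and the list of required keys is an assumption taken from the four keys shown in the example config above.

```python
# Hypothetical pre-flight check for the flat config format shown in this diff.
# Assumption: the four keys in small_test_config.yaml are the ones to check.
REQUIRED_KEYS = ("chipathlon_bin", "idr_bin", "pegasus_home", "email")

EXAMPLE_CONFIG = """\
chipathlon_bin: /home/swanson/aknecht/.conda/envs/chip/bin
idr_bin: /home/swanson/aknecht/.conda/envs/idr/bin
pegasus_home: /usr/share/pegasus/
email: YOUREMAIL@DOMAIN.com
"""


def parse_flat_config(text):
    """Parse simple 'key: value' lines into a dict (no nesting, no quoting)."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("expected 'key: value', got: %r" % raw_line)
        config[key.strip()] = value.strip()
    return config


def missing_keys(config):
    """Return the assumed-required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


if __name__ == "__main__":
    parsed = parse_flat_config(EXAMPLE_CONFIG)
    print("missing keys:", missing_keys(parsed))
```

A real check would more likely load the file with a YAML parser; the hand-rolled parser here only keeps the sketch free of third-party dependencies.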