Commit 6fc166b6 authored by aknecht2

Added bridges example. Updated chip-gen doc.

parent 4734c43e
Examples
==========
Whenever generating a workflow, there are five total required files you
Whenever generating a workflow, there are four total required files you
will need to create:
* **Config File**
A few pieces of info need to be defined in here, specifically the bin path
to the chipathlon environment, the bin path to the idr environment, and
the email address to message when the workflow is complete.
* **Param File**
Allows the user to overwrite options for many of the software tools being
used. Most numeric arguments have defaults that can be changed by the
......@@ -49,8 +45,6 @@ Peak Calling:
Getting Started
^^^^^^^^^^^^^^^^
:download:`Config <examples/small_test_config.yaml>`
:download:`Run <examples/small_test_run.yaml>`
:download:`Param <examples/small_test_param.yaml>`
......@@ -59,25 +53,6 @@ Getting Started
:download:`Sites <examples/small_test_sites.xml>`
**Config**
.. code-block:: text
chipathlon_bin: /home/swanson/aknecht/.conda/envs/chip/bin
idr_bin: /home/swanson/aknecht/.conda/envs/idr/bin
pegasus_home: /usr/share/pegasus/
email: YOUREMAIL@DOMAIN.com
The top two lines define the bin paths to the chipathlon and idr environments.
The paths will depend on where you created your environments, but if you
followed the installation instructions they will be in your home directory in
the .conda folder. These two paths are required to find all the necessary
software to execute. Specifying an email in the config file will send an email
to the target address once the workflow is complete. The pegasus_home
definition corresponds to the pegasus install location. This is necessary so
the pegasus email script (in pegasus/notification/email) can be found and
executed successfully.
**Run**
.. code-block:: yaml
......@@ -232,11 +207,30 @@ script, like so:
--dir DIRECTORY_NAME \
--host DB_HOST \
--param param.yaml \
--conf config.yaml \
--run run.yaml \
--properties properties.txt \
--execute-site local \
--output-site local
--output-site local \
--chip-bin path/to/chipathlon/env/bin \
--idr-bin path/to/idr/env/bin \
--email YOUREMAIL@gmail.com
The :ref:`chip-gen` script has many options.
* --dir specifies the sub-directory to generate the workflow information under.
If this directory does not exist it will be created.
* --host specifies the address of the mongodb database host.
* --param is the input param file used to overwrite job arguments.
* --run is the input run file that describes the actual software tools being used.
* --properties is the pegasus properties file. This file should reference the
sites file, so the sites file doesn't need to be passed explicitly.
* --execute-site is the site that jobs will be run on. This site should match
the name of one of the sites in the sites.xml file exactly.
* --output-site is the site that result files will be transferred to. This site
should match the name of one of the sites in the sites.xml file exactly.
* --chip-bin is the path to the chipathlon conda environment bin.
* --idr-bin is the path to the idr conda environment bin.
* --email is the email address to notify once the workflow is complete.
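When chip-gen is called from another script, it can help to assemble the
argument list programmatically. The following is only a minimal sketch: the
helper name ``build_chip_gen_command`` is hypothetical and not part of
chipathlon, and it only knows about the flags listed above.

```python
import shlex

# Flags copied from the chip-gen example above, in the same order.
# The helper itself is hypothetical and not shipped with chipathlon.
FLAG_ORDER = [
    "dir", "host", "param", "run", "properties",
    "execute-site", "output-site", "chip-bin", "idr-bin", "email",
]


def build_chip_gen_command(options):
    """Return the chip-gen argument list for a dict of option values."""
    unknown = set(options) - set(FLAG_ORDER)
    if unknown:
        raise ValueError("Unknown options: " + ", ".join(sorted(unknown)))
    command = ["chip-gen"]
    for name in FLAG_ORDER:
        if name in options:
            command += ["--" + name, str(options[name])]
    return command


example = build_chip_gen_command({
    "dir": "DIRECTORY_NAME",
    "host": "DB_HOST",
    "param": "param.yaml",
    "run": "run.yaml",
    "properties": "properties.txt",
    "execute-site": "local",
    "output-site": "local",
})
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in example))
```

The returned list can be passed to ``subprocess.run`` as-is, which avoids
shell quoting problems with paths that contain spaces.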
This will generate all files necessary to run the workflow in the specified
directory under a date-time stamped folder. The structure will look like this:
......@@ -258,3 +252,88 @@ submit.sh creates status.sh & remove.sh, which are scripts used to check the
status of the workflow and remove the workflow respectively. Upon completion
of the workflow the notify.sh script is used to email the address specified
in your configuration.
Bridges
^^^^^^^^
:download:`Run <examples/bridges_run.yaml>`
:download:`Param <examples/bridges_param.yaml>`
:download:`Properties <examples/bridges_properties.txt>`
:download:`Sites <examples/bridges_sites.xml>`
Depending on where you submit jobs, your properties and sites files may
need to change. In this case we show how to submit to Bridges, one of
the `XSEDE <https://www.xsede.org/>`_ resources. You will need an allocation
before you can use this example.
There are a few key differences in the configuration for Bridges, most notably
the difference between execute-site and output-site. Previously, when
submitting to a local cluster, we didn't need to define multiple sites.
Here, however, we submit to a remote cluster and transfer files back over
ssh. We still expect genomic files to be on the local site, since they will be
transferred to the remote site, but we expect the conda environments to
be set up on the remote site. This means the chip-bin and idr-bin arguments
must point to folders on the execute-site.
**Properties**
.. code-block:: text
pegasus.catalog.site = XML
pegasus.catalog.site.file = bridges_sites.xml
pegasus.dir.useTimestamp = true
pegasus.condor.logs.symlink = false
pegasus.data.configuration = sharedfs
pegasus.transfer.links = true
pegasus.transfer.worker.package = True
pegasus.condor.arguments.quote = False
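Because the properties file is what points at the sites file, it can be
useful to check which sites file a given properties file references before
generating a workflow. The sketch below is a simplified reader written for
illustration only: ``parse_properties`` is a hypothetical helper, and it
handles only plain ``key = value`` lines like the ones shown above, not the
full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines into a dict (simplified sketch)."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


EXAMPLE_PROPERTIES = """\
pegasus.catalog.site = XML
pegasus.catalog.site.file = bridges_sites.xml
pegasus.data.configuration = sharedfs
"""

props = parse_properties(EXAMPLE_PROPERTIES)
# The sites file that the workflow will use.
print(props["pegasus.catalog.site.file"])
```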
**Sites**
.. code-block:: xml
<?xml version="1.0" ?>
<sitecatalog version="4.0" xmlns="http://pegasus.isi.edu/schema/sitecatalog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd">
<site arch="x86_64" handle="local" os="LINUX">
<directory path="/some/local/path/work" type="shared-scratch">
<file-server operation="all" url="file:///some/local/path/work"/>
</directory>
<directory path="/some/local/path/output" type="local-storage">
<file-server operation="all" url="file:///some/local/path/output"/>
</directory>
<profile key="SSH_PRIVATE_KEY" namespace="env">/home/USERNAME/.ssh/bosco_key_noenc.rsa</profile>
</site>
<site arch="x86_64" handle="PSC_Bosco" os="LINUX">
<directory path="/pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/work" type="shared-scratch">
<file-server operation="all" url="file:///pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/work"/>
</directory>
<directory path="/pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/output" type="shared-storage">
<file-server operation="all" url="file:///pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/output"/>
</directory>
<grid contact="name@bridges.psc.edu" jobtype="compute" scheduler="slurm" type="batch"/>
<grid contact="name@bridges.psc.edu" jobtype="auxillary" scheduler="slurm" type="batch"/>
<profile key="queue" namespace="globus">RM-shared</profile>
<profile key="change.dir" namespace="pegasus">True</profile>
<profile key="style" namespace="pegasus">ssh</profile>
<profile key="glite.arguments" namespace="pegasus">-C EGRESS</profile>
</site>
</sitecatalog>
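Since the execute-site and output-site values must match a site handle in
the sites file exactly, a quick check before running chip-gen can catch
typos early. This is a minimal sketch using only the Python standard
library. The helper names ``site_handles`` and ``check_sites`` are
hypothetical, and the XML is a shortened version of the catalog above.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <sitecatalog> element in the sites file.
SITE_NS = "{http://pegasus.isi.edu/schema/sitecatalog}"


def site_handles(xml_text):
    """Return the handle of every <site> element in a site catalog."""
    root = ET.fromstring(xml_text)
    return [site.get("handle") for site in root.findall(SITE_NS + "site")]


def check_sites(xml_text, execute_site, output_site):
    """Raise ValueError if either site name is not an exact handle match."""
    handles = site_handles(xml_text)
    for label, name in (("execute-site", execute_site),
                        ("output-site", output_site)):
        if name not in handles:
            raise ValueError(
                "%s '%s' not found in sites file (available: %s)"
                % (label, name, ", ".join(handles))
            )
    return handles


# Shortened catalog for illustration; only the handles matter here.
EXAMPLE_SITES = """<sitecatalog version="4.0" xmlns="http://pegasus.isi.edu/schema/sitecatalog">
  <site arch="x86_64" handle="local" os="LINUX"/>
  <site arch="x86_64" handle="PSC_Bosco" os="LINUX"/>
</sitecatalog>"""

print(check_sites(EXAMPLE_SITES, "PSC_Bosco", "local"))
```

The comparison is case-sensitive on purpose, so ``psc_bosco`` would be
rejected just as a misspelled handle would.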
.. code-block:: bash
chip-gen \
--dir DIRECTORY_NAME \
--host DB_HOST \
--param param.yaml \
--run run.yaml \
--properties properties.txt \
--execute-site PSC_Bosco \
--output-site local \
--chip-bin path/to/chipathlon/env/bin \
--idr-bin path/to/idr/env/bin \
--email YOUREMAIL@gmail.com
macs2_callpeak:
arguments:
"-g": "mm"
bwa_align_single:
walltime: 1440
memory: 16000
bowtie2_align_single:
walltime: 1440
memory: 16000
db_save_result:
memory: 4000
walltime: 120
bwa_sai_to_sam:
memory: 16000
cores: 4
walltime: 1440
download_from_encode:
walltime: 120
samtools_sam_to_bam:
walltime: 60
download_from_gridfs:
memory: 4000
picard_clean_sam:
memory: 4000
runtime: 60
picard_mark_duplicates:
memory: 4000
walltime: 60
picard_sort_sam:
memory: 4000
walltime: 60
idr:
memory: 16000
walltime: 1440
cores: 4
chr_locus_convert:
memory: 8000
cores: 2
macs2_narrow:
memory: 8000
cores: 2
macs2_broad:
memory: 8000
cores: 2
music_punctate:
arguments:
"--mapp": "/pylon5/mc4s9ip/aknecht/mm9_50bp"
music_narrow:
arguments:
"--mapp": "/pylon5/mc4s9ip/aknecht/mm9_50bp"
pegasus.catalog.site = XML
pegasus.catalog.site.file = bridges_sites.xml
pegasus.dir.useTimestamp = true
pegasus.condor.logs.symlink = false
pegasus.data.configuration = sharedfs
pegasus.transfer.links = true
pegasus.transfer.worker.package = True
pegasus.condor.arguments.quote = False
genomes:
mm9:
bowtie2: /pylon5/GROUP/USERNAME/mm9/mm9.genome.fa
bwa: /pylon5/GROUP/USERNAME/mm9/mm9.genome.fa
chrom.sizes: /pylon5/GROUP/USERNAME/mm9/mm9.chrom.sizes
runs:
- align: bwa
assembly: mm9
controls: &id001
- ENCFF001NIM
file_type: fastq
idr: &id002
- ENCFF001NIP
- ENCFF001NIS
peak: ccat
peak_type: broad
signals: &id003
- ENCFF001NIP
- ENCFF001NIS
- align: bowtie2
assembly: mm9
controls: *id001
file_type: fastq
idr: *id002
peak: ccat
peak_type: broad
signals: *id003
<?xml version="1.0" ?>
<sitecatalog version="4.0" xmlns="http://pegasus.isi.edu/schema/sitecatalog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd">
<site arch="x86_64" handle="local" os="LINUX">
<directory path="/pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/work" type="shared-scratch">
<file-server operation="all" url="file:///pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/work"/>
</directory>
<directory path="/pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/output" type="local-storage">
<file-server operation="all" url="file:///pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/output"/>
</directory>
<profile key="SSH_PRIVATE_KEY" namespace="env">/home/USERNAME/.ssh/bosco_key_noenc.rsa</profile>
</site>
<site arch="x86_64" handle="PSC_Bosco" os="LINUX">
<directory path="/pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/work" type="shared-scratch">
<file-server operation="all" url="file:///pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/work"/>
</directory>
<directory path="/pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/output" type="shared-storage">
<file-server operation="all" url="file:///pylon5/GROUP/USERNAME/bridges_test/2017-05-18_163011/output"/>
</directory>
<grid contact="name@bridges.psc.edu" jobtype="compute" scheduler="slurm" type="batch"/>
<grid contact="name@bridges.psc.edu" jobtype="auxillary" scheduler="slurm" type="batch"/>
<profile key="queue" namespace="globus">RM-shared</profile>
<profile key="change.dir" namespace="pegasus">True</profile>
<profile key="style" namespace="pegasus">ssh</profile>
<profile key="glite.arguments" namespace="pegasus">-C EGRESS</profile>
</site>
</sitecatalog>
chipathlon_bin: /home/swanson/aknecht/.conda/envs/chip/bin
idr_bin: /home/swanson/aknecht/.conda/envs/idr/bin
pegasus_home: /usr/share/pegasus/
email: YOUREMAIL@DOMAIN.com
......@@ -13,7 +13,7 @@ parser.add_argument("--properties", dest="properties", required=True, help="Path
parser.add_argument("--execute-site", dest="execute_site", required=True, default="local", help="Target execute site. Sites should be defined in configuration.")
parser.add_argument("--output-site", dest="output_site", required=True, default="local", help="Target output site. Site should be defined in configuration.")
parser.add_argument("--chip-bin", dest="chip_bin", required=True, help="Path to chipathlon conda environment bin.")
parser.add_argumnet("--idr-bin", dest="idr_bin", required=True, help="Path to idr conda enviornment bin.")
parser.add_argument("--idr-bin", dest="idr_bin", required=True, help="Path to idr conda environment bin.")
parser.add_argument("--email", dest="email", help="An email address to notify when the workflow is finished.")
parser.add_argument("--no-save-db", dest="save_db", default=True, action="store_false", help="Whether or not to save results to the database. Default: True")
......