Commit 5443d27d authored by aknecht2

Updated the getting started example to be up-to-date.

parent 4413357d
Examples
==========
Whenever generating a workflow, there are three required files. A config file,
a run file, and a param file. The config file is used to specify system
information -- paths to required software, environment variables for pegasus
and so on. The run file is used to specify the actual files to process and
what software tools to use on them. Finally, the param file is used to
override any default params for the jobs in the workflow. In each of the
examples below, all three of these files will be talked about, and download
links to each will be provided.
Whenever generating a workflow, there are five total required files you
will need to create:
* **Config File**
A few pieces of info need to be defined in here, specifically the bin path
to the chipathlon environment, the bin path to the idr environment, and
the email address to message when the workflow is complete.
* **Param File**
Allows the user to overwrite options for many of the software tools being
used. Most numeric arguments have defaults that can be changed by the
end-user.
* **Run File**
Describes the actual files to process, which alignment and peak calling
tools should be used on them, and whether or not to run idr.
* **Properties File**
One of the files required by pegasus. For more information see their
`properties documentation <https://pegasus.isi.edu/documentation/properties.php>`_.
* **Sites File**
One of the files required by pegasus. For more information see their
`sites catalog documentation <https://pegasus.isi.edu/documentation/site.php>`_.
The information in the properties file will be highly specific to the
environment that you're submitting on. Additionally, the genomic information
for your target genome is expected to be downloaded and built beforehand,
along with a chromosome sizes file.
Supported Tools
^^^^^^^^^^^^^^^^
Alignment:
* `bwa <http://bio-bwa.sourceforge.net>`_
* `bowtie2 <http://bowtie-bio.sourceforge.net/bowtie2/index.shtml>`_
Peak Calling:
* `spp <https://github.com/hms-dbmi/spp>`_ (narrow, broad)
* `zerone <https://omictools.com/zerone-tool>`_ (broad)
* `macs2 <https://github.com/taoliu/MACS>`_ (narrow, broad)
* `gem <http://groups.csail.mit.edu/cgs/gem/>`_ (narrow)
* `peakranger <http://ranger.sourceforge.net/manual1.18.html>`_ (narrow)
* `ccat <http://ranger.sourceforge.net/manual1.18.html>`_ (broad)
* `music <https://github.com/gersteinlab/MUSIC>`_ (narrow, punctate, broad)
* `pepr <https://github.com/shawnzhangyx/PePr>`_ (narrow)
* `hiddendomains <http://hiddendomains.sourceforge.net/>`_ (broad)
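The tool and peak type pairings listed above can be written down as a small lookup table, which makes it easy to check a combination before putting it in a run file. This is only an illustrative sketch: the table is transcribed from the list above, and ``check_peak_choice`` is a hypothetical helper, not part of chipathlon.

```python
# Peak types per peak caller, transcribed from the Supported Tools list above.
SUPPORTED_PEAK_TYPES = {
    "spp": {"narrow", "broad"},
    "zerone": {"broad"},
    "macs2": {"narrow", "broad"},
    "gem": {"narrow"},
    "peakranger": {"narrow"},
    "ccat": {"broad"},
    "music": {"narrow", "punctate", "broad"},
    "pepr": {"narrow"},
    "hiddendomains": {"broad"},
}


def check_peak_choice(peak, peak_type):
    """Hypothetical helper: raise ValueError for an unsupported combination."""
    if peak not in SUPPORTED_PEAK_TYPES:
        raise ValueError("unknown peak caller: %s" % peak)
    if peak_type not in SUPPORTED_PEAK_TYPES[peak]:
        allowed = ", ".join(sorted(SUPPORTED_PEAK_TYPES[peak]))
        raise ValueError(
            "%s does not support %s peaks (supported: %s)"
            % (peak, peak_type, allowed)
        )
    return True
```

For example, ``check_peak_choice("macs2", "narrow")`` passes, while ``check_peak_choice("zerone", "narrow")`` raises a ``ValueError``.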
Getting Started
^^^^^^^^^^^^^^^^
......@@ -18,34 +55,28 @@ Getting Started
:download:`Param <examples/small_test_param.yaml>`
:download:`Properties <examples/small_test_properties.txt>`
:download:`Sites <examples/small_test_sites.xml>`
**Config**
.. code-block:: yaml
.. code-block:: text
chipathlon_bin: /home/swanson/aknecht/.conda/envs/chip/bin
idr_bin: /home/swanson/aknecht/.conda/envs/idr/bin
pegasus_home: /usr/share/pegasus/
email: YOUREMAIL@DOMAIN.com
notify:
    pegasus_home: "/usr/share/pegasus/"
    email: "avi@kurtknecht.com"
profile:
    pegasus:
        style: "glite"
    condor:
        grid_resource: "pbs"
        universe: "vanilla"
        batch_queue: "batch"
    env:
        PYTHONPATH: "/home/swanson/aknecht/.conda/envs/ih_env/lib/python2.7/site-packages/"
        PATH: "/home/swanson/aknecht/.conda/envs/ih_env/bin:/bin/:/usr/bin/:/usr/local/bin/"
        PEGASUS_HOME: "/usr/"
Specifying an email in the config file will send an email to the target
address once the workflow is complete. The pegasus_home definition corresponds
to the pegasus install location. This is necessary so the pegasus email
script (in pegasus/notification/email) can be found and executed successfully.
The config file profile information is passed through to the pegasus
`sites catalog <https://pegasus.isi.edu/documentation/site.php>`_. This allows
any pegasus `profile <https://pegasus.isi.edu/documentation/profiles.php>`_
information to be passed. The required information will be dependent on the
system you are submitting to.
The top two lines define the bin paths to the chipathlon and idr environments.
The paths will depend on where you created your environments, but if you
followed the installation instructions they will be in your home directory in
the .conda folder. These two paths are required to find all the necessary
software to execute. Specifying an email in the config file will send an email
to the target address once the workflow is complete. The pegasus_home
definition corresponds to the pegasus install location. This is necessary so
the pegasus email script (in pegasus/notification/email) can be found and
executed successfully.
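Before generating a workflow it can be handy to confirm that the config file defines every key shown in the example. The sketch below uses only the Python standard library and assumes the flat ``key: value`` layout of the new config; it is not how chipathlon itself reads the file. Treating all four keys as required, and the helper names ``parse_flat_config`` and ``missing_keys``, are assumptions made for illustration.

```python
# Keys that appear in the example config above (assumed to all be required).
REQUIRED_KEYS = ("chipathlon_bin", "idr_bin", "pegasus_home", "email")


def parse_flat_config(text):
    """Parse flat 'key: value' lines into a dict (no nesting supported)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so the value is kept whole.
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip().strip('"')
    return config


def missing_keys(config):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


EXAMPLE_CONFIG = """\
chipathlon_bin: /home/swanson/aknecht/.conda/envs/chip/bin
idr_bin: /home/swanson/aknecht/.conda/envs/idr/bin
pegasus_home: /usr/share/pegasus/
email: YOUREMAIL@DOMAIN.com
"""

config = parse_flat_config(EXAMPLE_CONFIG)
```

Calling ``missing_keys(config)`` on the example returns an empty list; removing the ``email`` line would make it return ``["email"]``.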
**Run**
......@@ -97,8 +128,8 @@ necessary for processing:
Defines the type of files that processing initially begins with. Should be
either fastq or bam.
* peak
The tool used for peak calling. Should be one of [spp, gem, macs2,
peakranger, ccat, zerone, music].
The tool used for peak calling. The supported tools section above lists
all peak calling tools and the peak types each one supports.
* peak_type
The type of peak calling to perform. The peak type is tool dependent,
as tools support different peak calling types. Usually peak_type is narrow
......@@ -115,8 +146,10 @@ When creating runs, often times you'll want to investigate the same files
with multiple different peak calling and alignment tools. In the case above,
the two runs defined are identical except for the alignment tool -- one uses
bwa and the other uses bowtie2. To avoid retyping a lot of information, lists
can be marked with ids using the & symbol. Later on in the file, the list can
be referenced using the * symbol.
can be marked with ids using the & symbol and a unique identifier. Later on in
the file, the list can be referenced using the * symbol. Since we are only
changing the alignment tool there's no need to type out all the samples a
second time.
**Param**
......@@ -125,6 +158,10 @@ be referenced using the * symbol.
macs2_callpeak:
    arguments:
        "-g": "mm"
bwa_align_single:
    arguments:
        "-l": 20
        "-q": 6
music_punctate:
    arguments:
        "--mapp": "/work/ladunga/SHARED/workflows/mm9_50bp"
......@@ -144,6 +181,46 @@ specify the "-g": "mm" for macs2 peak calling. The music peak caller requires
additional information to run successfully (even though we are not using it).
Finally, we specify not to remove duplicates.
**Properties**
.. code-block:: text
pegasus.catalog.site = XML
pegasus.catalog.site.file = small_test_sites.xml
pegasus.condor.logs.symlink = false
pegasus.transfer.links = true
pegasus.data.configuration = sharedfs
Again, for more information on the properties file consult the pegasus
`properties documentation <https://pegasus.isi.edu/documentation/properties.php>`_.
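The properties file is a plain list of ``key = value`` pairs, so it is simple to inspect from a script, for instance to confirm which sites file it points at. The following is a minimal sketch using only the standard library; ``parse_properties`` is a hypothetical helper and does not handle the full Java properties syntax (line continuations, escapes).

```python
def parse_properties(text):
    """Parse simple 'key = value' lines into a dict, skipping comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


EXAMPLE_PROPERTIES = """\
pegasus.catalog.site = XML
pegasus.catalog.site.file = small_test_sites.xml
pegasus.condor.logs.symlink = false
pegasus.transfer.links = true
pegasus.data.configuration = sharedfs
"""

props = parse_properties(EXAMPLE_PROPERTIES)
sites_file = props["pegasus.catalog.site.file"]
```

Here ``sites_file`` holds the name of the sites catalog that the properties file refers to, which should match the sites file you actually provide.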
**Sites**
.. code-block:: xml
<?xml version="1.0" ?>
<sitecatalog version="4.0" xmlns="http://pegasus.isi.edu/schema/sitecatalog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd">
    <site arch="x86_64" handle="local" os="LINUX">
        <directory path="/lustre/work/ladunga/SHARED/workflows/new_tests/full_test/work" type="shared-scratch">
            <file-server operation="all" url="file:///lustre/work/ladunga/SHARED/workflows/new_tests/full_test/work"/>
        </directory>
        <directory path="/lustre/work/ladunga/SHARED/workflows/new_tests/full_test/output" type="local-storage">
            <file-server operation="all" url="file:///lustre/work/ladunga/SHARED/workflows/new_tests/full_test/output"/>
        </directory>
        <profile key="change.dir" namespace="pegasus">true</profile>
        <profile key="transfer.threads" namespace="pegasus">4</profile>
        <profile key="universe" namespace="condor">vanilla</profile>
        <profile key="grid_resource" namespace="condor">pbs</profile>
        <profile key="batch_queue" namespace="condor">batch</profile>
        <profile key="style" namespace="pegasus">glite</profile>
    </site>
</sitecatalog>
Again, for more information on the sites file consult the pegasus
`sites catalog documentation <https://pegasus.isi.edu/documentation/site.php>`_.
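Since the sites file is ordinary XML, the directories and profiles defined for a site can be listed with Python's standard ``xml.etree.ElementTree`` module, which is a quick way to double check the scratch and output paths before submitting. This is a sketch only: ``summarize_site`` is a hypothetical helper, and the sample below is a trimmed copy of the example with placeholder paths.

```python
import xml.etree.ElementTree as ET

# Default namespace used by the sites catalog in the example above.
NS = {"sc": "http://pegasus.isi.edu/schema/sitecatalog"}


def summarize_site(xml_text, handle="local"):
    """Return (directories, profiles) for the site with the given handle."""
    root = ET.fromstring(xml_text)
    for site in root.findall("sc:site", NS):
        if site.get("handle") == handle:
            # Map directory type -> path, e.g. "shared-scratch" -> "/path/to/work".
            dirs = {d.get("type"): d.get("path")
                    for d in site.findall("sc:directory", NS)}
            # Map (namespace, key) -> value, e.g. ("pegasus", "style") -> "glite".
            profiles = {(p.get("namespace"), p.get("key")): p.text
                        for p in site.findall("sc:profile", NS)}
            return dirs, profiles
    raise KeyError("no site with handle %r" % handle)


# Trimmed sample with placeholder paths (not the real paths from the example).
EXAMPLE_SITES = """<?xml version="1.0" ?>
<sitecatalog version="4.0" xmlns="http://pegasus.isi.edu/schema/sitecatalog">
    <site arch="x86_64" handle="local" os="LINUX">
        <directory path="/path/to/work" type="shared-scratch">
            <file-server operation="all" url="file:///path/to/work"/>
        </directory>
        <directory path="/path/to/output" type="local-storage">
            <file-server operation="all" url="file:///path/to/output"/>
        </directory>
        <profile key="style" namespace="pegasus">glite</profile>
        <profile key="grid_resource" namespace="condor">pbs</profile>
    </site>
</sitecatalog>
"""

dirs, profiles = summarize_site(EXAMPLE_SITES)
```

The ``handle`` argument corresponds to the site name in the catalog, which is ``local`` in this example.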
**Generation**
To generate the workflow, pass these input files into the :ref:`chip-gen`
......@@ -154,11 +231,12 @@ script, like so:
chip-gen \
--dir DIRECTORY_NAME \
--host DB_HOST \
--username USERNAME \
--password PASSWORD \
--param param.yaml \
--conf config.yaml \
--run run.yaml
--run run.yaml \
--properties properties.txt \
--execute-site local \
--output-site local
This will generate all files necessary to run the workflow in the specified
directory under a date-time stamped folder. The structure will look like this:
......@@ -169,10 +247,8 @@ directory under a date-time stamped folder. The structure will look like this:
date-timestamp/
input/
chipathlon.dax
conf.rc
db_meta/
notify.sh
sites.xml
submit.sh
output/
work/