Holland Computing Center
chipathlon
Commits
5443d27d
Commit
5443d27d
authored
Jun 12, 2017
by
aknecht2
Updated the getting started example to be up-to-date.
parent
4413357d
Changes
5
doc/source/examples.rst
Examples
==========
Whenever generating a workflow, there are three required files. A config file,
a run file, and a param file. The config file is used to specify system
information -- paths to required software, environment variables for pegasus
and so on. The run file is used to specify the actual files to process and
what software tools to use on them. Finally, the param file is used to
override any default params for the jobs in the workflow. In each of the
examples below, all three of these files will be talked about, and download
links to each will be provided.
Whenever generating a workflow, there are five total required files you
will need to create:
* **Config File**
A few pieces of info need to be defined in here, specifically the bin path
to the chipathlon environment, the bin path to the idr environment, and
the email address to message when the workflow is complete.
* **Param File**
Allows the user to overwrite options for many of the software tools being
used. Most numeric arguments have defaults that can be changed by the
end-user.
* **Run File**
Describes the actual files to process, which alignment and peak calling
tools should be used on them, and whether or not to run idr.
* **Properties File**
One of the files required by pegasus. For more information see their
`properties documentation <https://pegasus.isi.edu/documentation/properties.php>`_.
* **Sites File**
One of the files required by pegasus. For more information see their
`sites catalog documentation <https://pegasus.isi.edu/documentation/site.php>`_.
The information located in the properties file will be highly specific to
the environment that you're submitting on. Additionally, genomic information
is expected to be downloaded and built for the target genome you're interested
in, along with a chromosome sizes file.
Supported Tools
^^^^^^^^^^^^^^^^
Alignment:
* `bwa <http://bio-bwa.sourceforge.net>`_
* `bowtie2 <http://bowtie-bio.sourceforge.net/bowtie2/index.shtml>`_
Peak Calling:
* `spp <https://github.com/hms-dbmi/spp>`_ (narrow, broad)
* `zerone <https://omictools.com/zerone-tool>`_ (broad)
* `macs2 <https://github.com/taoliu/MACS>`_ (narrow, broad)
* `gem <http://groups.csail.mit.edu/cgs/gem/>`_ (narrow)
* `peakranger <http://ranger.sourceforge.net/manual1.18.html>`_ (narrow)
* `ccat <http://ranger.sourceforge.net/manual1.18.html>`_ (broad)
* `music <https://github.com/gersteinlab/MUSIC>`_ (narrow, punctate, broad)
* `pepr <https://github.com/shawnzhangyx/PePr>`_ (narrow)
* `hiddendomains <http://hiddendomains.sourceforge.net/>`_ (broad)
Getting Started
^^^^^^^^^^^^^^^^
...
...
@@ -18,34 +55,28 @@ Getting Started
:download:`Param <examples/small_test_param.yaml>`
:download:`Properties <examples/small_test_properties.txt>`
:download:`Sites <examples/small_test_sites.xml>`
**Config**
.. code-block:: yaml
.. code-block:: text
chipathlon_bin: /home/swanson/aknecht/.conda/envs/chip/bin
idr_bin: /home/swanson/aknecht/.conda/envs/idr/bin
pegasus_home: /usr/share/pegasus/
email: YOUREMAIL@DOMAIN.com
notify:
  pegasus_home: "/usr/share/pegasus/"
  email: "avi@kurtknecht.com"
profile:
  pegasus:
    style: "glite"
  condor:
    grid_resource: "pbs"
    universe: "vanilla"
    batch_queue: "batch"
  env:
    PYTHONPATH: "/home/swanson/aknecht/.conda/envs/ih_env/lib/python2.7/site-packages/"
    PATH: "/home/swanson/aknecht/.conda/envs/ih_env/bin:/bin/:/usr/bin/:/usr/local/bin/"
    PEGASUS_HOME: "/usr/"
Specifying an email in the config file will send an email to the target
address once the workflow is complete. The pegasus_home definition corresponds
to the pegasus install location. This is necessary so the pegasus email
script (in pegasus/notification/email) can be found and executed successfully.
The config file profile information is passed through to the pegasus
`sites catalog <https://pegasus.isi.edu/documentation/site.php>`_. This allows
any pegasus `profile <https://pegasus.isi.edu/documentation/profiles.php>`_
information to be passed. The required information will be dependent on the
system you are submitting to.
The top two lines define the bin paths to the chipathlon and idr environments.
The paths will depend on where you created your environments, but if you
followed the installation instructions they will be in your home directory in
the .conda folder. These two paths are required to find all the necessary
software to execute. Specifying an email in the config file will send an email
to the target address once the workflow is complete. The pegasus_home
definition corresponds to the pegasus install location. This is necessary so
the pegasus email script (in pegasus/notification/email) can be found and
executed successfully.
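Before generating a workflow it can help to confirm that the config values described above are present and point at real directories. The following is a minimal Python 3 sketch and not part of chipathlon: the ``check_config`` helper is hypothetical, it assumes the config has already been loaded into a plain dict, and treating all four keys as required is an assumption based on the example config.

```python
import os

# Keys taken from the example config; treating all four as required is an
# assumption made for this sketch.
REQUIRED_KEYS = ("chipathlon_bin", "idr_bin", "pegasus_home", "email")
# Keys whose values should be existing directories.
PATH_KEYS = ("chipathlon_bin", "idr_bin", "pegasus_home")


def check_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if not config.get(key):
            problems.append("missing key: %s" % key)
    for key in PATH_KEYS:
        path = config.get(key)
        if path and not os.path.isdir(path):
            problems.append("%s is not a directory: %s" % (key, path))
    return problems
```

An empty list means the basic checks passed. It does not confirm that the environments contain the required software.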
**Run**
...
...
@@ -97,8 +128,8 @@ necessary for processing:
Defines the type of files that processing initially begins with. Should be
either fastq or bam.
* peak
The tool used for peak calling. Should be one of [spp, gem, macs2,
peakranger, ccat, zerone, music].
The tool used for peak calling. Above in the supported tools section there
is a list defining all peak calling tools, and their supported peak types.
* peak_type
The type of peak calling to perform. The peak type is tool dependent,
as tools support different peak calling types. Usually peak_type is narrow
...
...
@@ -115,8 +146,10 @@ When creating runs, often times you'll want to investigate the same files
with multiple different peak calling and alignment tools. In the case above,
the two runs defined are identical except for the alignment tool -- one uses
bwa and the other uses bowtie2. To avoid retyping a lot of information, lists
can be marked with ids using the & symbol. Later on in the file, the list can
be referenced using the * symbol.
can be marked with ids using the & symbol and a unique identifier. Later on in
the file, the list can be referenced using the * symbol. Since we are only
changing the alignment tool there's no need to type out all the samples a
second time.
**Param**
...
...
@@ -125,6 +158,10 @@ be referenced using the * symbol.
macs2_callpeak:
  arguments:
    "-g": "mm"
bwa_align_single:
  arguments:
    "-l": 20
    "-q": 6
music_punctate:
  arguments:
    "--mapp": "/work/ladunga/SHARED/workflows/mm9_50bp"
...
...
@@ -144,6 +181,46 @@ specify the "-g": "mm" for macs2 peak calling. The music peak caller requires
additional information to run successfully (even though we are not using it).
Finally, we specify not to remove duplicates.
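Conceptually, the param file layers user-supplied arguments on top of each job's defaults. The sketch below illustrates that idea in Python 3 under stated assumptions: ``apply_overrides`` is a hypothetical helper, the merge rule (per-job, per-argument replacement) is an assumption about how overrides behave, and the default values are placeholders rather than chipathlon's or bwa's real defaults.

```python
import copy


def apply_overrides(defaults, overrides):
    """Return a copy of defaults with per-job argument overrides applied."""
    merged = copy.deepcopy(defaults)
    for job, settings in overrides.items():
        # Create the job / arguments entries if the defaults lack them.
        job_args = merged.setdefault(job, {}).setdefault("arguments", {})
        job_args.update(settings.get("arguments", {}))
    return merged


# Placeholder defaults, invented for illustration only.
defaults = {"bwa_align_single": {"arguments": {"-l": 0, "-q": 0, "-t": 1}}}
# Overrides matching the param file entries shown above.
overrides = {
    "macs2_callpeak": {"arguments": {"-g": "mm"}},
    "bwa_align_single": {"arguments": {"-l": 20, "-q": 6}},
}

merged = apply_overrides(defaults, overrides)
print(merged["bwa_align_single"]["arguments"])
```

Arguments that are not mentioned in the overrides (here the placeholder ``-t``) keep their default value, and the ``defaults`` dict itself is left unmodified.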
**Properties**
.. code-block:: text
pegasus.catalog.site = XML
pegasus.catalog.site.file = small_test_sites.xml
pegasus.condor.logs.symlink = false
pegasus.transfer.links = true
pegasus.data.configuration = sharedfs
Again, for more information on the properties file consult the pegasus
`properties documentation <https://pegasus.isi.edu/documentation/properties.php>`_
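Since the properties file names the sites file, a quick check that the two agree can catch a mismatch early. This is a minimal Python 3 sketch, assuming the simple ``key = value`` layout shown above. ``parse_properties`` is a hypothetical helper and does not handle the full Java properties syntax (escapes, line continuations).

```python
def parse_properties(text):
    """Parse 'key = value' lines into a dict, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


example = """\
pegasus.catalog.site = XML
pegasus.catalog.site.file = small_test_sites.xml
pegasus.condor.logs.symlink = false
pegasus.transfer.links = true
pegasus.data.configuration = sharedfs
"""

props = parse_properties(example)
print(props["pegasus.catalog.site.file"])  # small_test_sites.xml
```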
**Sites**
.. code-block:: xml
<?xml version="1.0" ?>
<sitecatalog version="4.0" xmlns="http://pegasus.isi.edu/schema/sitecatalog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd">
<site arch="x86_64" handle="local" os="LINUX">
<directory path="/lustre/work/ladunga/SHARED/workflows/new_tests/full_test/work" type="shared-scratch">
<file-server operation="all" url="file:///lustre/work/ladunga/SHARED/workflows/new_tests/full_test/work"/>
</directory>
<directory path="/lustre/work/ladunga/SHARED/workflows/new_tests/full_test/output" type="local-storage">
<file-server operation="all" url="file:///lustre/work/ladunga/SHARED/workflows/new_tests/full_test/output"/>
</directory>
<profile key="change.dir" namespace="pegasus">true</profile>
<profile key="transfer.threads" namespace="pegasus">4</profile>
<profile key="universe" namespace="condor">vanilla</profile>
<profile key="grid_resource" namespace="condor">pbs</profile>
<profile key="batch_queue" namespace="condor">batch</profile>
<profile key="style" namespace="pegasus">glite</profile>
</site>
</sitecatalog>
Again, for more information on the sites file consult the pegasus
`sites catalog documentation <https://pegasus.isi.edu/documentation/site.php>`_
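To see what a sites file actually defines, it can be summarized with Python's standard ``xml.etree.ElementTree`` module. The sketch below is illustrative only: ``summarize_sites`` is a hypothetical helper, and the example document uses shortened placeholder paths rather than the real ones above.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the sitecatalog root element in the example above.
NS = {"sc": "http://pegasus.isi.edu/schema/sitecatalog"}


def summarize_sites(xml_text):
    """Map each site handle to its directories and profiles."""
    root = ET.fromstring(xml_text)
    summary = {}
    for site in root.findall("sc:site", NS):
        directories = {
            d.get("type"): d.get("path")
            for d in site.findall("sc:directory", NS)
        }
        profiles = {
            (p.get("namespace"), p.get("key")): (p.text or "").strip()
            for p in site.findall("sc:profile", NS)
        }
        summary[site.get("handle")] = {
            "directories": directories,
            "profiles": profiles,
        }
    return summary


# Placeholder paths, shortened for illustration.
example = """<sitecatalog version="4.0" xmlns="http://pegasus.isi.edu/schema/sitecatalog">
  <site arch="x86_64" handle="local" os="LINUX">
    <directory path="/path/to/work" type="shared-scratch"/>
    <directory path="/path/to/output" type="local-storage"/>
    <profile key="style" namespace="pegasus">glite</profile>
  </site>
</sitecatalog>"""

summary = summarize_sites(example)
print(summary["local"]["directories"]["shared-scratch"])  # /path/to/work
```

The site handle (``local`` here) is presumably the value that site-selection options refer to when a workflow is generated.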
**Generation**
To generate the workflow, pass these input files into the :ref:`chip-gen`
...
...
@@ -154,11 +231,12 @@ script, like so:
chip-gen \
--dir DIRECTORY_NAME \
--host DB_HOST \
--username USERNAME \
--password PASSWORD \
--param param.yaml \
--conf config.yaml \
--run run.yaml
--run run.yaml \
--properties properties.txt \
--execute-site local \
--output-site local
This will generate all files necessary to run the workflow in the specified
directory under a date-time stamped folder. The structure will look like this:
...
...
@@ -169,10 +247,8 @@ directory under a date-time stamped folder. The structure will look like this:
date-timestamp/
input/
chipathlon.dax
conf.rc
db_meta/
notify.sh
sites.xml
submit.sh
output/
work/
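When generating workflows from a script, the ``chip-gen`` invocation above can be assembled as an argument list. This is a minimal Python 3 sketch: ``build_chip_gen_command`` is a hypothetical helper, it uses only the options shown in the command above, and all values are placeholders.

```python
import shlex


def build_chip_gen_command(directory, host, username, password, param, conf,
                           run, properties, execute_site="local",
                           output_site="local"):
    """Build the chip-gen argument list from the options shown above."""
    return [
        "chip-gen",
        "--dir", directory,
        "--host", host,
        "--username", username,
        "--password", password,
        "--param", param,
        "--conf", conf,
        "--run", run,
        "--properties", properties,
        "--execute-site", execute_site,
        "--output-site", output_site,
    ]


cmd = build_chip_gen_command(
    directory="DIRECTORY_NAME",
    host="DB_HOST",
    username="USERNAME",
    password="PASSWORD",
    param="param.yaml",
    conf="config.yaml",
    run="run.yaml",
    properties="properties.txt",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

Once the placeholders are replaced with real values, the list can be passed to ``subprocess.run(cmd, check=True)``. Passing a list avoids shell quoting problems with passwords or paths that contain spaces.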
...
...
doc/source/examples/small_test_config.yaml
notify:
  pegasus_home: "/usr/share/pegasus/"
  email: "avi@kurtknecht.com"
profile:
  pegasus:
    style: "glite"
  condor:
    grid_resource: "pbs"
    universe: "vanilla"
    batch_queue: "batch"
  env:
    PYTHONPATH: "/home/swanson/aknecht/.conda/envs/ih_env/lib/python2.7/site-packages/"
    PATH: "/home/swanson/aknecht/.conda/envs/ih_env/bin:/bin/:/usr/bin/:/usr/local/bin/"
    PEGASUS_HOME: "/usr/"
chipathlon_bin: /home/swanson/aknecht/.conda/envs/chip/bin
idr_bin: /home/swanson/aknecht/.conda/envs/idr/bin
pegasus_home: /usr/share/pegasus/
email: YOUREMAIL@DOMAIN.com
doc/source/examples/small_test_param.yaml
macs2_callpeak:
  arguments:
    "-g": "mm"
bwa_align_single:
  arguments:
    "-l": 20
    "-q": 6
music_punctate:
  arguments:
    "--mapp": "/work/ladunga/SHARED/workflows/mm9_50bp"
...
...
doc/source/examples/small_test_properties.txt
0 → 100644
pegasus.catalog.site = XML
pegasus.catalog.site.file = small_test_sites.xml
pegasus.condor.logs.symlink = false
pegasus.transfer.links = true
pegasus.data.configuration = sharedfs
doc/source/examples/small_test_sites.xml
0 → 100644
<?xml version="1.0" ?>
<sitecatalog version="4.0" xmlns="http://pegasus.isi.edu/schema/sitecatalog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd">
  <site arch="x86_64" handle="local" os="LINUX">
    <directory path="/lustre/work/ladunga/SHARED/workflows/new_tests/full_test/work" type="shared-scratch">
      <file-server operation="all" url="file:///lustre/work/ladunga/SHARED/workflows/new_tests/full_test/work"/>
    </directory>
    <directory path="/lustre/work/ladunga/SHARED/workflows/new_tests/full_test/output" type="local-storage">
      <file-server operation="all" url="file:///lustre/work/ladunga/SHARED/workflows/new_tests/full_test/output"/>
    </directory>
    <profile key="change.dir" namespace="pegasus">true</profile>
    <profile key="transfer.threads" namespace="pegasus">4</profile>
    <profile key="universe" namespace="condor">vanilla</profile>
    <profile key="grid_resource" namespace="condor">pbs</profile>
    <profile key="batch_queue" namespace="condor">batch</profile>
    <profile key="style" namespace="pegasus">glite</profile>
  </site>
</sitecatalog>