(AKA: Stuff that broke when I tried to use Bridges)
Trying to automagically generate the Pegasus conf.rc files probably isn't going to be feasible given the variety of run-time environments people might be using. A better approach is to add command-line flags to the chip-gen script for specifying paths to existing files, which are then simply copied to the workflow directory. We can provide example files in the docs for a few types of environments to help people create the files for their particular setup. (Adam will do)
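A hypothetical invocation along those lines (the flag names here are illustrative only, not the actual chip-gen interface):

```
chip-gen --properties /home/user/pegasus.properties \
         --sites /home/user/sites.xml \
         --work-dir /work/user/chipathlon/run001
```

The script would validate that each file exists and copy it into the workflow directory unchanged.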
DB files must exist locally or the code bails. This is fine when running directly, but when using BOSCO it means the DB will get staged again for every run via SSH. It would be better to allow specifying a location on the remote resource; people would have to copy the files manually once, but then Pegasus would just symlink them for each run. (Adam will do)
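As a sketch, this could be done with a file-based replica catalog entry that points Pegasus at the pre-staged copy on the remote site; the DB filename, path, and site name below are all assumptions, not actual values:

```
# File-based replica catalog: LFN  PFN  site attribute
# A PFN on the execution site lets Pegasus symlink instead of transferring.
experiments.db  file:///work/user/chipathlon/db/experiments.db  site="bridges"
```

With `pegasus.transfer.links = true` set in the properties, Pegasus will symlink files whose PFN is already on the execution site.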
All the executables are added using site="local". That's probably OK for things in jobs/wrappers, but the system commands (mv, cp, etc.) should be added using the remote site. When added using site="local", Pegasus copies the binaries, which may cause problems if the submitting machine runs a different enough version of Linux (or even OS X). Adding with site="local" also requires people to manually create the chipathlon and idr conda environments and add their bin PATHs to the sites.xml. Adding the executables on the remote site would be nicer; users would still need to create the two conda environments and then pass the bin paths to the chip-gen script. (Adam will do)
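For illustration, this is what the equivalent entry looks like in a Pegasus text-format transformation catalog, marking mv as pre-installed on the execution site instead of staged from the submit host (the site name and path are assumptions for a Bridges-like setup):

```
tr system::mv {
    site bridges {
        pfn "/usr/bin/mv"
        arch "x86_64"
        os "linux"
        type "INSTALLED"
    }
}
```

`type "INSTALLED"` is what tells Pegasus not to transfer the binary.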
In order for the above three things to work, the remote site name will also need to be given as an argument to chip-gen, so that when remote input files or executables are added to the DAX their location is specified correctly. (Adam will do)
Default resource requirements are way off. (Avi mostly fixed)
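For reference, per-site defaults can be set as Pegasus profiles in the sites.xml; a sketch inside the execution site's `<site>` element (the values are placeholders, not tuned numbers):

```
<profile namespace="pegasus" key="cores">4</profile>
<profile namespace="pegasus" key="memory">8192</profile>
<profile namespace="pegasus" key="runtime">7200</profile>
```

Memory is in MB and runtime in seconds; per-job profiles in the DAX override these site-level defaults.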
The BOSCO installer for a remote resource doesn't include the helper script (slurm_local_submit_attributes.sh) that translates Pegasus resource specifications into SLURM job attributes. Not really a chipathlon problem, but it needs to be fixed. (Done, modified BOSCO package created; Adam will fix the BOSCO package)
The code that adds things in the jobs/scripts directory as executables is also picking up the .pyc byte-compiled files. This doesn't really break anything, but it should be cleaned up.
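A minimal sketch of the fix (the function name is illustrative, not the actual chipathlon helper):

```python
import os

def list_script_executables(script_dir):
    """Return script files in the given directory, skipping byte-compiled artifacts."""
    skip_ext = (".pyc", ".pyo")
    return sorted(
        name for name in os.listdir(script_dir)
        if os.path.isfile(os.path.join(script_dir, name))
        and not name.endswith(skip_ext)
    )
```

Filtering on extension here means stray compilation artifacts never get registered as executables, regardless of when they appear.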
The Picard version used (1.139) is quite old. Would be nice to be able to use the current 2.9.0 version. (Natasha will test)
The two SPP scripts (run_spp_nodups.R being one of them) need to be packaged up in Conda in a sane way. (Adam will do; done now)
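A conda-build recipe sketch for this, assuming a meta.yaml layout; the package name, version, and dependency list are guesses, not the actual recipe:

```
package:
  name: spp-scripts
  version: "1.0"

requirements:
  run:
    - r-base
    - r-spp

about:
  summary: Packaging for the SPP peak-calling R scripts
```

The scripts themselves would be installed into the environment's bin directory by the recipe's build script so they land on the PATH alongside the other chipathlon tools.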
How to distribute the MongoDB stuff in a sane way: Docker is currently the leading candidate. The experiments and samples collections total a little over 1 GB, which isn't terribly large. We could create a container with Mongo installed and the DB pre-populated, and then include scripts to do the update from ENCODE.
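One possible shape for such a container, building on the official mongo image (the image tag, paths, and script names below are assumptions):

```
FROM mongo:3.4

# The official mongo image runs any .sh/.js files in this directory
# on first startup, so a restore script here can load a mongodump
# of the pre-populated experiments and samples collections.
COPY dump/ /docker-entrypoint-initdb.d/dump/
COPY restore_db.sh /docker-entrypoint-initdb.d/

# Refresh script to pull updates from ENCODE on demand.
COPY update_from_encode.py /usr/local/bin/
```

Shipping the dump inside the image keeps first-run setup to a single `docker run`, while the update script keeps the data current without rebuilding the image.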