(AKA: Stuff that broke when I tried to use Bridges)
Trying to automagically generate the Pegasus conf.rc files probably isn't going to be feasible given the variety of run-time environments people might be using. A better approach is to add command-line flags to the chip-gen script for specifying paths to existing files, which are then simply copied to the workflow directory. We can provide example files in the docs for a few types of environments to help people create the files for their particular setup. (Adam will do)
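A hypothetical invocation along those lines (the flag names here are illustrative only, not the actual chip-gen interface):

```
chip-gen --properties /home/user/pegasus.properties \
         --sites /home/user/sites.xml \
         --work-dir /work/user/chipathlon/run001
```

The script would validate that each file exists and copy it into the workflow directory unchanged.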
DB files must exist locally or the code bails. This is fine when running directly, but when using BOSCO it means the DB will get staged again for every run via SSH. It would be better to allow specifying a location on the remote resource; people would have to copy the files manually once, but then Pegasus would just symlink them for each run. (Adam will do)
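As a sketch, this could be done with a file-based replica catalog entry that points Pegasus at the pre-staged copy on the remote site; the DB filename, path, and site name below are all assumptions, not actual values:

```
# File-based replica catalog: LFN  PFN  site attribute
# A PFN on the execution site lets Pegasus symlink instead of transferring.
experiments.db  file:///work/user/chipathlon/db/experiments.db  site="bridges"
```

With `pegasus.transfer.links = true` set in the properties, Pegasus will symlink files whose PFN is already on the execution site.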
All the executables are added using site="local". That's probably OK for things in jobs/wrappers, but the system commands (mv, cp, etc.) should be added using the remote site. When added using site="local", Pegasus copies the binaries, which may cause problems if the submitting machine runs a different enough version of Linux (or even OS X). Adding with site="local" also requires people to manually create the chipathlon and idr conda environments and add their bin PATHs to the sites.xml. Adding the executables on the remote site would be nicer; users would still need to create the two conda environments and then pass the bin paths to the chip-gen script. (Adam will do)
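For illustration, this is what the equivalent entry looks like in a Pegasus text-format transformation catalog, marking mv as pre-installed on the execution site instead of staged from the submit host (the site name and path are assumptions for a Bridges-like setup):

```
tr system::mv {
    site bridges {
        pfn "/usr/bin/mv"
        arch "x86_64"
        os "linux"
        type "INSTALLED"
    }
}
```

`type "INSTALLED"` is what tells Pegasus not to transfer the binary.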
In order for the above three things to work, the remote site name will also need to be given as an argument to chip-gen, so that when remote input files or executables are added to the DAX their location is specified correctly. (Adam will do)
Default resource requirements are way off. (Avi mostly fixed)
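For reference, per-site defaults can be set as Pegasus profiles in the sites.xml; a sketch inside the execution site's `<site>` element (the values are placeholders, not tuned numbers):

```
<profile namespace="pegasus" key="cores">4</profile>
<profile namespace="pegasus" key="memory">8192</profile>
<profile namespace="pegasus" key="runtime">7200</profile>
```

Memory is in MB and runtime in seconds; per-job profiles in the DAX override these site-level defaults.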
The BOSCO installer for a remote resource doesn't include the helper script (slurm_local_submit_attributes.sh) that translates Pegasus resource specifications into SLURM job attributes. Not really a chipathlon problem, but it needs to be fixed. (Done, modified BOSCO package created; Adam will fix the BOSCO package)
The code that adds things in the jobs/scripts directory as executables is also picking up the .pyc byte-compiled files. This doesn't really break anything, but it should be cleaned up.
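A minimal sketch of the fix (the function name is illustrative, not the actual chipathlon helper):

```python
import os

def list_script_executables(script_dir):
    """Return script files in the given directory, skipping byte-compiled artifacts."""
    skip_ext = (".pyc", ".pyo")
    return sorted(
        name for name in os.listdir(script_dir)
        if os.path.isfile(os.path.join(script_dir, name))
        and not name.endswith(skip_ext)
    )
```

Filtering on extension here means stray compilation artifacts never get registered as executables, regardless of when they appear.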
The Picard version used (1.139) is quite old. Would be nice to be able to use the current 2.9.0 version. (Natasha will test)
The two SPP scripts (run_spp_nodups.R being one of them) need to be packaged up in Conda in a sane way. (Adam will do; done now)
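A conda-build recipe sketch for this, assuming a meta.yaml layout; the package name, version, and dependency list are guesses, not the actual recipe:

```
package:
  name: spp-scripts
  version: "1.0"

requirements:
  run:
    - r-base
    - r-spp

about:
  summary: Packaging for the SPP peak-calling R scripts
```

The scripts themselves would be installed into the environment's bin directory by the recipe's build script so they land on the PATH alongside the other chipathlon tools.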
How to distribute the MongoDB stuff in a sane way: Docker is currently the leading candidate. The experiments and samples collections total a little over 1 GB, which isn't terribly large. We could create a container with Mongo installed and the DB pre-populated, and then include scripts to do the update from ENCODE.
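One possible shape for such a container, building on the official mongo image (the image tag, paths, and script names below are assumptions):

```
FROM mongo:3.4

# The official mongo image runs any .sh/.js files in this directory
# on first startup, so a restore script here can load a mongodump
# of the pre-populated experiments and samples collections.
COPY dump/ /docker-entrypoint-initdb.d/dump/
COPY restore_db.sh /docker-entrypoint-initdb.d/

# Refresh script to pull updates from ENCODE on demand.
COPY update_from_encode.py /usr/local/bin/
```

Shipping the dump inside the image keeps first-run setup to a single `docker run`, while the update script keeps the data current without rebuilding the image.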