chipathlon issues
https://git.unl.edu/hcc/chipathlon/-/issues (feed updated 2017-07-06T16:46:34-05:00)

https://git.unl.edu/hcc/chipathlon/-/issues/36 - chip-meta-download should support resuming (Adam Caprez, 2017-07-06)
Downloading all ~14k individual experiment JSON files can take multiple hours. Right now, if the download of a single file fails, the entire download has to be started over from the beginning. To support resuming via a command-line option, add a check to see if each experiment JSON file exists and has non-zero size. If so, skip fetching it and go to the next in the list.

https://git.unl.edu/hcc/chipathlon/-/issues/34 - MongoDB auth should be optional (Adam Caprez, 2018-09-13)
The various `chip-*` helper scripts and the `db_save_result` / `download_from_gridfs` modules/scripts all require a username/password as required arguments. Mongo itself requires no authentication by default, so credentials should be optional parameters.

https://git.unl.edu/hcc/chipathlon/-/issues/33 - .pyc files added as executables to DAX (Adam Caprez, 2018-09-13)
The *.pyc files under `jobs/scripts` are getting added to the DAX as executables.

https://git.unl.edu/hcc/chipathlon/-/issues/32 - Setuptools fixes (Adam Caprez, 2018-09-13)

https://git.unl.edu/hcc/chipathlon/-/issues/31 - Remove module load from wrappers (Adam Caprez, 2018-09-13)
Remove the HCC-specific module load statements from the wrapper scripts.

https://git.unl.edu/hcc/chipathlon/-/issues/30 - Update Default Params (aknecht2, 2018-09-13)
Default params are not great -- most of them are just set to 2000 / 2000 memory & walltime. Make them sane.

https://git.unl.edu/hcc/chipathlon/-/issues/28 - Documentation In-Depth (aknecht2, 2018-09-13)
* [ ] Examples
* [ ] Yaml markup explanations
Label: Documentation

https://git.unl.edu/hcc/chipathlon/-/issues/27 - Sphinx Building (aknecht2, 2018-09-13)
Get sphinx building working!
Label: Documentation

https://git.unl.edu/hcc/chipathlon/-/issues/26 - Zerone (aknecht2, 2018-09-13)
* [ ] Test this peak caller to verify how it works (including available peak options)
* [ ] Implement into the peak_call module to generate args correctly
* [ ] Add required post processing to get into a sorted/expected bed format.
Label: Peak Calling / Idr (Natasha Pavlovikj)

https://git.unl.edu/hcc/chipathlon/-/issues/24 - MUSIC peak caller (aknecht2, 2018-09-13)
* [x] Test this peak caller to verify how it works (including narrow / medium / broad peak options)
* [x] Implement into the peak_call module to generate args correctly
* [x] Add required post processing to get into a sorted/expected bed format.
Label: Peak Calling / Idr

https://git.unl.edu/hcc/chipathlon/-/issues/23 - Control output file transfer (aknecht2, 2018-09-13)
Currently transferring all output files. Only want to transfer the sorted peak calling / idr results.
Label: Peak Calling / Idr

https://git.unl.edu/hcc/chipathlon/-/issues/22 - Gem output files bug (aknecht2, 2018-09-13)
Gem creates its own unique subdirectories, so output files are not getting transferred / sorted correctly.
Label: Peak Calling / Idr

https://git.unl.edu/hcc/chipathlon/-/issues/21 - Parse ccat / peakranger output into bed files (aknecht2, 2018-09-13)
Currently ccat / peakranger output is not parsed into bed files until after the workflow is run. Add the required post-processing scripts to the workflow.
Label: Peak Calling / Idr

https://git.unl.edu/hcc/chipathlon/-/issues/20 - SPP Wrapper (aknecht2, 2018-09-13)
Currently the spp wrapper is trying to gunzip and rezip everything with the .narrowPeak.gz extension. This is causing data corruption for several output files. Come up with a workaround.
Label: Peak Calling / Idr

https://git.unl.edu/hcc/chipathlon/-/issues/19 - Update database saving (aknecht2, 2018-09-13)
Currently there is special handling for bed / peak files; really we just want to drop them into gfs the same way we do with bam files. Additionally, we probably want to save multiple results per job -- enable this type of saving.
Label: Peak Calling / Idr

https://git.unl.edu/hcc/chipathlon/-/issues/17 - IDR (aknecht2, 2018-09-13)
Implement IDR for pairs of sorted result files. IDR will have to be a separate workflow_module to fit logically.
Label: Peak Calling / Idr

https://git.unl.edu/hcc/chipathlon/-/issues/16 - Config parsing & validation (aknecht2, 2018-09-13)
Currently there is no validation being done on config files despite there being an expected format.
Label: Input Parsing

https://git.unl.edu/hcc/chipathlon/-/issues/15 - Add broad / narrow peak for available tools (aknecht2, 2018-09-13)
Some peak calling tools only support narrow peak; some support both narrow & broad peak. The run yaml allows you to specify narrow / broad peak, but it currently has no functionality. Validate broad / narrow tools:
spp -> narrow / broad
gem -> narrow
macs -> narrow / broad
peakranger -> narrow
ccat -> broad
Label: Peak Calling / Idr

https://git.unl.edu/hcc/chipathlon/-/issues/14 - Create generator script (Natasha Pavlovikj, 2018-09-13)
Instead of using generator.py with the parameters hard-coded, create a generator script where we can pass the parameters as options.
Label: Input Parsing (aknecht2)

https://git.unl.edu/hcc/chipathlon/-/issues/13 - Job Param File Adjustments (aknecht2, 2018-09-13)
* Don't require all jobs to be specified in the param file
* Remove gene_annot argument from peakranger
* Include default read distribution / ccat config files
Label: Input Parsing

https://git.unl.edu/hcc/chipathlon/-/issues/4 - Enhance type strictness and parsing (aknecht2, 2017-02-13)
Currently workflow modules / workflow jobs don't have strict file type definitions; the file types are interpreted from the argument names. Specifically, arguments have type=file instead of something like type=fastq.
Label: Input Parsing
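The strict typing idea could be sketched as follows; the names (`FILE_TYPES`, `validate_arg`) and the accepted extension lists are illustrative assumptions, not the actual chipathlon API:

```python
# Illustrative sketch only: FILE_TYPES, validate_arg, and the extension lists
# are assumptions, not the real chipathlon code.

# Extensions accepted for each strict argument type.
FILE_TYPES = {
    "fastq": (".fastq", ".fastq.gz", ".fq", ".fq.gz"),
    "bam": (".bam",),
    "bed": (".bed", ".bed.gz"),
}

def validate_arg(file_name, arg_type):
    """Check a file name against a strict type like type=fastq."""
    if arg_type == "file":  # legacy catch-all behavior: accept anything
        return True
    if arg_type not in FILE_TYPES:
        raise ValueError("Unknown argument type: %s" % arg_type)
    return file_name.endswith(FILE_TYPES[arg_type])
```

With definitions along these lines, a job declaring type=fastq would reject a `.bam` input at parse time instead of failing mid-run.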
https://git.unl.edu/hcc/chipathlon/-/issues/11 - Run File Helper Script (aknecht2, 2017-04-05)
Create a tool to make it easier for a user to create the input run file.
Label: Input Parsing

https://git.unl.edu/hcc/chipathlon/-/issues/8 - Module generator improvements (aknecht2, 2017-02-03)
Generators should parse a run by doing the following:
1. Create all necessary Result classes for the final jobs in the current module, and check the existence of the results in the database.
2. (If result exists) -> Do nothing.
   (If result does not exist, and prev module result did) -> queue download job, and create jobs normally
   (If result does not exist, and prev module result didn't) -> create jobs normally
Generators will need to be passed the dax/master_files objects to be able to add jobs directly to the dax. Similarly, we can move the save_results functionality to the generators.
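A minimal sketch of the branching in the steps above; `Result`, `exists_in_db`, and `plan_jobs` are stand-in names, not the actual generator API:

```python
# Stand-in sketch of the generator branching; not the real chipathlon classes.

class Result:
    """Minimal stand-in for a Result class backed by a database check."""
    def __init__(self, in_db):
        self.in_db = in_db

    def exists_in_db(self):
        return self.in_db

def plan_jobs(result, prev_result):
    """Return the kinds of jobs the generator should add to the dax."""
    if result.exists_in_db():
        return []                           # result exists -> do nothing
    if prev_result is not None and prev_result.exists_in_db():
        return ["download_prev", "create"]  # fetch the previous result first
    return ["create"]                       # nothing cached -> create jobs normally
```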
Enhancements to the Result class / Run classes should be handled first.
Label: Workflow Module Refactor

https://git.unl.edu/hcc/chipathlon/-/issues/6 - Create result class (aknecht2, 2017-01-26)
Currently, intermediate output is handled in a large data structure; this should be moved into a class for better readability.
Label: Workflow Module Refactor

https://git.unl.edu/hcc/chipathlon/-/issues/9 - Create genome class (aknecht2, 2017-02-03)
Currently genome files are handled in a data structure; create a class to easily grab additional assembly files and chromosome sizes specific to assemblers and genomes.
Label: Workflow Module Refactor

https://git.unl.edu/hcc/chipathlon/-/issues/12 - Enhance Input Run Format / Parsing (aknecht2, 2017-02-03)
Several things here:
* Genome should have species in addition to assembly
* control / signal files should be specified as lists
* idr specification should be granular i.e. which signal / control files to run for idr
* peak calling type == broad / narrow, validate against the peak method as well
Label: Input Parsing

https://git.unl.edu/hcc/chipathlon/-/issues/10 - Create new run data class (aknecht2, 2017-01-25)
Run data class should hold only information about one individual run, i.e. the smallest workflow that can be run. It should contain:
* genome information
* accession numbers
* peak calling method
* whether or not to use idr
* input file type
Label: Workflow Module Refactor

https://git.unl.edu/hcc/chipathlon/-/issues/2 - Adjust input run.yaml format (aknecht2, 2017-01-26)
Require pairs of accessions for exp/control samples instead of an experiment accession.
Create a run_parser class to return a list of run classes and a list of genome classes.
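A run_parser along these lines might look like the sketch below, assuming the sample run.yaml format shown underneath has already been loaded into a dict (e.g. with `yaml.safe_load`); all class names and keys are assumptions, not the final implementation:

```python
# Hypothetical run_parser sketch; class names and yaml keys are assumptions
# based on the sample format in this issue, not the final implementation.

class Run:
    """Holds the settings and accession pairs for one run."""
    def __init__(self, info):
        self.align = info.get("align")
        self.peak = info.get("peak")
        self.idr = info.get("idr", False)
        # collect control1/control2/signal1/signal2 accessions
        self.accessions = {key: val for key, val in info.items()
                           if key.startswith(("control", "signal"))}

class RunParser:
    """Splits a loaded run.yaml dict into run entries and genome entries."""
    def __init__(self, data):
        self.genomes = data.get("genomes", [])
        self.runs = [Run(info) for info in data.get("runs", [])]
```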
Adding the sample format here for future reference:

run.yaml
========
genomes:
- assembly:
- [bwa]: "/path/to/base/file"
- [bowtie2]: "/path/to/base/file"
- chrom.sizes: "/path/to/chrom/file"
runs:
- align: [bwa | bowtie2]
- assembly: "assembly"
- peak: [gem|ccat|peakranger|spp|macs2]
- idr: [true|false]
- file_type: [fastq|bam]
- control1: "accession"
- [control2]: "accession"
- signal1: "accession"
- [signal2]: "accession"
Label: Workflow Module Refactor

https://git.unl.edu/hcc/chipathlon/-/issues/3 - Method Auto Doc (aknecht2, 2017-06-05)
* [x] workflow.py
* [x] workflow_module.py
* [x] workflow_job.py
* [x] run.py
* [x] genome.py
* [x] result.py
* [x] generators
Label: Documentation

https://git.unl.edu/hcc/chipathlon/-/issues/5 - Add download support for bam files (aknecht2, 2017-02-03)
Workflow should be able to start by downloading fastq files OR bam files.
Label: Input Parsing
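As a closing illustration, the resume check requested in issue 36 above (skip any experiment JSON that already exists with non-zero size) could be sketched like this; `fetch_experiment` is a hypothetical stand-in for the real download call:

```python
import os

def should_skip(path):
    """True if the file already exists with non-zero size (download done)."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

def download_all(accessions, out_dir, fetch_experiment):
    """Fetch each experiment JSON, skipping files finished in a prior run."""
    for acc in accessions:
        path = os.path.join(out_dir, acc + ".json")
        if should_skip(path):
            continue  # resume support: don't re-download completed files
        fetch_experiment(acc, path)
```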