- Finish adding the peak calling tools.
- Start work on the pegasus paper.
- 3 papers: dual computational (Pegasus pipeline) / biological papers, plus the ExpressOrtho resubmission.
- PePr, hiddenDomains, & MUSIC tests are working; still need to get them implemented.
- More documentation! Especially getting Read the Docs set up.
- XSEDE stuff too! Very close.
- Hidden Domains
- Create skeleton of paper
- Need to decide how to handle spp broad peaks.
- Further work to get MUSIC, PePr, CHILLIN, Zerone integrated.
- Download updated ENCODE JSON dump and update the database.
- Get broad/narrow peaks & IDR in by next week.
- Clean up database tables.
- Work toward getting bridges up & running by the end of next week.
- Decide on full research allocation for Bridges.
- Get new peak calling tools.
- No NIH :(
- ExpressOrtho paper: review & resubmit.
- Mini tool comparison for ~10 tools, talk to developers
- March 13th / Great Lakes Conference abstract due
- Submit dual papers; have the Pegasus paper ready for submission.
- Great Lakes Conference in Chicago May 15-17th
- Review paper by Monday; deadline for submission is next Wednesday.
- Generate new swooshy flow chart for the workflow
- Think about a scoring algorithm for the different peak calling tools.
- Add IDR granularity.
- Keep using random pairing for control / signal
- Adjust signal input to list
- Validate IDR specification / narrow & broad peak specification.
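A minimal sketch of what that validation could look like; the spec keys (`peak_type`, `idr`, `replicates`) and the helper name are assumptions, not the pipeline's actual schema:

```python
VALID_PEAK_TYPES = {"narrow", "broad"}

def validate_spec(spec):
    """Validate a run spec dict: peak_type must be narrow/broad, and an
    IDR run needs exactly two replicates (all key names are assumed)."""
    if spec.get("peak_type") not in VALID_PEAK_TYPES:
        raise ValueError("peak_type must be 'narrow' or 'broad'")
    if spec.get("idr") and len(spec.get("replicates", [])) != 2:
        raise ValueError("IDR requires exactly two replicates")
    return True
```

Failing fast here keeps bad configs from reaching the cluster.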
- Generate read distribution from bam, otherwise use default
- Create default ccat config file.
- Remove peakranger gene annotation file
- Modify ExpressOrtho Perl script to conform to Perl style standards.
- Fetch bams or fastqs.
- Specify file accession for control and experiment instead of experiment accession.
- Add IDR for 2 replicates. Wait on response to figure out. (IDR / FDR?)
- MongoDB read only database to clone. MongoDB helper scripts to update database. Need to create own db instance for saving results.
- Don't need to save output files back to the database.
- Generate report log at the end.
- Remove duplicates: yes/no option, & report duplicates.
- Pooled & Pseudo replicates - pooled concatenate, pseudo shuffled (bam -> sam for plaintext). Configurable option.
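A sketch of the pooled/pseudo-replicate logic described above, operating on plain lists of read lines (SAM-style plaintext, per the bam -> sam note); function names are hypothetical:

```python
import random

def pooled_replicate(rep1_reads, rep2_reads):
    """Pooled replicate: concatenate reads from both biological replicates."""
    return rep1_reads + rep2_reads

def pseudo_replicates(rep1_reads, rep2_reads, seed=0):
    """Pseudo replicates: shuffle the pooled reads, then split in half."""
    pooled = rep1_reads + rep2_reads
    rng = random.Random(seed)        # fixed seed so runs are reproducible
    rng.shuffle(pooled)
    half = len(pooled) // 2
    return pooled[:half], pooled[half:]
```

Making the seed a configurable option fits the "configurable option" note above.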
- Sphinx documentation at the same time.
- Organize files by organism -> cell_type -> tf/hm -> (biorep1, biorep2, idr)
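The layout above could be built with a small helper like this (the helper name is hypothetical):

```python
import os

def sample_dir(base, organism, cell_type, target, kind):
    """Build the organism -> cell_type -> tf/hm -> (biorep1|biorep2|idr)
    directory path and create it if missing."""
    assert kind in ("biorep1", "biorep2", "idr")
    path = os.path.join(base, organism, cell_type, target, kind)
    os.makedirs(path, exist_ok=True)
    return path
```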
- Human: CHD2 for K562 & H1-hESC cells.
- macs2 paired-end reads need to be run differently; check accordingly.
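For paired-end BAMs, macs2's `callpeak` takes `-f BAMPE` instead of `-f BAM`; a sketch of a command builder (the wrapper function itself is hypothetical, the macs2 flags are real):

```python
def macs2_command(treatment, control, name, paired_end=False):
    """Build a macs2 callpeak argument list; paired-end input needs -f BAMPE."""
    fmt = "BAMPE" if paired_end else "BAM"
    return [
        "macs2", "callpeak",
        "-t", treatment,   # treatment (signal) BAM
        "-c", control,     # control (input) BAM
        "-f", fmt,         # input format, switched on paired-end
        "-n", name,        # output name prefix
    ]
```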
- convert/sort/idr for individual samples.
- Grant application in April, hear back in May
- Mirrored papers, ChIPathlon / Biological.
- Next steps: re-run human/mouse with individual samples, run IDR on the output.
- CH12.lx / GM12878 if time permits. End Goals:
- IDR in workflow
- Pooled/Pseudo replicates in workflow
- Merged replicates in workflow
- Don't need GUI
- Add gem
- For DFBS: macs2, csaw, jmosaic
- Created dummy sample entries to run pipeline faster!
- Look into better downloading tools to increase speed
- Potentially do a comparison paper of NoSQL -> SQL for bioinformatics. Run a similar setup/design/analysis in SQL and see how it compares.
- 3 Papers (Biological paper, Pegasus paper, MongoDB paper)!?
- Don't use the assembly from the sample's record; use the GRCh assembly.
- bowtie2 standard error contains quality measures!
- for now focus on bowtie2
- Add aggregation pipelines to meta to extract relevant transcription factors
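A sketch of such an aggregation pipeline as pymongo would accept it (a plain list of stage dicts, so no server is needed to construct it); the field names (`assembly`, `target.investigated_as`, `target.label`, `accession`) are assumptions based on ENCODE-style metadata, not the actual collection schema:

```python
def tf_pipeline(assembly):
    """Aggregation pipeline: keep experiments on the given assembly whose
    target is a transcription factor, then group experiment accessions
    by target label (all field names assumed)."""
    return [
        {"$match": {"assembly": assembly,
                    "target.investigated_as": "transcription factor"}},
        {"$group": {"_id": "$target.label",
                    "experiments": {"$addToSet": "$accession"}}},
        {"$sort": {"_id": 1}},
    ]
```

This would run against the read-only meta collection via `collection.aggregate(tf_pipeline(...))`.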
- Run on Myc / Max -- wait on correct experiments to use!
- Nothing else!
- Most conditions don't exist; they only exist for some.
- Collapse Bed & Peak Collections
- Only need one of score or signal_value
- Need to derive read length from downloaded fastq
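Deriving read length from the first FASTQ record is a few lines; a minimal sketch (helper name hypothetical) that also handles gzipped files:

```python
import gzip

def read_length(fastq_path):
    """Derive read length from the first record of a (possibly gzipped) FASTQ."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as fh:
        fh.readline()                  # @header line
        seq = fh.readline().strip()    # sequence line
    return len(seq)
```

This assumes uniform read lengths; if they vary, sampling several records and taking the mode would be safer.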
- Randomly select experiment to control for peak calling, only use possible_controls
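A sketch of that random pairing, restricted to each experiment's `possible_controls` list as noted above; record keys and the function name are assumptions:

```python
import random

def pair_controls(experiments, seed=0):
    """Randomly pair each experiment with one of its possible_controls.

    Experiments with an empty possible_controls list are skipped."""
    rng = random.Random(seed)   # fixed seed for reproducible pairing
    pairs = {}
    for exp in experiments:
        controls = exp.get("possible_controls", [])
        if controls:
            pairs[exp["accession"]] = rng.choice(controls)
    return pairs
```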
- Only need human / mouse
- DFBS: keep everything the same except one of condition OR cell type. (ignore for now)
- Restrict to database, add genome collection.