From dab44cf8f804e8f1e72f7fdbd26d99de79815d97 Mon Sep 17 00:00:00 2001
From: mjeppesen3 <mjeppesen3@huskers.unl.edu>
Date: Mon, 4 Mar 2024 13:00:15 -0600
Subject: [PATCH] Update README.md

---
 Synthetic data creation scripts/README.md | 60 +++++++++++++++++------
 1 file changed, 45 insertions(+), 15 deletions(-)

diff --git a/Synthetic data creation scripts/README.md b/Synthetic data creation scripts/README.md
index eebc1a1c..e340edd6 100644
--- a/Synthetic data creation scripts/README.md	
+++ b/Synthetic data creation scripts/README.md	
@@ -1,23 +1,53 @@
-Simulated LC-MS Data Used in Benchmarking of MVAPACK
-This repository contains the workflow for creating the simulated data used in the MVAPACK simulated LC-MS data project. Below is a brief description of the relevant files in this repository.
+mulated LC-MS Data Used in Benchmarking of MVAPACK 
 
-./main.py
-This is the driver script for creating the nine versions of the simulated dataset. The script is not run with any commandline arguments. It has been rng seeded to ensure the same data is created each time. Revolves around the ConditionManager class defined in condition_manager.py, which takes the ClassEic's, significance mapper, percent missing level, and noise level, to output the full 20 replicates for each condition and outputs the respective .mzML files. Requires the following packages to run: + matplotlib + numpy + psims
+This repository contains the workflow for creating the simulated data used 
+in the MVAPACK simulated LC-MS data project. Below is a brief description of the 
+relevant files in this repository.
 
-utils.py
-Contains convenience functions and namedtuple() data holders that ease the creation of the synthetic dataset. There is a lot of talk about "significant" vs "non-significant" chemicals. This refers to chemicals that have variantion across replicates that is statistically significant versus ones that do not.
 
-norm_eics.py
-Helper script used just once to normalize the maximum peak of the compound to 1.0. Loads in eics from the data/eics-cpy2.pickle file and saves them back to that file.
+## `./main.py`
 
-mults.py
-Auxiliary script that creates the multipliers that will be combined with the eics to create the master replicate. By default makes 3000 non-significant multiplier sets and 1000 significant multiplier sets. Saves the results to multipliers.csv
+This is the driver script for creating the nine versions of the simulated dataset. The 
+script is not run with any commandline arguments. It has been rng seeded to ensure the same
+data is created each time. Revolves around the `ConditionManager` class defined in 
+`condition_manager.py`, which takes the `ClassEic`'s, significance mapper, percent missing level,
+and noise level, to output the full 20 replicates for each condition and outputs the respective `.mzML` 
+files. Requires the following packages to run:
+    + `matplotlib`
+    + `numpy`
+    + `psims`
 
-vimms/
-The specially modified version of the ViMMS software package used in this data generation protocol to pull the EIC waveforms out of memory directly.
+## `utils.py`
 
-peak_graphs
-Contains the individual peak waveforms that were analyzed during the course of creating the initial compound list for the simulated dataset.
+Contains convenience functions and `namedtuple()` data holders that ease the creation of the synthetic 
+dataset. There is a lot of talk about "significant" vs "non-significant" chemicals. This refers to 
+chemicals that have variantion across replicates that is statistically significant versus ones that do not.
+
+## `norm_eics.py`
+Helper script used just once to normalize the maximum peak of the compound to 1.0. Loads in eics from the `data/eics-cpy2.pickle`
+file and saves them back to that file.
+
+
+## `mults.py`
+
+Auxiliary script that creates the multipliers that will be combined with the eics to create the master replicate. 
+By default makes 3000 non-significant multiplier sets and 1000 significant multiplier sets. Saves the results to
+`multipliers.csv`
+
+
+## `vimms/`
+
+The specially modified version of the ViMMS software package used in this data generation protocol to pull the EIC
+waveforms out of memory directly.
+
+
+## `peak_graphs`
+
+Contains the individual peak waveforms that were analyzed during the course of creating the initial compound
+list for the simulated dataset.
+
+
+
+## `vimms_data_generation`
 
-vimms_data_generation
 This directory contains a number of scripts critical to creating the original EICs and idealizing them.
-- 
GitLab