Holland Computing Center / chipathlon
Commit da85b698, authored 7 years ago by aknecht2
Updated db auto-doc and rst.
parent 581a468e
1 merge request: !25 Resolve "Method Auto Doc"
Changes: 2
Showing 2 changed files with 52 additions and 25 deletions
chipathlon/db.py: 49 additions, 25 deletions
doc/source/db.rst: 3 additions, 0 deletions
chipathlon/db.py (+49 −25)
...
...
@@ -12,18 +12,23 @@ import hashlib
class MongoDB(object):
    """
    :param host: The host address of the MongoDB database.
    :type host: str
    :param username: The username of the account for the MongoDB database.
    :type username: str
    :param password: The password for the user.
    :type password: str
    :param debug: A flag for printing additional messages.
    :type debug: bool
    This class is used to manage all interactions with the encode metadata.
    The metadata can be very unruly and difficult to deal with. There
    are several helper functions within this class to make some database
    operations much easier.
    """
    def __init__(self, host, username, password, debug=False):
        """
        :param host: The host address of the MongoDB database.
        :type host: str
        :param username: The username of the account for the MongoDB database.
        :type username: str
        :param password: The password for the user.
        :type password: str
        :param debug: If true print out debug messages
        :type debug: bool
        """
        self.debug = debug
        self.host = host
        self.username = username
...
...
@@ -48,6 +53,12 @@ class MongoDB(object):
         :type key: Any hashable
         :param data: The data to add to the cache.
         :type data: Object
+        Adds a data result to the internal cache. This is used to speed up
+        requests that are identical. We may have multiple runs that use
+        identical control / signal files but change around the alignment or
+        peak calling tools. In these cases we don't want to request info
+        from the database multiple times for the same data.
         """
         if function not in self.cache:
             self.cache[function] = {}
...
...
@@ -60,6 +71,8 @@ class MongoDB(object):
         :type function: str
         :param key: The key to get from the cache.
         :type key: Any hashable
+        Gets a data item from the internal cache.
         """
         if function in self.cache:
             if key in self.cache[function]:
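The two cache hunks above describe a two-level lookup: results are grouped first by the name of the calling function and then by a hashable key. A minimal sketch of that idea follows. The class name `CacheSketch`, the method names `add_cache` and `get_cache`, the `None` return on a miss, and the sample key are assumptions made for illustration; only the `if function not in self.cache` and nested `in` checks appear in the diff.

```python
class CacheSketch(object):
    """Hypothetical stand-in for the caching part of the MongoDB class."""

    def __init__(self):
        # Outer key: name of the calling function. Inner key: any hashable.
        self.cache = {}

    def add_cache(self, function, key, data):
        # Create the per-function bucket on first use, as in the hunk above.
        if function not in self.cache:
            self.cache[function] = {}
        self.cache[function][key] = data

    def get_cache(self, function, key):
        # Return the cached item, or None when nothing is stored (assumed).
        if function in self.cache:
            if key in self.cache[function]:
                return self.cache[function][key]
        return None


cache = CacheSketch()
# "get_sample" and the key below are made-up values for the example.
cache.add_cache("get_sample", ("ACCESSION", "fastq"), {"valid": True})
hit = cache.get_cache("get_sample", ("ACCESSION", "fastq"))
miss = cache.get_cache("get_sample", ("OTHER", "bam"))
```

Keying on the function name first keeps results from different query helpers apart even when they happen to share the same key.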
...
...
@@ -69,9 +82,9 @@ class MongoDB(object):
     def delete_result(self, result, genome):
         """
         :param result: The result to delete
-        :type result: :py:class:~chipathlon.result.Result
+        :type result: :py:class:`~chipathlon.result.Result`
         :param genome: The genome to find information from.
-        :type genome: :py:meth:~chipathlon.genome.Genome
+        :type genome: :py:class:`~chipathlon.genome.Genome`
         Deletes a result and it's corresponding gridfs entry.
         """
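The docstring change in this hunk wraps each cross-reference target in backticks, which Sphinx's Python-domain roles require before they can resolve to a link; a leading `~` shortens the displayed text to the last component of the dotted name. A minimal sketch of the corrected form on a hypothetical stand-alone function (the real method body is not shown in this diff, so the stub below only raises):

```python
def delete_result(result, genome):
    """
    :param result: The result to delete.
    :type result: :py:class:`~chipathlon.result.Result`
    :param genome: The genome to find information from.
    :type genome: :py:class:`~chipathlon.genome.Genome`

    Deletes a result and its corresponding gridfs entry.
    """
    # Illustration only: this stub exists to carry the docstring.
    raise NotImplementedError("body not shown in the diff")


# The role targets are plain text inside the docstring until Sphinx parses them.
doc = delete_result.__doc__
```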
...
...
@@ -114,11 +127,13 @@ class MongoDB(object):
     def result_exists(self, result, genome):
         """
         :param result: The result to check.
-        :type result: :py:meth:~chipathlon.result.Result
+        :type result: :py:meth:`~chipathlon.result.Result`
         :param genome: The genome to find information from.
-        :type genome: :py:meth:~chipathlon.genome.Genome
+        :type genome: :py:meth:`~chipathlon.genome.Genome`
-        Check if a result exists.
+        Check if a result exists in the database. The genome parameter
+        is required since some files have been aligned or use individual
+        chromsome fasta or size files for peak calling.
         """
         try:
             cursor = self.db.results.find(self._get_result_query(result, genome))
...
...
@@ -130,11 +145,12 @@ class MongoDB(object):
     def get_result_id(self, result, genome):
         """
         :param result: The result to check.
-        :type result: :py:meth:~chipathlon.result.Result
+        :type result: :py:meth:`~chipathlon.result.Result`
         :param genome: The genome to find information from.
-        :type genome: :py:meth:~chipathlon.genome.Genome
+        :type genome: :py:meth:`~chipathlon.genome.Genome`
+        :returns: The id found or None
-        Get the id of a result.
+        Get the id of a result in the database.
         """
         try:
             cursor = self.db.results.find(self._get_result_query(result, genome))
...
...
@@ -177,8 +193,11 @@ class MongoDB(object):
         :param gfs_attributes: Additional metadata to store in gridfs.
         :type gfs_attributes: dict
-        Saves a result file into mongodb and also creates the corresponding
-        gridfs file.
+        Saves a result entry into MongodDB and uploads the file into gridfs.
+        The only difference between additional_data and gfs_attributes is the
+        location the metadata is stored. Both just store key value pairs of
+        information, the additional_data information is stored in the result
+        entry, the gfs_attributes information is stored in gridfs.
         """
         # Make sure output_file exists
         if os.path.isfile(output_file):
...
...
@@ -218,6 +237,7 @@ class MongoDB(object):
         """
         :param sample_accession: The accession number to check.
         :type sample_accession: str
+        :returns: Whether or not the sample is valid.
         Ensures that a sample with the accession specified actually exists.
         """
...
...
@@ -235,6 +255,7 @@ class MongoDB(object):
         """
         :param experiment_accession: The accession number to check.
         :type experiment_accession: str
+        :returns: Whether or not the experiment is valid
         Ensures that an experiment with the accession specified actually exists.
         """
...
...
@@ -252,15 +273,15 @@ class MongoDB(object):
     def fetch_from_gridfs(self, gridfs_id, filename, checkmd5=True):
         """
         :param gridfs_id: GridFS _id of file to get.
-        :type gridfs_id: bson.objectid.ObjectId
+        :type gridfs_id: :py:class:`bson.objectid.ObjectId`
         :param filename: Filename to save file to.
         :type filename: str
         :param checkmd5: Whether or not to validate the md5 of the result
         :type checkmd5: bool
         Fetch the file with the corresponding id and save it under the
-        specified 'filename'. If checkmd5 is specified, validate that the
-        saved file has a correct md5 value.
+        specified 'filename'. If checkmd5 is specified, validate that the
+        saved file has a correct md5 value.
         """
         try:
             gridfs_file = self.gfs.get(gridfs_id)
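The docstring above says that, when `checkmd5` is set, the saved file is validated against an md5 value. The diff does not show how that check is done, so the following is only a sketch of one common approach using the standard `hashlib` module. The helper names `md5_of_file` and `file_matches_md5`, the chunk size, and the source of the expected digest (for example, metadata stored with the GridFS file) are all assumptions.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches_md5(path, expected_md5):
    """Compare the file's digest with an expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

Reading in fixed-size chunks keeps memory use flat for large result files such as alignments.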
...
...
@@ -298,10 +319,13 @@ class MongoDB(object):
         """
         :param accession: The accession number of the target sample
         :type accession: string
-        :param file_type: The file type of the target sample should be [fastq|bam]
+        :param file_type: The file type of the target sample.
         :type file_type: string
-        Gets the associated sample based on accession number and file_type
+        Gets the associated sample based on accession number and file_type.
+        For loading input files for workflows the file_type should be fastq
+        or bam. Other file types can be specified for loading additional files
+        saved in the experiment metadata.
         """
         valid = True
         msg = ""
...
...
doc/source/db.rst (+3 −0)
MongoDB
==============
MongoDB Class
^^^^^^^^^^^^^^
.. autoclass:: chipathlon.db.MongoDB
    :members: