gene.database.dynamodb
Provide DynamoDB client.
- class gene.database.dynamodb.DynamoDbDatabase(db_url=None, **db_args)
Database class employing DynamoDB.
- __init__(db_url=None, **db_args)
Initialize Database class.
- Parameters:
db_url (str) – URL endpoint for DynamoDB source
- Keyword Arguments:
region_name: AWS region (defaults to “us-east-2”)
- Raises:
DatabaseInitializationException – if initial setup fails
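This page does not say exactly how the constructor turns `db_url` and `region_name` into a client configuration. The sketch below assumes they are forwarded as `endpoint_url` and `region_name` to a boto3 DynamoDB resource. `build_resource_kwargs` is a hypothetical helper and is not part of the class.

```python
from typing import Any, Optional

# Documented default for the region_name keyword argument.
DEFAULT_REGION = "us-east-2"


def build_resource_kwargs(db_url: Optional[str] = None, **db_args: Any) -> dict:
    """Hypothetical helper: assemble keyword arguments for a DynamoDB resource.

    Assumption: db_url is used as the endpoint URL (e.g. a local DynamoDB
    instance), and region_name falls back to the documented default.
    """
    kwargs = {"region_name": db_args.get("region_name", DEFAULT_REGION)}
    if db_url:
        # Only override the endpoint when one is given; otherwise use AWS's default.
        kwargs["endpoint_url"] = db_url
    return kwargs
```

If that assumption holds, the result could be passed as `boto3.resource("dynamodb", **build_resource_kwargs("http://localhost:8000"))` to target a local DynamoDB instance.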
- add_merged_record(record)
Add merged record to database.
- Parameters:
record (dict) – merged record to add
- Return type:
None
- add_record(record, src_name)
Add new record to database.
- Parameters:
record (Dict) – record to upload
src_name (SourceName) – name of source for record
- Return type:
None
- add_source_metadata(src_name, metadata)
Add new source metadata entry.
- Parameters:
src_name (SourceName) – name of source
metadata – known source attributes
- Raises:
DatabaseWriteException – if write fails
- Return type:
None
- check_schema_initialized()
Check if database schema is properly initialized.
- Return type:
bool
- Returns:
True if DB appears to be fully initialized, False otherwise
- check_tables_populated()
Perform rudimentary checks to see if tables are populated.
Emphasis is on rudimentary: if some roguish element has deleted half of the gene aliases, this method won’t catch it. It only checks that a few critical tables contain at least a small number of records.
- Return type:
bool
- Returns:
True if queries are successful, False if DB appears empty
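The idea of a "rudimentary" check can be illustrated as follows. The counting callable and table names are hypothetical stand-ins, not the queries the class actually runs. The point is that a minimum-count threshold detects an empty database but not partial data loss.

```python
from typing import Callable, Iterable


def tables_look_populated(
    count_records: Callable[[str], int],
    critical_tables: Iterable[str],
    minimum: int = 1,
) -> bool:
    """Hypothetical sketch: True if every critical table has >= `minimum` records.

    A table that lost half its rows would still pass, which is exactly the
    limitation described above.
    """
    return all(count_records(name) >= minimum for name in critical_tables)
```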
- close_connection()
Perform any manual connection closure procedures if necessary.
- Return type:
None
- complete_write_transaction()
Conclude transaction or batch writing if relevant.
- Return type:
None
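"Batch writing" usually means that individual writes are buffered and sent in groups, so a final call is needed to flush the remainder. The sketch below shows that pattern. `BufferedWriter` and `send_batch` are hypothetical. The batch limit of 25 reflects DynamoDB's `BatchWriteItem` cap on put/delete requests per call, but this page does not document whether the class uses that API directly.

```python
from typing import Callable, Dict, List

# DynamoDB's BatchWriteItem accepts at most 25 put/delete requests per call.
MAX_BATCH = 25


class BufferedWriter:
    """Hypothetical write buffer: sends full batches eagerly, leftovers on flush()."""

    def __init__(self, send_batch: Callable[[List[Dict]], None]) -> None:
        self._send_batch = send_batch
        self._pending: List[Dict] = []

    def add(self, item: Dict) -> None:
        self._pending.append(item)
        if len(self._pending) >= MAX_BATCH:
            self.flush()

    def flush(self) -> None:
        # Analogous to complete_write_transaction(): push whatever is still buffered.
        if self._pending:
            self._send_batch(self._pending)
            self._pending = []  # start a fresh list; the sent batch is left untouched
```

In boto3, `Table.batch_writer()` provides this kind of buffering as a context manager that flushes on exit.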
- delete_normalized_concepts()
Remove merged records from the database. Use when performing a new update of normalized data.
- Raises:
DatabaseReadException – if DB client requires separate read calls and encounters a failure in the process
DatabaseWriteException – if deletion call fails
- Return type:
None
- delete_source(src_name)
Delete all data for a source. Use when updating source data.
- Parameters:
src_name (SourceName) – name of source to delete
- Raises:
DatabaseReadException – if DB client requires separate read calls and encounters a failure in the process
DatabaseWriteException – if deletion call fails
- Return type:
None
- drop_db()
Delete all tables from database. Requires manual confirmation.
- Raises:
DatabaseWriteException – if called in a protected setting with confirmation silenced.
- Return type:
None
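The confirmation requirement can be pictured as a guard like the one below. `confirm_drop`, its `protected` and `silent` flags, and `DropRefused` (standing in for `DatabaseWriteException`) are hypothetical. This page does not document how the real method detects a protected setting.

```python
from typing import Callable


class DropRefused(Exception):
    """Hypothetical stand-in for DatabaseWriteException."""


def confirm_drop(protected: bool, silent: bool, ask: Callable[[str], str] = input) -> bool:
    """Hypothetical guard: decide whether dropping all tables may proceed."""
    if protected and silent:
        # Never allow an unconfirmed drop in a protected setting.
        raise DropRefused("refusing to drop a protected database without confirmation")
    if silent:
        # Outside protected settings, a silenced prompt is treated as consent.
        return True
    answer = ask("Delete all tables? Type 'yes' to confirm: ")
    return answer.strip().lower() == "yes"
```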
- export_db(output_directory)
Dump DB to specified location.
- Parameters:
output_directory (Path) – path to directory to save DB dump in
- Return type:
None
- Returns:
Nothing, but saves dump to gzip file named gene_norm_<date and time>.ndjson.gz
- Raises:
ValueError – if output directory isn’t a directory or doesn’t exist
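The dump format described above is gzipped newline-delimited JSON, with one record per line. The sketch below writes such a file using only the standard library. `export_records` is hypothetical, and the exact timestamp format is an assumption, because the documentation only says "<date and time>".

```python
import datetime
import gzip
import json
from pathlib import Path
from typing import Iterable


def export_records(records: Iterable[dict], output_directory: Path) -> Path:
    """Hypothetical sketch: write records to gene_norm_<timestamp>.ndjson.gz."""
    # is_dir() is False both for missing paths and for regular files.
    if not output_directory.is_dir():
        raise ValueError(f"{output_directory} is not an existing directory")
    # Assumed timestamp format; the real method may differ.
    stamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
    outfile = output_directory / f"gene_norm_{stamp}.ndjson.gz"
    with gzip.open(outfile, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")  # one JSON object per line
    return outfile
```

Because each line is an independent JSON object, a dump like this can be streamed back line by line without loading the whole file into memory.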
- get_all_concept_ids()
Retrieve concept IDs for use in generating normalized records.
- Return type:
set[str]
- Returns:
Set of concept IDs as strings.
- get_all_records(record_type)
Retrieve all source or normalized records: either all source records, or all records that qualify as “normalized” (i.e., merged groups plus source records that are otherwise ungrouped).
For example,
>>> from gene.database import create_db
>>> from gene.schemas import RecordType
>>> db = create_db()
>>> for record in db.get_all_records(RecordType.MERGER):
...     pass  # do something
- Parameters:
record_type (RecordType) – type of result to return
- Return type:
Generator[dict, None, None]
- Returns:
Generator that lazily provides records as they are retrieved
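A DynamoDB `Scan` returns results in pages. Each response carries `Items` and, when more data remains, a `LastEvaluatedKey` that is passed back as `ExclusiveStartKey`. The sketch below shows how a generator can yield items lazily across such pages. `iterate_scan` and `fetch_page` are hypothetical, and this page does not confirm that `get_all_records` is implemented exactly this way.

```python
from typing import Any, Callable, Dict, Generator, Optional


def iterate_scan(
    fetch_page: Callable[[Optional[Any]], Dict[str, Any]],
) -> Generator[dict, None, None]:
    """Lazily yield items from DynamoDB-style pages.

    fetch_page(start_key) must return {"Items": [...], "LastEvaluatedKey": ...},
    with LastEvaluatedKey absent on the final page.
    """
    start_key: Optional[Any] = None
    while True:
        # A page is requested only when the consumer asks for more items.
        page = fetch_page(start_key)
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            break
```

With a boto3 `Table` object, `fetch_page` could be `lambda key: table.scan(ExclusiveStartKey=key) if key else table.scan()`.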
- get_record_by_id(concept_id, case_sensitive=True, merge=False)
Fetch record corresponding to provided concept ID.
- Parameters:
concept_id (str) – concept ID for gene record
case_sensitive (bool) – if True, performs an exact lookup, which is more efficient. Otherwise, performs a filter operation, which doesn’t require correct casing.
merge (bool) – if True, look for merged record; otherwise, look for identity record.
- Return type:
Optional[dict]
- Returns:
complete gene record, if match is found; None otherwise
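The efficiency trade-off behind `case_sensitive` can be shown with an in-memory analogue. An exact key lookup is a single direct access. A case-insensitive match, by contrast, has to examine candidates one by one, much like a filter over a scan. `lookup_record` and the dict-backed store are hypothetical and only illustrate the idea.

```python
from typing import Dict, Optional


def lookup_record(
    records: Dict[str, dict], concept_id: str, case_sensitive: bool = True
) -> Optional[dict]:
    """Hypothetical sketch of exact vs. case-insensitive concept ID lookup."""
    if case_sensitive:
        # Exact key access: cheap, but the casing must match exactly.
        return records.get(concept_id)
    target = concept_id.lower()
    # Filter-style search: tolerant of casing, but touches every entry in the worst case.
    for key, record in records.items():
        if key.lower() == target:
            return record
    return None
```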
- get_refs_by_type(search_term, ref_type)
Retrieve concept IDs for records matching the user’s query. Other methods are responsible for actually retrieving full records.
- Parameters:
search_term (str) – string to match against
ref_type (RefType) – type of match to look for.
- Return type:
list[str]
- Returns:
list of associated concept IDs. Empty if lookup fails.
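The two-step pattern described above (first resolve a term to concept IDs, then fetch full records separately) can be sketched with a reverse index keyed by (reference kind, term). `RefKind`, `build_index`, and `get_refs` are hypothetical and are not the library's `RefType` or storage layout. Lowercasing terms before indexing is also an assumption.

```python
from collections import defaultdict
from enum import Enum
from typing import DefaultDict, Dict, List, Tuple


class RefKind(str, Enum):
    """Hypothetical subset of reference kinds, for illustration only."""
    SYMBOL = "symbol"
    ALIASES = "aliases"


Index = Dict[Tuple[RefKind, str], List[str]]


def build_index(records: List[dict]) -> Index:
    """Map (kind, lowercased term) to the concept IDs that carry that term."""
    index: DefaultDict[Tuple[RefKind, str], List[str]] = defaultdict(list)
    for record in records:
        index[(RefKind.SYMBOL, record["symbol"].lower())].append(record["concept_id"])
        for alias in record.get("aliases", []):
            index[(RefKind.ALIASES, alias.lower())].append(record["concept_id"])
    return dict(index)


def get_refs(index: Index, search_term: str, ref_kind: RefKind) -> List[str]:
    """Return matching concept IDs only; an empty list means no match."""
    return list(index.get((ref_kind, search_term.lower()), []))
```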
- get_source_metadata(src_name)
Get license, versioning, data lookup, and other information for a source.
- Parameters:
src_name (str | SourceName) – name of the source to get data for
- Return type:
dict
- list_tables()
Return names of tables in database.
- Return type:
list[str]
- Returns:
Table names in DynamoDB
- load_from_remote(url=None)
Load DB from remote dump. Not available for DynamoDB database backend.
- Parameters:
url (Optional[str]) – remote location to retrieve gzipped dump file from
- Return type:
None
- update_merge_ref(concept_id, merge_ref)
Update the merged record reference of an individual record to a new value.
- Parameters:
concept_id (str) – record to update
merge_ref (Any) – new ref value
- Raises:
DatabaseWriteException – if attempting to update non-existent record
- Return type:
None
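A DynamoDB `UpdateItem` call creates a new item when the key does not exist. An "update only if present" rule therefore needs an explicit guard. In DynamoDB itself, that guard is typically a `ConditionExpression` such as `attribute_exists(...)` on the key, which makes the call fail with `ConditionalCheckFailedException`. The in-memory sketch below mirrors that behavior. `update_merge_ref_sketch`, `RecordNotFound` (standing in for `DatabaseWriteException`), and the `merge_ref` field name are hypothetical.

```python
from typing import Any, Dict


class RecordNotFound(Exception):
    """Hypothetical stand-in for DatabaseWriteException."""


def update_merge_ref_sketch(table: Dict[str, dict], concept_id: str, merge_ref: Any) -> None:
    """Set a record's merge reference, refusing to create records implicitly."""
    record = table.get(concept_id)
    if record is None:
        # Without this check, an upsert-style write would silently create a new record.
        raise RecordNotFound(f"no record with concept ID {concept_id}")
    record["merge_ref"] = merge_ref  # assumed attribute name
```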