gene.database.postgresql
Provide PostgreSQL client.
- class gene.database.postgresql.PostgresDatabase(db_url=None, **db_args)
Database class employing PostgreSQL.
- __init__(db_url=None, **db_args)
Initialize Postgres connection.
>>> from gene.database.postgresql import PostgresDatabase
>>> db = PostgresDatabase(
...     user="postgres",
...     password="matthew_cannon2",
...     db_name="gene_normalizer"
... )
- Parameters:
db_url (Optional[str]) – libpq-compliant database connection URI
- Keyword Arguments:
user: Postgres username
password: Postgres password (optional or blank if unneeded)
db_name: name of database to connect to
- Raises:
DatabaseInitializationException – if initial setup fails
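The keyword arguments above and db_url are two ways of describing the same connection. As a minimal sketch, assuming a local server on the default port 5432, a hypothetical helper build_db_url could assemble user, password, and db_name into a libpq-style URI. The helper name, host, and port are assumptions, not part of this API.

```python
from typing import Optional
from urllib.parse import quote


def build_db_url(
    user: str = "postgres",
    password: Optional[str] = None,
    db_name: str = "gene_normalizer",
    host: str = "localhost",  # assumed default, not documented here
    port: int = 5432,  # PostgreSQL's standard port
) -> str:
    """Assemble a libpq-style connection URI (hypothetical helper)."""
    # Percent-encode credentials so characters like '@' or ':' can't break the URI.
    credentials = quote(user, safe="")
    if password:  # omit the password segment when it is blank or unneeded
        credentials += ":" + quote(password, safe="")
    return f"postgresql://{credentials}@{host}:{port}/{db_name}"
```

The resulting string is the kind of value that could be passed as db_url instead of individual keyword arguments.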
- add_merged_record(record)
Add merged record to database.
- Parameters:
record (Dict) – merged record to add
- Return type:
None
- add_record(record, src_name)
Add new record to database.
- Parameters:
record (Dict) – record to upload
src_name (SourceName) – name of source for record. Not used by the PostgreSQL instance.
- Return type:
None
- add_source_metadata(src_name, meta)
Add new source metadata entry.
- Parameters:
src_name (SourceName) – name of source
meta (SourceMeta) – known source attributes
- Raises:
DatabaseWriteException – if write fails
- Return type:
None
- check_schema_initialized()
Check if database schema is properly initialized.
- Return type:
bool
- Returns:
True if DB appears to be fully initialized, False otherwise
- check_tables_populated()
Perform rudimentary checks to see if tables are populated.
Emphasis is on rudimentary – if some rogue element has deleted half of the gene aliases, this method won't pick it up. It only checks whether a few critical tables contain at least a small number of records.
- Return type:
bool
- Returns:
True if queries are successful, False if the DB appears empty
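A rudimentary check of this kind can be sketched with Python's built-in sqlite3 module standing in for PostgreSQL. The helper tables_populated and the gene_aliases table used below are hypothetical. It only asks whether each table has at least minimum rows; the inner LIMIT lets the database stop reading once that many rows are found, so large tables aren't fully counted.

```python
import sqlite3
from typing import List


def tables_populated(conn: sqlite3.Connection, tables: List[str], minimum: int = 1) -> bool:
    """Return True only if every listed table has at least `minimum` rows."""
    for table in tables:
        # Table names come from a fixed internal list, never from user input,
        # so interpolating them into the SQL string is acceptable here.
        (count,) = conn.execute(
            f"SELECT COUNT(*) FROM (SELECT 1 FROM {table} LIMIT ?) AS sample",
            (minimum,),
        ).fetchone()
        if count < minimum:
            return False
    return True
```

Like the method it illustrates, this never checks whether the rows are complete or consistent.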
- close_connection()
Perform any manual connection closure procedures if necessary.
- Return type:
None
- complete_write_transaction()
Conclude transaction or batch writing if relevant.
- Return type:
None
- delete_normalized_concepts()
Remove merged records from the database. Use when performing a new update of normalized data.
It would be faster to drop the entire table and do a cascading delete onto the merge_ref column in gene_concepts, but that requires an exclusive access lock on the DB, which can be annoying (i.e., other processes, such as PgAdmin, couldn't access it at the same time). Instead, this method takes down each part of the merge_ref separately.
- Raises:
DatabaseReadException – if DB client requires separate read calls and encounters a failure in the process
DatabaseWriteException – if deletion call fails
- Return type:
None
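The trade-off described above can be illustrated with sqlite3 as a stand-in. In PostgreSQL, DROP TABLE and TRUNCATE take an ACCESS EXCLUSIVE lock, while UPDATE and DELETE take only a ROW EXCLUSIVE table lock plus row-level locks, which don't block ordinary reads from other sessions. This sketch assumes a hypothetical gene_merged table referenced by gene_concepts.merge_ref; clear_merged_records is illustrative, not the library's implementation.

```python
import sqlite3


def clear_merged_records(conn: sqlite3.Connection) -> None:
    """Remove merged records without dropping any table (hypothetical schema)."""
    with conn:  # one transaction: commits on success, rolls back on error
        # Detach source records from their merged groups first...
        conn.execute(
            "UPDATE gene_concepts SET merge_ref = NULL WHERE merge_ref IS NOT NULL"
        )
        # ...so the merged rows can be deleted without violating the foreign key.
        conn.execute("DELETE FROM gene_merged")
```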
- delete_source(src_name)
Delete all data for a source. Use when updating source data.
All of the foreign key relations make deletes extremely slow, so this method drops them and re-adds them once the deletes are finished. This makes it a little brittle; it would be worth revisiting in the future so that the whole operation runs as a single atomic transaction.
Refreshing the materialized view at the end might be redundant, because this method will almost always be called right before more data is written, but it’s probably necessary just in case that doesn’t happen.
- Parameters:
src_name (SourceName) – name of source to delete
- Raises:
DatabaseWriteException – if deletion call fails
- Return type:
None
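The drop, delete, re-add, and refresh sequence described above can be sketched as a list of (SQL, parameters) pairs. Every table, constraint, column (source), and view name below is a hypothetical placeholder, and the %s placeholders follow psycopg's parameter style. Because PostgreSQL DDL is transactional, a plan like this could in principle run inside one transaction, which is the atomicity the docstring hopes for.

```python
from typing import List, Tuple

# Hypothetical foreign keys: (table, constraint name, constraint definition).
FK_CONSTRAINTS = [
    ("gene_aliases", "gene_aliases_concept_id_fkey",
     "FOREIGN KEY (concept_id) REFERENCES gene_concepts (concept_id)"),
    ("gene_symbols", "gene_symbols_concept_id_fkey",
     "FOREIGN KEY (concept_id) REFERENCES gene_concepts (concept_id)"),
]


def delete_source_plan(src_name: str) -> List[Tuple[str, tuple]]:
    """Return (SQL, params) pairs: drop FKs, delete, re-add FKs, refresh view."""
    plan = [(f"ALTER TABLE {t} DROP CONSTRAINT {c};", ()) for t, c, _ in FK_CONSTRAINTS]
    # Dependent rows go first: the subquery needs the concept rows still present.
    for table, _, _ in FK_CONSTRAINTS:
        plan.append((
            f"DELETE FROM {table} WHERE concept_id IN "
            "(SELECT concept_id FROM gene_concepts WHERE source = %s);",
            (src_name,),
        ))
    plan.append(("DELETE FROM gene_concepts WHERE source = %s;", (src_name,)))
    plan += [(f"ALTER TABLE {t} ADD CONSTRAINT {c} {d};", ()) for t, c, d in FK_CONSTRAINTS]
    plan.append(("REFRESH MATERIALIZED VIEW record_lookup_view;", ()))
    return plan
```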
- drop_db()
Perform complete teardown of DB. Useful for quickly resetting all data or reconstructing after apparent schema error. If in a protected environment, require confirmation.
- Raises:
DatabaseWriteException – if called in a protected setting with confirmation silenced.
- Return type:
None
- export_db(output_directory)
Dump DB to specified location.
- Parameters:
output_directory – path to the directory in which to save the DB dump
- Return type:
None
- Returns:
Nothing, but saves results of pg_dump to file named gene_norm_<date and time>.sql
- Raises:
ValueError – if output directory isn’t a directory or doesn’t exist
DatabaseException – if psql call fails
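A minimal sketch of the path handling, assuming a timestamp format (only the gene_norm_ prefix and .sql suffix are documented) and a hypothetical dump_command helper that builds the pg_dump invocation. pg_dump's -f option names the output file, and its positional dbname argument also accepts a connection URI.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def dump_path(output_directory: Path, now: Optional[datetime] = None) -> Path:
    """Validate the target directory and build a gene_norm_<timestamp>.sql path."""
    # is_dir() is False both for missing paths and for regular files.
    if not output_directory.is_dir():
        raise ValueError(f"{output_directory} is not an existing directory")
    now = now or datetime.now()
    # Assumed timestamp format; only the prefix and suffix are documented.
    return output_directory / f"gene_norm_{now:%Y%m%d%H%M%S}.sql"


def dump_command(db_url: str, target: Path) -> List[str]:
    # Argument list suitable for subprocess.run(..., check=True).
    return ["pg_dump", "-f", str(target), db_url]
```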
- get_all_concept_ids()
Retrieve concept IDs for use in generating normalized records.
- Return type:
Set[str]
- Returns:
Set of concept IDs as strings.
- get_all_records(record_type)
Retrieve all source or normalized records. Either return all source records, or all records that qualify as “normalized” (i.e., merged groups + source records that are otherwise ungrouped).
For example,
>>> from gene.database import create_db
>>> from gene.schemas import RecordType
>>> db = create_db()
>>> for record in db.get_all_records(RecordType.MERGER):
...     pass  # do something
Unlike in the DynamoDB implementation, merged records are stored in a separate table from source records. As a result, when fetching all normalized records, merged records are returned first, and iteration continues with all source records that don't belong to a normalized concept group.
- Parameters:
record_type (RecordType) – type of result to return
- Return type:
Generator[Dict, None, None]
- Returns:
Generator that lazily provides records as they are retrieved
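The ordering described above can be sketched as a plain generator over two record iterables. Treating a source record as ungrouped when its merge_ref field is empty is an assumption about record shape, and normalized_records is a hypothetical name.

```python
from typing import Dict, Generator, Iterable


def normalized_records(
    merged: Iterable[Dict], source: Iterable[Dict]
) -> Generator[Dict, None, None]:
    """Yield merged records first, then source records outside any group."""
    yield from merged
    for record in source:
        # Records that belong to a merged group were already represented above.
        if record.get("merge_ref") is None:
            yield record
```

Because both inputs are consumed with yield from and a for loop, nothing is materialized up front; a database cursor could be passed in place of either iterable.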
- get_record_by_id(concept_id, case_sensitive=True, merge=False)
Fetch the record corresponding to the provided concept ID.
- Parameters:
concept_id (str) – concept ID for gene record
case_sensitive (bool) – whether the concept ID lookup is case-sensitive
merge (bool) – if true, look for merged record; look for identity record otherwise.
- Return type:
Optional[Dict]
- Returns:
complete gene record, if match is found; None otherwise
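A sketch of how a case_sensitive flag might change the query, using sqlite3 as a stand-in (PostgreSQL drivers such as psycopg would use %s placeholders). The lookup_record_by_id helper and the two-column table are hypothetical. One design consequence: a plain equality match can use an ordinary index on concept_id, whereas matching on lower(concept_id) needs an expression index to be equally fast.

```python
import sqlite3
from typing import Dict, Optional


def lookup_record_by_id(
    conn: sqlite3.Connection, concept_id: str, case_sensitive: bool = True
) -> Optional[Dict]:
    if case_sensitive:
        # Exact match; can use a plain index on concept_id.
        sql = "SELECT concept_id, symbol FROM gene_concepts WHERE concept_id = ?"
    else:
        # Compare lowercased values so "HGNC:5" matches "hgnc:5".
        sql = (
            "SELECT concept_id, symbol FROM gene_concepts "
            "WHERE lower(concept_id) = lower(?)"
        )
    row = conn.execute(sql, (concept_id,)).fetchone()
    return None if row is None else {"concept_id": row[0], "symbol": row[1]}
```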
- get_refs_by_type(search_term, ref_type)
Retrieve concept IDs for records matching the user’s query. Other methods are responsible for actually retrieving full records.
- Parameters:
search_term (str) – string to match against
ref_type (RefType) – type of match to look for.
- Return type:
List[str]
- Returns:
list of associated concept IDs. Empty if lookup fails.
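A minimal sketch of this lookup contract, assuming a hypothetical in-memory index keyed by (ref_type, lowercased term), with plain strings standing in for RefType members. A miss returns an empty list rather than raising, matching the documented behavior; retrieving full records is left to other methods.

```python
from typing import Dict, List, Tuple

# Hypothetical index: (ref_type, lowercased search term) -> concept IDs.
RefIndex = Dict[Tuple[str, str], List[str]]


def refs_by_type(index: RefIndex, search_term: str, ref_type: str) -> List[str]:
    # Lowercasing the term is an assumption about how matching is normalized.
    # list() returns a copy so callers can't mutate the index.
    return list(index.get((ref_type, search_term.lower()), []))
```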
- get_source_metadata(src_name)
Get license, versioning, data lookup, and other metadata for a source.
- Parameters:
src_name (SourceName) – name of the source to get data for
- Return type:
Dict
- initialize_db()
Check if DB is set up. If not, create tables/indexes/views.
- Return type:
None
- list_tables()
Return names of tables in database.
- Return type:
List[str]
- Returns:
Table names in database
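Listing tables amounts to a catalog query. The sketch below runs against sqlite3's catalog so it is self-contained; the comment shows a roughly equivalent PostgreSQL query against information_schema, assuming the tables live in the public schema. The function is illustrative, not this class's implementation.

```python
import sqlite3
from typing import List

# Roughly equivalent PostgreSQL query (not executed here):
#   SELECT table_name FROM information_schema.tables
#   WHERE table_schema = 'public' AND table_type = 'BASE TABLE';


def list_table_names(conn: sqlite3.Connection) -> List[str]:
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Each row is a one-element tuple; unpack it to a plain string.
    return [name for (name,) in rows]
```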
- load_from_remote(url)
Load DB from remote dump. Warning: deletes all existing data. If url is not passed as an argument, this method will try to grab the latest release from the VICC S3 bucket.
- Parameters:
url (Optional[str]) – location of .tar.gz file created from output of pg_dump
- Raises:
DatabaseException – if unable to retrieve file from URL or if psql command fails
- Return type:
None
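A sketch of the local half of this process, assuming the archive holds exactly one .sql file: extract it, then build a psql invocation (-d accepts a connection URI, -f runs the commands in a file). Downloading from the URL is left out, and extract_dump and psql_command are hypothetical helpers. Writing the member's bytes under its base name, rather than calling extractall, keeps a malicious archive from writing outside dest.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_dump(archive: Path, dest: Path) -> Path:
    """Extract the single .sql member of a .tar.gz archive into dest."""
    with tarfile.open(archive, "r:gz") as tar:
        members = [m for m in tar.getmembers() if m.isfile() and m.name.endswith(".sql")]
        if len(members) != 1:
            raise ValueError(f"expected one .sql file in {archive}, found {len(members)}")
        fileobj = tar.extractfile(members[0])
        if fileobj is None:
            raise ValueError(f"could not read {members[0].name} from {archive}")
        data = fileobj.read()
    # Keep only the base name so "../" components can't escape dest.
    target = dest / Path(members[0].name).name
    target.write_bytes(data)
    return target


def psql_command(db_url: str, sql_file: Path) -> List[str]:
    # Argument list suitable for subprocess.run(..., check=True).
    return ["psql", "-d", db_url, "-f", str(sql_file)]
```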
- update_merge_ref(concept_id, merge_ref)
Update the merged record reference of an individual record to a new value.
- Parameters:
concept_id (str) – record to update
merge_ref (Any) – new ref value
- Raises:
DatabaseWriteException – if attempting to update non-existent record
- Return type:
None
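A sketch of the not-found check using sqlite3 as a stand-in: the cursor's rowcount reports how many rows the UPDATE matched, so zero means no record has that concept ID. DatabaseWriteException is redefined locally here as a stand-in for the library's exception, and the table layout is hypothetical.

```python
import sqlite3
from typing import Any


class DatabaseWriteException(Exception):
    """Local stand-in for the library's exception of the same name."""


def set_merge_ref(conn: sqlite3.Connection, concept_id: str, merge_ref: Any) -> None:
    with conn:  # raising inside the block rolls the transaction back
        cur = conn.execute(
            "UPDATE gene_concepts SET merge_ref = ? WHERE concept_id = ?",
            (merge_ref, concept_id),
        )
        # rowcount is the number of rows the UPDATE matched.
        if cur.rowcount == 0:
            raise DatabaseWriteException(f"no record with concept ID {concept_id}")
```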