gene.database.database
Provide abstract Database class and relevant tools for database initialization.
- class gene.database.database.AbstractDatabase(db_url=None, **db_args)
Define the database interface. This class should never be instantiated directly by a user; it should serve as the parent class for all concrete database implementations.
- abstract __init__(db_url=None, **db_args)
Initialize database instance.
Generally, implementing classes should be able to construct a connection from something like a libpq URL. Any additional arguments or DB-specific parameters can be passed as keywords.
- Parameters:
db_url (Optional[str]) – address/connection description for database
db_args – any DB implementation-specific parameters
- Raises:
DatabaseInitializationException – if initial setup fails
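The constructor contract described above can be illustrated with a toy backend. This is only a sketch under stated assumptions: `AbstractDatabaseSketch` and the local `DatabaseInitializationException` are stand-ins defined here so the example runs without the `gene` package, and `InMemoryDatabase`, the `memory://` URL scheme, and the `read_only` option are hypothetical names that do not exist in the library.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class DatabaseInitializationException(Exception):
    """Stand-in for the exception of the same name in gene.database.database."""


class AbstractDatabaseSketch(ABC):
    """Hypothetical, pared-down stand-in for AbstractDatabase."""

    @abstractmethod
    def __init__(self, db_url: Optional[str] = None, **db_args: Any) -> None:
        """Initialize database instance."""


class InMemoryDatabase(AbstractDatabaseSketch):
    """Hypothetical toy backend that keeps records in a dict."""

    def __init__(self, db_url: Optional[str] = None, **db_args: Any) -> None:
        # Fall back to a made-up default when no URL is supplied.
        self.db_url = db_url or "memory://default"
        if not self.db_url.startswith("memory://"):
            # Signal setup failure the way the interface documents it.
            raise DatabaseInitializationException(
                f"Unsupported URL for this sketch: {self.db_url}"
            )
        # Implementation-specific options arrive as keyword arguments.
        self.options: Dict[str, Any] = dict(db_args)
        self.records: Dict[str, Dict] = {}


db = InMemoryDatabase("memory://genes", read_only=True)
```

Because `__init__` is abstract, instantiating the base class directly raises `TypeError`, which matches the guidance that only concrete implementations should be constructed.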
- abstract add_merged_record(record)
Add merged record to database.
- Parameters:
record (Dict) – merged record to add
- Return type:
None
- abstract add_record(record, src_name)
Add new record to database.
- Parameters:
record (Dict) – record to upload
src_name (SourceName) – name of source for record
- Return type:
None
- abstract add_source_metadata(src_name, data)
Add new source metadata entry.
- Parameters:
src_name (SourceName) – name of source
data (SourceMeta) – known source attributes
- Raises:
DatabaseWriteException – if write fails
- Return type:
None
- abstract check_schema_initialized()
Check if database schema is properly initialized.
- Return type:
bool
- Returns:
True if DB appears to be fully initialized, False otherwise
- abstract check_tables_populated()
Perform rudimentary checks to see if tables are populated.
Emphasis is on rudimentary – if some rogueish element has deleted half of the gene aliases, this method won’t pick it up. It just wants to see if a few critical tables have at least a small number of records.
- Return type:
bool
- Returns:
True if queries are successful, False if DB appears empty
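A rudimentary population check of this kind might look like the following sketch. It uses an in-memory SQLite database purely so the example is runnable; the helper `tables_look_populated`, the table names, and the `min_rows` threshold are all hypothetical and are not taken from any concrete implementation.

```python
import sqlite3
from typing import Iterable


def tables_look_populated(
    conn: sqlite3.Connection, critical_tables: Iterable[str], min_rows: int = 1
) -> bool:
    """Return True only if every critical table holds at least min_rows rows."""
    for table in critical_tables:
        try:
            # Table names cannot be bound as parameters, so pass trusted names only.
            rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        except sqlite3.OperationalError:
            # A missing table counts as "not populated".
            return False
        if rows < min_rows:
            return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_concepts (concept_id TEXT)")
conn.execute("CREATE TABLE gene_aliases (alias TEXT)")
conn.execute("INSERT INTO gene_concepts VALUES ('hgnc:1097')")

# One critical table is still empty at this point.
empty_check = tables_look_populated(conn, ["gene_concepts", "gene_aliases"])

conn.execute("INSERT INTO gene_aliases VALUES ('BRAF1')")
full_check = tables_look_populated(conn, ["gene_concepts", "gene_aliases"])
```

As the description warns, a check like this only counts rows. It would not notice that half of the aliases had been deleted.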
- abstract close_connection()
Perform any manual connection closure procedures if necessary.
- Return type:
None
- abstract complete_write_transaction()
Conclude transaction or batch writing if relevant.
- Return type:
None
- abstract delete_normalized_concepts()
Remove merged records from the database. Use when performing a new update of normalized data.
- Raises:
DatabaseReadException – if DB client requires separate read calls and encounters a failure in the process
DatabaseWriteException – if deletion call fails
- Return type:
None
- abstract delete_source(src_name)
Delete all data for a source. Use when updating source data.
- Parameters:
src_name (SourceName) – name of source to delete
- Raises:
DatabaseReadException – if DB client requires separate read calls and encounters a failure in the process
DatabaseWriteException – if deletion call fails
- Return type:
None
- abstract drop_db()
Initiate total teardown of DB. Useful for quickly resetting the entirety of the data. Requires manual confirmation.
- Raises:
DatabaseWriteException – if called in a protected setting with confirmation silenced.
- Return type:
None
- abstract export_db(export_location)
Dump DB to specified location.
- Parameters:
export_location (Path) – path to save DB dump at
- Raises:
NotImplementedError – if not supported by DB
- Return type:
None
- abstract get_all_concept_ids()
Retrieve all available concept IDs for use in generating normalized records.
- Return type:
Set[str]
- Returns:
Set of concept IDs as strings.
- abstract get_all_records(record_type)
Retrieve all source or normalized records. Either return all source records, or all records that qualify as “normalized” (i.e., merged groups + source records that are otherwise ungrouped).
For example,
>>> from gene.database import create_db
>>> from gene.schemas import RecordType
>>> db = create_db()
>>> for record in db.get_all_records(RecordType.MERGER):
...     pass  # do something
- Parameters:
record_type (RecordType) – type of result to return
- Return type:
Generator[Dict, None, None]
- Returns:
Generator that lazily provides records as they are retrieved
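The distinction between source records and "normalized" records can be sketched with a plain generator over an in-memory list. Everything here is an assumption made for illustration: `RecordTypeSketch` is a stand-in for `gene.schemas.RecordType`, and the `item_type` and `merge_ref` field names and the concept ID values are hypothetical.

```python
from enum import Enum
from typing import Dict, Generator, List


class RecordTypeSketch(str, Enum):
    """Hypothetical stand-in for gene.schemas.RecordType."""

    IDENTITY = "identity"
    MERGER = "merger"


# Toy data: one grouped source record, one ungrouped source record, one merged group.
STORE: List[Dict] = [
    {"concept_id": "hgnc:1", "item_type": "identity", "merge_ref": "normalize.gene.hgnc:1"},
    {"concept_id": "hgnc:2", "item_type": "identity"},
    {"concept_id": "normalize.gene.hgnc:1", "item_type": "merger"},
]


def iter_records(record_type: RecordTypeSketch) -> Generator[Dict, None, None]:
    """Lazily yield records of the requested kind, one at a time."""
    for record in STORE:
        is_source = record["item_type"] == "identity"
        if record_type is RecordTypeSketch.IDENTITY:
            if is_source:
                yield record
        else:
            # "Normalized" = merged groups plus source records with no group.
            is_merged = record["item_type"] == "merger"
            is_ungrouped = is_source and "merge_ref" not in record
            if is_merged or is_ungrouped:
                yield record


source_ids = [r["concept_id"] for r in iter_records(RecordTypeSketch.IDENTITY)]
normalized_ids = [r["concept_id"] for r in iter_records(RecordTypeSketch.MERGER)]
```

Because `iter_records` is a generator, nothing is read from `STORE` until the caller starts iterating, which mirrors the lazy behavior the return value describes.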
- abstract get_record_by_id(concept_id, case_sensitive=True, merge=False)
Fetch record corresponding to provided concept ID.
- Parameters:
concept_id (str) – concept ID for gene record
case_sensitive (bool) – if true, performs exact lookup, which may be quicker. Otherwise, performs filter operation, which doesn’t require correct casing.
merge (bool) – if true, look for merged record; look for identity record otherwise.
- Return type:
Optional[Dict]
- Returns:
complete gene record, if match is found; None otherwise
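The trade-off behind `case_sensitive` can be illustrated with a dictionary-backed sketch: an exact lookup is a single key probe, while a case-insensitive lookup has to filter across stored keys. The `RECORDS` store and the `lookup` helper are hypothetical, and the `merge` argument is left out to keep the example short.

```python
from typing import Dict, Optional

# Toy store keyed by concept ID exactly as it was written.
RECORDS: Dict[str, Dict] = {
    "hgnc:1097": {"concept_id": "hgnc:1097", "symbol": "BRAF"},
}


def lookup(concept_id: str, case_sensitive: bool = True) -> Optional[Dict]:
    """Return the matching record, or None when nothing matches."""
    if case_sensitive:
        # Exact lookup: one hash probe, but casing must be correct.
        return RECORDS.get(concept_id)
    # Filter operation: scan stored keys and compare case-insensitively.
    wanted = concept_id.lower()
    for key, record in RECORDS.items():
        if key.lower() == wanted:
            return record
    return None


exact_hit = lookup("hgnc:1097")
exact_miss = lookup("HGNC:1097")
filtered_hit = lookup("HGNC:1097", case_sensitive=False)
```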
- abstract get_refs_by_type(search_term, ref_type)
Retrieve concept IDs for records matching the user’s query. Other methods are responsible for actually retrieving full records.
- Parameters:
search_term (str) – string to match against
ref_type (RefType) – type of match to look for.
- Return type:
List[str]
- Returns:
list of associated concept IDs. Empty if lookup fails.
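A reference lookup that returns only concept IDs, and an empty list on a miss, might be sketched as below. `RefTypeSketch` is a stand-in for the library's `RefType`, its member names are illustrative, and lowercasing the search term is an assumption of this sketch.

```python
from enum import Enum
from typing import Dict, List, Tuple


class RefTypeSketch(str, Enum):
    """Hypothetical stand-in for RefType; member names are illustrative."""

    SYMBOL = "symbol"
    ALIAS = "alias"


# Toy reference index: (normalized term, reference type) -> concept IDs.
REFS: Dict[Tuple[str, str], List[str]] = {
    ("braf", "symbol"): ["hgnc:1097"],
    ("b-raf1", "alias"): ["hgnc:1097"],
}


def refs_by_type(search_term: str, ref_type: RefTypeSketch) -> List[str]:
    """Return concept IDs for the term, or an empty list if nothing matches."""
    # Copy the stored list so callers cannot mutate the index.
    return list(REFS.get((search_term.lower(), ref_type.value), []))


symbol_hits = refs_by_type("BRAF", RefTypeSketch.SYMBOL)
alias_misses = refs_by_type("BRAF", RefTypeSketch.ALIAS)
```

Returning IDs rather than full records keeps this step cheap. A caller would then pass each ID to a record-fetching method such as `get_record_by_id`.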
- abstract get_source_metadata(src_name)
Get license, versioning, data lookup, etc. information for a source.
- Parameters:
src_name (Union[str, SourceName]) – name of the source to get data for
- Return type:
Dict
- abstract initialize_db()
Perform all necessary parts of database setup. Should be tolerant of existing content – i.e., this method is also responsible for checking whether the DB is already set up.
- Raises:
DatabaseInitializationException – if initialization fails
- Return type:
None
- abstract list_tables()
Return names of tables in database.
- Return type:
List[str]
- Returns:
Table names in database
- abstract load_from_remote(url=None)
Load DB from remote dump. Warning: Deletes all existing data.
- Parameters:
url (Optional[str]) – remote location to retrieve gzipped dump file from
- Raises:
NotImplementedError – if not supported by DB
- Return type:
None
- abstract update_merge_ref(concept_id, merge_ref)
Update the merged record reference of an individual record to a new value.
- Parameters:
concept_id (str) – record to update
merge_ref (Any) – new ref value
- Raises:
DatabaseWriteException – if attempting to update non-existent record
- Return type:
None
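The documented failure mode, raising when the target record does not exist, can be sketched against an in-memory store. The local `DatabaseWriteException` is a stand-in so the example runs without the `gene` package, and the `RECORDS` store, the `merge_ref` field name, and the ID values are hypothetical.

```python
from typing import Any, Dict


class DatabaseWriteException(Exception):
    """Stand-in for the exception of the same name in gene.database.database."""


# Toy store holding a single source record.
RECORDS: Dict[str, Dict[str, Any]] = {"hgnc:1097": {"concept_id": "hgnc:1097"}}


def update_merge_ref(concept_id: str, merge_ref: Any) -> None:
    """Point an existing record at a new merged-record reference."""
    if concept_id not in RECORDS:
        # Refuse to create a record implicitly; updating requires an existing one.
        raise DatabaseWriteException(f"No record found for {concept_id}")
    RECORDS[concept_id]["merge_ref"] = merge_ref


update_merge_ref("hgnc:1097", "normalize.gene.hgnc:1097")
```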
- class gene.database.database.AwsEnvName(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
AWS environment name that is being used.
- exception gene.database.database.DatabaseException
Create custom class for handling database exceptions
- exception gene.database.database.DatabaseInitializationException
Create custom exception for errors during DB connection initialization.
- exception gene.database.database.DatabaseReadException
Create custom exception for lookup/read errors
- exception gene.database.database.DatabaseWriteException
Create custom exception for write errors
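One way calling code might handle these exceptions is sketched below. It assumes that the initialization, read, and write exceptions all derive from `DatabaseException`. That hierarchy is an assumption of this sketch, not something stated on this page, so the classes are redefined locally as stand-ins. The helpers `risky_write` and `safe_write` are hypothetical.

```python
class DatabaseException(Exception):
    """Stand-in base class (assumed parent of the exceptions below)."""


class DatabaseInitializationException(DatabaseException):
    """Stand-in for errors during DB connection initialization."""


class DatabaseReadException(DatabaseException):
    """Stand-in for lookup/read errors."""


class DatabaseWriteException(DatabaseException):
    """Stand-in for write errors."""


def risky_write(fail: bool) -> str:
    """Hypothetical operation that may fail while writing."""
    if fail:
        raise DatabaseWriteException("write failed")
    return "written"


def safe_write(fail: bool) -> str:
    """Handle the specific error first, then fall back to the base class."""
    try:
        return risky_write(fail)
    except DatabaseWriteException as e:
        return f"write error: {e}"
    except DatabaseException as e:
        return f"other database error: {e}"


ok_result = safe_write(False)
error_result = safe_write(True)
```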
- gene.database.database.confirm_aws_db_use(env_name)
Check to ensure that AWS instance should actually be used.
- Return type:
None
- gene.database.database.create_db(db_url=None, aws_instance=False)
Database factory method. Checks environment variables and provided parameters and creates a DB instance.
Generally prefers to return a DynamoDB instance, unless all DDB-relevant environment variables are unset and a libpq-compliant URI is assigned to db_url. See the Usage section of the documentation for details.
Some examples:
>>> from gene.database import create_db
>>> default_db = create_db()  # by default, creates DynamoDB connection on port 8000
>>>
>>> postgres_url = "postgresql://postgres@localhost:5432/gene_normalizer"
>>> pg_db = create_db(postgres_url)  # creates Postgres connection at port 5432
>>>
>>> import os
>>> os.environ["GENE_NORM_DB_URL"] = "http://localhost:8001"
>>> local_db = create_db()  # creates DynamoDB connection on port 8001
>>>
>>> os.environ["GENE_NORM_ENV"] = "Prod"
>>> prod_db = create_db()  # creates connection to AWS cloud DynamoDB instance,
>>>                        # overruling `GENE_NORM_DB_URL` variable setting
Precedence is handled for connection settings like so:
1. If the environment variable GENE_NORM_ENV is set to a value, or if the aws_instance method argument is True, try to create a cloud DynamoDB connection.
2. If the db_url method argument is given a non-None value, try to create a DB connection to that address (if it looks like a PostgreSQL URL, create a PostgreSQL connection, but otherwise try DynamoDB).
3. If the GENE_NORM_DB_URL environment variable is set, try to create a DB connection to that address (if it looks like a PostgreSQL URL, create a PostgreSQL connection, but otherwise try DynamoDB).
4. Otherwise, attempt a DynamoDB connection to the default URL, http://localhost:8000
- Parameters:
db_url (Optional[str]) – address to database instance
aws_instance (bool) – use hosted DynamoDB instance, not local DB
- Return type:
AbstractDatabase
- Returns:
constructed Database instance
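The precedence rules for `create_db` can be restated as a small pure function. This is a sketch of the documented decision order only, not the library's implementation: `resolve_backend` and its string labels are hypothetical, the environment is passed in as a mapping so nothing real is read or connected to, and "looks like a PostgreSQL URL" is approximated by a scheme-prefix check, which is an assumption.

```python
from typing import Mapping, Optional, Tuple

DEFAULT_DYNAMODB_URL = "http://localhost:8000"


def resolve_backend(
    db_url: Optional[str] = None,
    aws_instance: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Return (backend label, address) following the documented precedence."""
    env = env if env is not None else {}
    # 1. GENE_NORM_ENV or aws_instance selects the cloud DynamoDB instance.
    if env.get("GENE_NORM_ENV") or aws_instance:
        return ("dynamodb-cloud", None)
    # 2. An explicit db_url argument beats 3. the GENE_NORM_DB_URL variable.
    address = db_url if db_url is not None else env.get("GENE_NORM_DB_URL")
    # 4. With nothing configured, fall back to the default local DynamoDB URL.
    if address is None:
        return ("dynamodb", DEFAULT_DYNAMODB_URL)
    # Assumption: a PostgreSQL URL is recognized by its scheme prefix.
    if address.startswith(("postgresql://", "postgres://")):
        return ("postgresql", address)
    return ("dynamodb", address)


default_choice = resolve_backend()
pg_choice = resolve_backend("postgresql://postgres@localhost:5432/gene_normalizer")
local_choice = resolve_backend(env={"GENE_NORM_DB_URL": "http://localhost:8001"})
prod_choice = resolve_backend(
    env={"GENE_NORM_DB_URL": "http://localhost:8001", "GENE_NORM_ENV": "Prod"}
)
```

The four calls at the end mirror the four examples in the entry above, with the environment supplied explicitly instead of through `os.environ`.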