gene.etl.update
Provide functions for updating normalizer data.
- gene.etl.update.delete_normalized(database, silent=True)
Delete normalized concepts.
- Parameters:
database (AbstractDatabase) – DB instance
silent (bool) – if True, suppress console output
- Return type:
None
- gene.etl.update.delete_source(source, db, silent=True)
Delete all data for an individual source.
- Parameters:
source (SourceName) – name of source to delete data for
db (AbstractDatabase) – database instance
silent (bool) – if True, suppress console output
- Return type:
float
- Returns:
time spent deleting source
- gene.etl.update.load_source(source, db, use_existing, silent=True)
Load data for an individual source.
- Parameters:
source (SourceName) – name of source to load data for
db (AbstractDatabase) – database instance
use_existing (bool) – if True, use latest available version of local data
silent (bool) – if True, suppress console output
- Return type:
tuple[float, set[str]]
- Returns:
time spent loading data, and set of processed IDs from that source
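The return types of delete_source and load_source compose naturally when a single source is cleared and then reloaded. The following is a minimal sketch of that pattern, not the library's own implementation: `refresh_one_source`, `fake_delete`, and `fake_load` are hypothetical names introduced here, and the two library functions are passed in as callables so the sketch runs without a database. In real use, one would presumably pass gene.etl.update.delete_source and gene.etl.update.load_source instead of the stand-ins.

```python
from typing import Callable

# Shapes taken from the documented signatures above:
#   delete_source(source, db, silent=True) -> float
#   load_source(source, db, use_existing, silent=True) -> tuple[float, set[str]]
DeleteFn = Callable[..., float]
LoadFn = Callable[..., tuple[float, set[str]]]


def refresh_one_source(source, db, use_existing, delete_fn: DeleteFn,
                       load_fn: LoadFn, silent=True):
    """Delete a source's data, reload it, and return (total time, processed IDs).

    Hypothetical helper: the callables are injected rather than imported.
    """
    delete_time = delete_fn(source, db, silent)
    load_time, processed_ids = load_fn(source, db, use_existing, silent)
    return delete_time + load_time, processed_ids


# Hypothetical stand-ins with made-up timings and IDs, used only to
# exercise the sketch; they do not touch any database.
def fake_delete(source, db, silent=True):
    return 0.5


def fake_load(source, db, use_existing, silent=True):
    return 1.5, {"ensembl:ENSG00000157764"}


total_time, ids = refresh_one_source("Ensembl", None, False, fake_delete, fake_load)
```

Injecting the callables keeps the timing and ID bookkeeping testable on its own; the source name string and `None` database above are placeholders, not valid inputs for the real functions.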
- gene.etl.update.update_all_and_normalize(db, use_existing, silent=True)
Update all sources as well as normalized records.
For example, to completely refresh all Gene Normalizer data:
>>> from gene.database import create_db
>>> from gene.etl.update import update_all_and_normalize
>>> db = create_db()
>>> update_all_and_normalize(db, False)
- Parameters:
db (AbstractDatabase) – database instance
use_existing (bool) – if True, use latest local copy of data
silent (bool) – if True, suppress console output
- Return type:
None
- gene.etl.update.update_all_sources(db, use_existing, silent=True)
Refresh data for all gene record sources.
- Parameters:
db (AbstractDatabase) – database instance
use_existing (bool) – if True, use latest available local data for all sources
silent (bool) – if True, suppress console output
- Return type:
set[str]
- Returns:
IDs processed from all sources
- gene.etl.update.update_normalized(db, processed_ids, silent=True)
Delete existing and update merged normalized records.
- Parameters:
db (AbstractDatabase) – database instance
processed_ids (Optional[set[str]]) – IDs to form normalized records from. Provide if available to cut down on some potentially slow database calls. If unavailable, this method will fetch all known IDs directly.
silent (bool) – if True, suppress console output
- Return type:
None
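The two ways of calling update_normalized described above (with a set of processed IDs, or with None so that IDs are fetched from the database) can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `refresh_everything`, `rebuild_normalized_only`, and the `fake_*` functions are hypothetical names, and the library functions are injected as callables so the example runs standalone. It also assumes that the ID set returned by update_all_sources is a suitable value for `processed_ids`, which the return descriptions suggest but do not state outright.

```python
from typing import Callable, Optional


def refresh_everything(db, use_existing,
                       update_all_sources_fn: Callable[..., set[str]],
                       update_normalized_fn: Callable[..., None],
                       silent=True) -> set[str]:
    """Update every source, then rebuild normalized records from the IDs seen."""
    processed_ids = update_all_sources_fn(db, use_existing, silent)
    # IDs are already in hand, so pass them along to avoid refetching.
    update_normalized_fn(db, processed_ids, silent)
    return processed_ids


def rebuild_normalized_only(db, update_normalized_fn: Callable[..., None],
                            silent=True) -> None:
    """Rebuild normalized records when no ID set is available."""
    # None signals that all known IDs should be fetched directly.
    update_normalized_fn(db, None, silent)


# Hypothetical stand-ins that record how they were called.
calls: list[Optional[set[str]]] = []


def fake_update_all_sources(db, use_existing, silent=True):
    return {"hgnc:1097", "ncbigene:673"}


def fake_update_normalized(db, processed_ids, silent=True):
    calls.append(processed_ids)


ids = refresh_everything(None, False, fake_update_all_sources, fake_update_normalized)
rebuild_normalized_only(None, fake_update_normalized)
```

When only some sources were refreshed, passing None is likely the safer choice, since a partial ID set would not cover records from the sources left untouched.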
- gene.etl.update.update_source(source, db, use_existing, silent=True)
Refresh data for an individual gene data source.
For example, to completely refresh Ensembl data:
>>> from gene.schemas import SourceName
>>> from gene.database import create_db
>>> from gene.etl.update import update_source
>>> db = create_db()
>>> processed_ids = update_source(SourceName.ENSEMBL, db, False)
- Parameters:
source (SourceName) – name of source to update
db (AbstractDatabase) – database instance
use_existing (bool) – if True, use latest available local data
silent (bool) – if True, suppress console output
- Return type:
set[str]
- Returns:
IDs for records created from source