gene.etl.update

Provide functions for updating normalizer data.

gene.etl.update.delete_normalized(database, silent=True)

Delete normalized concepts.

Parameters:
  • database (AbstractDatabase) – DB instance

  • silent (bool) – if True, suppress console output

Return type:

None

gene.etl.update.delete_source(source, db, silent=True)

Delete all data for an individual source.

Parameters:
  • source (SourceName) – name of source to delete data for

  • db (AbstractDatabase) – database instance

  • silent (bool) – if True, suppress console output

Return type:

float

Returns:

time spent deleting source
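
Because the documented return value is the time spent deleting, a caller can collect per-source durations when removing several sources. The following is a minimal sketch, not part of the library: `delete_sources_timed` is a hypothetical helper, and the delete function is passed in as `delete_fn` so the sketch runs without a database. It assumes `delete_fn` has the same signature as `delete_source` above.

```python
from __future__ import annotations

from typing import Callable, Iterable


def delete_sources_timed(
    delete_fn: Callable[..., float],
    sources: Iterable,
    db,
) -> dict:
    """Delete each source in turn and record the reported duration.

    Hypothetical helper: delete_fn stands in for
    gene.etl.update.delete_source and must accept (source, db, silent=...).
    """
    durations = {}
    for source in sources:
        # The documented return value is the time spent deleting the source.
        durations[source] = delete_fn(source, db, silent=True)
    return durations
```

With the real library, one would presumably pass `gene.etl.update.delete_source` as `delete_fn` and `SourceName` members as `sources`.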

gene.etl.update.load_source(source, db, use_existing, silent=True)

Load data for an individual source.

Parameters:
  • source (SourceName) – name of source to load data for

  • db (AbstractDatabase) – database instance

  • use_existing (bool) – if True, use latest available version of local data

  • silent (bool) – if True, suppress console output

Return type:

tuple[float, set[str]]

Returns:

time spent loading data, and set of processed IDs from that source
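
The two-element return value can be unpacked directly. The sketch below is illustrative only: `load_and_report` is a hypothetical helper, and the load function is injected as `load_fn` so the example does not need a live database. It assumes `load_fn` follows the `load_source` signature and return type documented above.

```python
from __future__ import annotations

from typing import Callable


def load_and_report(
    load_fn: Callable[..., tuple[float, set[str]]],
    source,
    db,
    use_existing: bool = True,
) -> set[str]:
    """Call a load_source-style function and summarize its result.

    Hypothetical helper: load_fn stands in for gene.etl.update.load_source.
    """
    # Documented return shape: (time spent loading, set of processed IDs).
    elapsed, processed_ids = load_fn(source, db, use_existing, silent=True)
    print(f"Loaded {len(processed_ids)} IDs from {source} (time spent: {elapsed:.2f})")
    return processed_ids
```

The returned set is the same kind of value that `update_normalized` accepts as `processed_ids`.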

gene.etl.update.update_all_and_normalize(db, use_existing, silent=True)

Update all sources as well as normalized records.

For example, to completely refresh all Gene Normalizer data:

>>> from gene.database import create_db
>>> from gene.etl.update import update_all_and_normalize
>>> db = create_db()
>>> update_all_and_normalize(db, False)

Parameters:
  • db (AbstractDatabase) – database instance

  • use_existing (bool) – if True, use latest local copy of data

  • silent (bool) – if True, suppress console output

Return type:

None

gene.etl.update.update_all_sources(db, use_existing, silent=True)

Refresh data for all gene record sources.

Parameters:
  • db (AbstractDatabase) – database instance

  • use_existing (bool) – if True, use latest available local data for all sources

  • silent (bool) – if True, suppress console output

Return type:

set[str]

Returns:

IDs processed from all sources

gene.etl.update.update_normalized(db, processed_ids, silent=True)

Delete existing normalized records, then rebuild the merged normalized records.

Parameters:
  • db (AbstractDatabase) – database instance

  • processed_ids (Optional[set[str]]) – IDs to form normalized records from. Provide them if available to avoid some potentially slow database calls. If they are not provided, this function fetches all known IDs directly.

  • silent (bool) – if True, suppress console output

Return type:

None
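
The `processed_ids` parameter pairs naturally with the set returned by `update_all_sources`. The sketch below shows that hand-off under stated assumptions: `refresh_then_normalize` is a hypothetical helper, and both functions are injected so the example runs without a database. Whether `update_all_and_normalize` is implemented this way internally is an assumption, not something this page confirms.

```python
from __future__ import annotations

from typing import Callable, Optional


def refresh_then_normalize(
    update_sources_fn: Callable[..., set[str]],
    update_normalized_fn: Callable[..., None],
    db,
    use_existing: bool = False,
) -> set[str]:
    """Refresh all sources, then rebuild normalized records from their IDs.

    Hypothetical helper: the two callables stand in for
    gene.etl.update.update_all_sources and gene.etl.update.update_normalized.
    """
    # Step 1: refresh every source and keep the IDs that were processed.
    processed_ids: Optional[set[str]] = update_sources_fn(
        db, use_existing, silent=True
    )
    # Step 2: pass those IDs along so normalization can skip fetching them.
    update_normalized_fn(db, processed_ids, silent=True)
    return processed_ids
```

Passing the IDs forward is what the parameter description recommends when they are already available.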

gene.etl.update.update_source(source, db, use_existing, silent=True)

Refresh data for an individual gene data source.

For example, to completely refresh Ensembl data:

>>> from gene.schemas import SourceName
>>> from gene.database import create_db
>>> from gene.etl.update import update_source
>>> db = create_db()
>>> processed_ids = update_source(SourceName.ENSEMBL, db, False)

Parameters:
  • source (SourceName) – name of source to update

  • db (AbstractDatabase) – database instance

  • use_existing (bool) – if True, use latest available local data

  • silent (bool) – if True, suppress console output

Return type:

set[str]

Returns:

IDs for records created from source
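
To refresh only a subset of sources, the returned ID sets can be merged as each source finishes. This is a minimal sketch rather than library code: `refresh_selected` is a hypothetical helper, and the update function is injected as `update_fn`, assumed to share the `update_source` signature above.

```python
from __future__ import annotations

from typing import Callable, Iterable


def refresh_selected(
    update_fn: Callable[..., set[str]],
    sources: Iterable,
    db,
    use_existing: bool = False,
) -> set[str]:
    """Refresh the given sources one by one and merge their record IDs.

    Hypothetical helper: update_fn stands in for
    gene.etl.update.update_source.
    """
    all_ids: set[str] = set()
    for source in sources:
        # Each call returns the IDs for records created from that source.
        all_ids |= update_fn(source, db, use_existing, silent=True)
    return all_ids
```

The merged set could then be handed to `update_normalized` as `processed_ids`.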