gene.etl.base

A base class for extraction, transformation, and loading of data.

class gene.etl.base.Base(database, seqrepo_dir=SEQREPO_ROOT_DIR, data_path=None, silent=True)

The ETL base class.

__init__(database, seqrepo_dir=SEQREPO_ROOT_DIR, data_path=None, silent=True)

Instantiate Base class.

Parameters:
  • database (AbstractDatabase) – database instance

  • seqrepo_dir (Path) – Path to seqrepo directory

  • data_path (Optional[Path]) – path to app data directory

  • silent (bool) – if True, don’t print ETL result to console

get_seqrepo(seqrepo_dir)

Return SeqRepo instance if seqrepo_dir exists.

Parameters:

seqrepo_dir (Path) – Path to seqrepo directory

Return type:

SeqRepo

Returns:

SeqRepo instance
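The existence check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: `open_seqrepo` and its `factory` argument are hypothetical names (in real use the factory would presumably be the `SeqRepo` class), and raising `NotADirectoryError` for a missing directory is an assumption, since the documentation does not say what happens in that case.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def open_seqrepo(seqrepo_dir: Path, factory: Callable[[Path], T]) -> T:
    """Return factory(seqrepo_dir) if the directory exists (hypothetical helper)."""
    seqrepo_dir = Path(seqrepo_dir)
    # Assumption: fail loudly when the directory is missing, rather than
    # returning None or creating the directory.
    if not seqrepo_dir.is_dir():
        raise NotADirectoryError(f"Could not find {seqrepo_dir}")
    # Only construct the repository object once the path is known to exist.
    return factory(seqrepo_dir)
```

Passing the constructor in as `factory` keeps the sketch runnable without a SeqRepo installation; any callable that accepts a `Path` works.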

perform_etl(use_existing=False)

Public-facing method to begin ETL procedures on given data. The returned concept IDs can be passed to the Merge method to compute merged concepts.

Parameters:

use_existing (bool) – if True, don’t try to retrieve latest source data

Return type:

List[str]

Returns:

list of concept IDs which were successfully processed and uploaded.
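To make the contract of `perform_etl` concrete, here is a small self-contained sketch. Everything in it is a stand-in: `InMemoryDatabase`, `ToyETL`, the `_extract` hook, and the `example:` concept IDs are hypothetical and are not part of the `gene` package. It only mirrors the documented behavior: run the pipeline, upload records to the database, and return the concept IDs that were processed successfully.

```python
from typing import Dict, List


class InMemoryDatabase:
    """Hypothetical stand-in for an AbstractDatabase implementation."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}

    def add_record(self, record: dict) -> None:
        self.records[record["concept_id"]] = record


class ToyETL:
    """Toy class mirroring the documented perform_etl signature."""

    def __init__(self, database: InMemoryDatabase, silent: bool = True) -> None:
        self._database = database
        self._silent = silent

    def _extract(self, use_existing: bool) -> List[dict]:
        # Hypothetical hook: a real source would fetch the latest data here,
        # or reuse local files when use_existing is True.
        return [
            {"concept_id": "example:0001", "symbol": "GENE1"},
            {"concept_id": "example:0002", "symbol": "GENE2"},
            {"symbol": "NO_ID"},  # malformed row, skipped below
        ]

    def perform_etl(self, use_existing: bool = False) -> List[str]:
        processed: List[str] = []
        for row in self._extract(use_existing):
            if "concept_id" not in row:
                continue  # only successfully processed records are reported
            self._database.add_record(row)
            processed.append(row["concept_id"])
        if not self._silent:
            print(f"Loaded {len(processed)} records")
        return processed


db = InMemoryDatabase()
concept_ids = ToyETL(db).perform_etl(use_existing=True)
```

The returned list contains only the IDs that actually reached the database, which is what a later merge step would take as input.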