gene.etl.base
A base class for extraction, transformation, and loading of data.
- class gene.etl.base.Base(database, seqrepo_dir=SEQREPO_ROOT_DIR, data_path=None, silent=True)
The ETL base class.
- __init__(database, seqrepo_dir=SEQREPO_ROOT_DIR, data_path=None, silent=True)
Instantiate the Base class.
- Parameters:
  - database (AbstractDatabase) – database instance
  - seqrepo_dir (Path) – path to the SeqRepo directory
  - data_path (Optional[Path]) – path to the app data directory
  - silent (bool) – if True, don’t print ETL results to the console
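The constructor signature above can be illustrated with a minimal sketch. `SketchBase` and `DEFAULT_DATA_DIR` are hypothetical stand-ins, not part of `gene.etl.base`; in particular, falling back to a default directory when `data_path` is `None` is an assumption about how an unset data path might be handled, and `seqrepo_dir` is made required here because the real `SEQREPO_ROOT_DIR` value is not documented on this page.

```python
from pathlib import Path
from typing import Any, Optional

# Hypothetical default; the real default data directory is not documented here.
DEFAULT_DATA_DIR = Path.home() / ".gene_data"


class SketchBase:
    """Illustrative stand-in for an ETL base constructor (not gene.etl.base.Base)."""

    def __init__(
        self,
        database: Any,
        seqrepo_dir: Path,
        data_path: Optional[Path] = None,
        silent: bool = True,
    ) -> None:
        # Keep a handle on the database instance that loaded records will go to.
        self._database = database
        self._seqrepo_dir = seqrepo_dir
        # Assumption: fall back to a default directory when no data path is given.
        self._data_path = data_path if data_path is not None else DEFAULT_DATA_DIR
        # When True, ETL results are not printed to the console.
        self._silent = silent
```

For example, `SketchBase(database=object(), seqrepo_dir=Path("/tmp/seqrepo"))` stores `DEFAULT_DATA_DIR` as its data path and keeps console output disabled.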
- get_seqrepo(seqrepo_dir)
Return a SeqRepo instance if seqrepo_dir exists.
- Parameters:
  - seqrepo_dir (Path) – path to the SeqRepo directory
- Return type:
  SeqRepo
- Returns:
  SeqRepo instance
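A sketch of the "return an instance only if the directory exists" check follows. The function name `get_seqrepo_sketch` and its `factory` parameter are hypothetical: `factory` stands in for a SeqRepo constructor such as `biocommons.seqrepo.SeqRepo`, so the example runs without that package installed. Raising `FileNotFoundError` for a missing directory is an assumption; this page does not say what the real method does in that case.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def get_seqrepo_sketch(seqrepo_dir: Path, factory: Callable[[str], T]) -> T:
    """Return an object built by ``factory`` if ``seqrepo_dir`` exists.

    ``factory`` is a hypothetical stand-in for a SeqRepo constructor.
    """
    # Assumption: a missing directory is reported with FileNotFoundError.
    if not seqrepo_dir.exists():
        raise FileNotFoundError(f"Could not find SeqRepo directory: {seqrepo_dir}")
    # The constructor receives the directory as a string path.
    return factory(str(seqrepo_dir))
```

With the real library installed, the factory could be `SeqRepo` itself, for example `get_seqrepo_sketch(Path("/usr/local/share/seqrepo/latest"), SeqRepo)`. Injecting the constructor keeps the existence check testable without a local SeqRepo snapshot.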
- perform_etl(use_existing=False)
Public entry point for running the ETL procedure on the source data. The returned concept IDs can be passed to the Merge method to compute merged concepts.
- Parameters:
  - use_existing (bool) – if True, don’t try to retrieve the latest source data
- Return type:
  List[str]
- Returns:
  list of concept IDs that were successfully processed and uploaded
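The following is a minimal sketch of an extract/transform/load flow that returns only the IDs of successfully processed concepts. `SketchETL`, `ToySource`, and the hook names `_extract`, `_transform`, and `_load` are hypothetical and are not the `gene.etl.base` API. Skipping records whose transform raises `ValueError` is one assumed way to realize "successfully processed"; the in-memory list stands in for a database upload.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SketchETL(ABC):
    """Illustrative ETL flow; hook names are hypothetical, not gene.etl.base API."""

    def __init__(self, silent: bool = True) -> None:
        self._silent = silent
        self._loaded: List[str] = []

    @abstractmethod
    def _extract(self, use_existing: bool) -> List[Dict[str, str]]:
        """Return raw records; reuse local data when use_existing is True."""

    @abstractmethod
    def _transform(self, record: Dict[str, str]) -> Dict[str, str]:
        """Convert one raw record into a concept dict with a 'concept_id' key."""

    def _load(self, concept: Dict[str, str]) -> None:
        # Stand-in for a database write; records the concept ID as uploaded.
        self._loaded.append(concept["concept_id"])

    def perform_etl(self, use_existing: bool = False) -> List[str]:
        self._loaded = []
        for record in self._extract(use_existing):
            try:
                concept = self._transform(record)
            except ValueError:
                # Assumption: malformed records are skipped, not fatal.
                continue
            self._load(concept)
        if not self._silent:
            print(f"Loaded {len(self._loaded)} concepts")
        return list(self._loaded)


class ToySource(SketchETL):
    def _extract(self, use_existing: bool) -> List[Dict[str, str]]:
        # A real source would fetch the latest file unless use_existing is True.
        return [{"id": "1"}, {"id": ""}, {"id": "2"}]

    def _transform(self, record: Dict[str, str]) -> Dict[str, str]:
        if not record["id"]:
            raise ValueError("record has no ID")
        return {"concept_id": f"toy:{record['id']}"}
```

Here `ToySource().perform_etl(use_existing=True)` returns `["toy:1", "toy:2"]`: the record with an empty ID is dropped, so the returned list contains only IDs that made it through every stage.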