gene.etl.ensembl

Defines the Ensembl ETL methods.

class gene.etl.ensembl.Ensembl(database, seqrepo_dir=SEQREPO_ROOT_DIR, data_path=None, silent=True)

ETL the Ensembl source into the normalized database.

__init__(database, seqrepo_dir=SEQREPO_ROOT_DIR, data_path=None, silent=True)

Instantiate Base class.

Parameters:
  • database (AbstractDatabase) – database instance

  • seqrepo_dir (Path) – Path to seqrepo directory

  • data_path (Optional[Path]) – path to app data directory

  • silent (bool) – if True, don’t print ETL results to the console

get_seqrepo(seqrepo_dir)

Return a SeqRepo instance if seqrepo_dir exists.

Parameters:

seqrepo_dir (Path) – Path to seqrepo directory

Return type:

SeqRepo

Returns:

SeqRepo instance
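The existence check can be sketched as follows. This is a minimal illustration, not the library's code: `get_seqrepo_sketch` and its `factory` argument are hypothetical, with `factory` standing in for the biocommons `SeqRepo` constructor so the sketch runs without a local SeqRepo snapshot. Raising `NotADirectoryError` for a missing directory is an assumed failure mode.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def get_seqrepo_sketch(seqrepo_dir: Path, factory: Callable[[str], T]) -> T:
    """Return factory(seqrepo_dir) only if the directory exists.

    Hypothetical helper: `factory` stands in for the SeqRepo constructor.
    """
    path = Path(seqrepo_dir)
    if not path.is_dir():
        # Assumed failure mode; the real method may report this differently.
        raise NotADirectoryError(f"Could not find {path}")
    # Build the repository handle from the validated directory path.
    return factory(str(path))
```

Checking the directory up front gives a clear error message instead of a lower-level failure from deep inside the SeqRepo constructor.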

perform_etl(use_existing=False)

Public-facing method that runs the ETL procedure on this source’s data. The returned concept IDs can be passed to the Merge method to compute merged concepts.

Parameters:

use_existing (bool) – if True, use existing local source data rather than trying to retrieve the latest source data

Return type:

list[str]

Returns:

list of concept IDs which were successfully processed and uploaded.
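Because perform_etl returns the concept IDs it processed, a caller can pool IDs from one or more sources before a merge step. The sketch below is illustrative only: `collect_concept_ids` and `FakeEnsembl` are hypothetical, and the stub replaces the real Ensembl class, which needs a database and source data to run. The ID format shown is an assumption.

```python
from typing import List, Protocol, Set


class ETLSource(Protocol):
    """Structural stand-in for any ETL class exposing perform_etl()."""

    def perform_etl(self, use_existing: bool = False) -> List[str]: ...


def collect_concept_ids(sources: List[ETLSource], use_existing: bool = False) -> Set[str]:
    """Run each source's ETL and pool the returned concept IDs (hypothetical helper)."""
    concept_ids: Set[str] = set()
    for source in sources:
        # Each call returns the IDs that were successfully processed and uploaded.
        concept_ids.update(source.perform_etl(use_existing=use_existing))
    return concept_ids


class FakeEnsembl:
    """Hypothetical stub; the real Ensembl ETL needs a database and source files."""

    def perform_etl(self, use_existing: bool = False) -> List[str]:
        # Assumed concept ID format, for illustration only.
        return ["ensembl:ENSG00000157764"]
```

For example, `collect_concept_ids([FakeEnsembl()], use_existing=True)` returns `{"ensembl:ENSG00000157764"}`, a set that could then be handed to a merge step.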