PostgreSQL#
The Gene Normalizer can store and retrieve gene records from a PostgreSQL database. See the “Getting Started” section of the PostgreSQL documentation for basic installation instructions.
Note
See the PostgreSQL handler API reference for information on programmatic access.
Local setup#
To populate the Gene Normalizer, a connection must be established to an existing PostgreSQL database, so one must be created manually when performing Gene Normalizer setup. Most PostgreSQL distributions include the createdb utility for this purpose. For example, to create a database named gene_normalizer
in a local database listening on port 5432 using the PostgreSQL user named postgres
, run the following shell command:
createdb -h localhost -p 5432 -U postgres gene_normalizer
Once created, set the environment variable GENE_NORM_DB_URL
to a connection description for that database. The following command provides a connection to a database named gene_normalizer
in a local PostgreSQL instance, using port 5432, under the username postgres
with no required password. See the PostgreSQL connection string documentation for more information.
export GENE_NORM_DB_URL=postgres://postgres@localhost:5432/gene_normalizer
Load from remote source#
The Gene Normalizer’s PostgreSQL class provides the gene_norm_update_remote
shell command to refresh its data directly from a remotely-stored SQL dump, instead of acquiring, transforming, and loading source data. This enables data loading on the order of seconds rather than hours. See the command description at gene_norm_update_remote --help
for more information.
By default, this command will fetch the latest data dump provided by the VICC. Alternative URLs can be set with the --data_url
option:
gene_norm_update_remote --data_url=https://vicc-normalizers.s3.us-east-2.amazonaws.com/gene_normalization/postgresql/gene_norm_20230322163523.sql.tar.gz
Create SQL dump from database#
The Gene Normalizer’s PostgreSQL class also provides the gene_norm_dump
shell command to create a SQL dump of current data into a file. This command will create a file named gene_norm_YYYYMMDDHHmmss.sql
in the current directory; the -o
option can be used to specify an alternate location, like so:
gene_norm_dump -o ~/.gene_data/
See gene_norm_dump --help
for more information.