Contributing to the Gene Normalizer

Bug reports and feature requests

Submit bug reports and feature requests to the Gene Normalizer issue tracker on GitHub. See this Stack Overflow post for tips on writing a helpful bug report.

Setup

Clone the repository:

git clone https://github.com/cancervariants/gene-normalization
cd gene-normalization

Then create and activate a virtual environment, and install the package in editable mode along with the listed dependency groups:

python3 -m venv venv
source venv/bin/activate
python3 -m pip install -e ".[pg,etl,tests,dev,docs]"
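
As a quick sanity check after activation, the sketch below reports whether python3 is running inside a virtual environment. The helper name in_venv is hypothetical and not part of the project; it relies on the standard comparison of sys.prefix and sys.base_prefix.

```shell
# Hypothetical helper: succeeds only when python3 runs inside a virtual environment.
in_venv() {
    python3 -c 'import sys; sys.exit(0 if sys.prefix != sys.base_prefix else 1)'
}

if in_venv; then
    echo "virtual environment active: $(python3 -c 'import sys; print(sys.prefix)')"
else
    echo "no virtual environment detected; run: source venv/bin/activate"
fi
```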

We use pre-commit to run conformance checks before each commit. These cover:

  • Code format and style

  • Added large files

  • AWS credentials

  • Private keys

Before your first commit, run:

pre-commit install
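
Once installed, the hooks run on staged files at commit time. To check the whole tree once, for example right after installing the hooks, pre-commit's run --all-files can be used. The wrapper below is a minimal sketch: run_hooks is a hypothetical name, and it assumes the configuration lives in .pre-commit-config.yaml (pre-commit's default file name) at the repository root.

```shell
# Hypothetical wrapper: run every configured hook against all files in the repository.
run_hooks() {
    # Assumption: the hook configuration sits in the current directory.
    if [ ! -f .pre-commit-config.yaml ]; then
        echo "no .pre-commit-config.yaml here; run from the repository root" >&2
        return 1
    fi
    pre-commit run --all-files
}

run_hooks || echo "hooks failed or could not run; see the output above"
```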

When running the web server during development, pass --reload so that it restarts automatically on code changes:

uvicorn gene.main:app --reload

Style

Code style is enforced with Ruff and should be checked by the pre-commit hook before each commit. A final check runs in GitHub Actions on every pull request.

Tests

Tests are executed with pytest:

pytest

By default, tests use an existing database and don't load any new data. In test environments where no such database is available (e.g. in CI), set the GENE_TEST environment variable to true to initialize the connected database instance with miniature versions of the input data files before tests are executed:

export GENE_TEST=true

Warning

Tests executed with GENE_TEST set will overwrite existing data. It is recommended that you use a database instance separate from your main working environment.
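
Because an exported GENE_TEST stays set for the rest of the shell session, one way to reduce the chance of overwriting data by accident is to set it for a single command instead. The sketch below demonstrates the scoping with a stand-in command; the commented line shows how the same pattern would apply to pytest.

```shell
# Start from a clean state in case GENE_TEST was exported earlier in this session.
unset GENE_TEST

# A variable assignment placed before a command applies to that command only.
inside=$(GENE_TEST=true sh -c 'echo "${GENE_TEST:-unset}"')
echo "inside the command: $inside"
echo "in the shell afterwards: ${GENE_TEST:-unset}"

# Applied to the test suite, this would be:
#   GENE_TEST=true pytest
```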

Documentation

The documentation is built with Sphinx, which is included as part of the docs dependency group. Navigate to the docs/ subdirectory and use make to build the HTML version:

cd docs
make html

See the Sphinx documentation for more information.

Figure generation

We are experimenting with the inclusion of some static HTML figures in the documentation. For now, scripts used to generate these figures should be provided in docs/scripts/, and any external dependencies should be included in the docs dependency group.

Creating and Publishing Docker images

Note

This section assumes you have push permissions for the DockerHub organization. It also assumes you have a local SeqRepo installed at /usr/local/share/seqrepo/2024-12-20. If you have it installed elsewhere, please update the SEQREPO_ROOT_DIR environment variable in compose-dev.yaml.

Set your DockerHub organization:

export DOCKERHUB_ORG=your-org
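
The later build and push commands interpolate $DOCKERHUB_ORG into image names, so an unset variable would yield a malformed name such as /gene-normalizer-api. A small guard can catch that early. This is only a sketch; check_org is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: fail with a clear message if DOCKERHUB_ORG is unset or empty.
check_org() {
    if [ -z "${DOCKERHUB_ORG:-}" ]; then
        echo "DOCKERHUB_ORG is not set; run: export DOCKERHUB_ORG=your-org" >&2
        return 1
    fi
    echo "images will be tagged under ${DOCKERHUB_ORG}/"
}

# Report the current state without aborting the shell.
check_org || true
```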

If you already have a DynamoDB volume (gene_norm_ddb_vol) and want to load new data, remove it first:

docker volume rm gene_norm_ddb_vol

Create a Docker volume for DynamoDB:

docker volume create --driver local --opt type=none --opt device="$(pwd)/dynamodb_local_latest" --opt o=bind gene_norm_ddb_vol
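
With a bind-backed local volume, Docker generally expects the device directory to exist by the time a container mounts it. Assuming dynamodb_local_latest is not already created by another step in your checkout, it can be created up front from the root of the repository:

```shell
# Directory backing the bind-mounted volume; the path matches the --opt device value above.
DDB_DIR="$(pwd)/dynamodb_local_latest"

# -p makes this a no-op if the directory already exists.
mkdir -p "$DDB_DIR"
echo "bind target ready: $DDB_DIR"
```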

To start the services and load DynamoDB (if necessary), run the following from the root of the repository:

export VERSION=$(git describe --tags --abbrev=0)
docker compose -f compose-dev.yaml up --build
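
git describe --tags --abbrev=0 exits with an error when no tag is reachable (for instance in a shallow clone, or a fork without tags), which leaves VERSION empty. The following is a hedged sketch of a check to run before building; the git fetch --tags hint assumes the tags exist on your remote.

```shell
# Capture the most recent tag; suppress the error so an untagged checkout yields an empty string.
VERSION=$(git describe --tags --abbrev=0 2>/dev/null || true)

if [ -z "$VERSION" ]; then
    echo "no tag found; try: git fetch --tags" >&2
else
    export VERSION
    echo "building version: $VERSION"
fi
```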

To tag and push the API images:

docker build --build-arg VERSION=$VERSION -t $DOCKERHUB_ORG/gene-normalizer-api:$VERSION -t $DOCKERHUB_ORG/gene-normalizer-api:latest .
docker push $DOCKERHUB_ORG/gene-normalizer-api:$VERSION
docker push $DOCKERHUB_ORG/gene-normalizer-api:latest

To archive gene_norm_ddb_vol into ./gene_norm_ddb.tar.gz:

docker run --rm \
    -v gene_norm_ddb_vol:/volume \
    -v "$(pwd)":/backup \
    alpine \
    sh -c "cd /volume && tar czf /backup/gene_norm_ddb.tar.gz ."
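
Before building an image from the archive, it can be worth confirming that the archive is readable and not empty. The sketch below lists its contents without extracting them; count_entries is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of entries in a gzip-compressed tar archive.
count_entries() {
    # t = list, z = gzip, f = read from the given file; tr strips padding some wc builds add.
    tar tzf "$1" | wc -l | tr -d ' '
}

ARCHIVE=./gene_norm_ddb.tar.gz
if [ -f "$ARCHIVE" ]; then
    echo "$ARCHIVE contains $(count_entries "$ARCHIVE") entries"
else
    echo "archive not found: $ARCHIVE" >&2
fi
```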

To tag and push the DynamoDB images, run the following from the root of the repository:

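export DATE=$(date +%F)
docker build -f Dockerfile.ddb -t $DOCKERHUB_ORG/gene-normalizer-ddb:$DATE -t $DOCKERHUB_ORG/gene-normalizer-ddb:latest .
docker push $DOCKERHUB_ORG/gene-normalizer-ddb:$DATE
docker push $DOCKERHUB_ORG/gene-normalizer-ddb:latest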
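
date +%F prints the local date as YYYY-MM-DD, so images built near midnight on machines in different time zones can end up with different date tags. If that matters for your workflow, one option is to compute the tag in UTC. This is a suggestion rather than a project convention.

```shell
# -u switches date to UTC; %F is shorthand for %Y-%m-%d.
DATE=$(date -u +%F)
echo "DynamoDB image tag: gene-normalizer-ddb:$DATE"
```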