Gene Normalizer 0.3.0-dev2

The Gene Normalizer provides tools for resolving ambiguous human gene references to consistently structured, normalized terms. For gene concepts extracted from NCBI Gene, Ensembl, and HGNC, it assigns a CURIE and provides additional metadata such as current and previously used symbols, aliases, database cross-references and associations, and coordinates.

A public REST instance of the service is available for programmatic queries:

>>> import requests
>>> result = requests.get("https://normalize.cancervariants.org/gene/normalize?q=braf").json()
>>> result["normalized_id"]
'hgnc:1097'
>>> result["gene"]["aliases"]
['B-raf', 'NS7', 'B-RAF1', 'BRAF-1', 'BRAF1', 'RAFB1']

The Gene Normalizer can also be installed locally as a Python package for fast access:

>>> from gene.query import QueryHandler
>>> from gene.database import create_db
>>> q = QueryHandler(create_db())
>>> result = q.normalize("BRAF")
>>> result.normalized_id
'hgnc:1097'
>>> result.gene.aliases
['NS7', 'RAFB1', 'B-raf', 'BRAF-1', 'BRAF1', 'B-RAF1']

The Gene Normalizer was created to support the Knowledgebase Integration Project of the Variant Interpretation for Cancer Consortium (VICC). It is developed primarily by the Wagner Lab. Full source code is available on GitHub.