Annotating Genetic Variants Made Easy with GeneBe REST API

Stawinski
Feb 8, 2024

Introduction to the GeneBe.net API for Annotating Variants with ACMG, ClinVar, gnomAD, RefSeq, and More

If you work with genetic variants, you have likely run into a common challenge: annotating and interpreting them. Whether a variant sits in a VCF file, a JSON file, a Pandas dataframe, or a plain text string, the task can be complex. Manually looking up population frequencies in gnomAD or existing interpretations in ClinVar is time-consuming. Annotating against RefSeq or Ensembl adds another layer of complexity: tools like VEP, ANNOVAR, and SnpEff require substantial setup time and large data downloads. And yet, after three months, those local databases may already be outdated, complicating future analyses.

Wouldn’t it be more convenient to have a single REST API for annotating multiple variants at once, providing evergreen annotations usable with VCF, Pandas, or a simple curl query? Meet GeneBe.net 🙂

GeneBe.net is a website that provides tools for interpreting variants. It also offers a free REST API designed to annotate genetic variants efficiently. You receive JSON descriptions of your variants, including RefSeq and Ensembl functional consequences, gnomAD population frequencies, ClinVar classifications, and computational scores such as SpliceAI. In addition, GeneBe.net features one of the best automatic ACMG score implementations. Let's see it in action:

$ curl -X 'GET'  \
'https://api.genebe.net/cloud/api-public/v1/variant?chr=6&pos=160585140&ref=T&alt=G&useEnsembl=False&genome=hg38' | jq .
{
  "variants": [
    {
      "chr": "6",
      "pos": 160585140,
      "ref": "T",
      "alt": "G",
      "transcript": "NM_005577.4",
      "consequences_refseq": [
        {
          "aa_ref": "T",
          "aa_alt": "P",
          "canonical": true,
          "protein_coding": true,
          "consequences": [
            "missense_variant"
          ],
          "exon_rank": 26,
          "exon_rank_end": 26,
          "exon_count": 39,
          "gene_symbol": "LPA",
          "gene_hgnc_id": 6667,
          "hgvs_c": "c.4195A>C",
          "hgvs_p": "p.Thr1399Pro",
          "transcript": "NM_005577.4",
          "protein_id": "NP_005568.2",
          "aa_start": 1399,
          "aa_end": 1399,
          "aa_length": 2040,
          "cds_start": 4195,
          "cds_end": 4195,
          "cds_length": 6123,
          "cdna_start": 4256,
          "cdna_end": 4256,
          "cdna_length": 6431,
          "mane_select": "ENST00000316300.10"
        }
      ],
      "gene_symbol": "LPA",
      "dbsnp": "41272110",
      "gnomad_exomes_af": 0.13180799782276154,
      "gnomad_genomes_af": 0.09891059994697571,
      "gnomad_exomes_ac": 192617,
      "gnomad_genomes_ac": 15053,
      "gnomad_exomes_homalt": 13836,
      "gnomad_genomes_homalt": 949,
      "revel_score": 0.25600001215934753,
      "bayesdelnoaf_score": -0.3799999952316284,
      "phylop100way_score": 2.9049999713897705,
      "acmg_score": -12,
      "acmg_classification": "Benign",
      "acmg_criteria": "BP4_Strong,BA1"
    }
  ]
}
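If you want to call the same endpoint from Python without pygenebe, the minimal standard-library sketch below builds the GET URL and flattens the nested response into one row per RefSeq consequence. The endpoint and parameter names come from the curl call above; the helper names (build_variant_url, fetch_json, flatten_response) are hypothetical, and the choice of which consequence fields to copy is arbitrary. Variants without any RefSeq consequence would produce no rows in this sketch.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and parameter names are copied from the curl example above.
VARIANT_URL = "https://api.genebe.net/cloud/api-public/v1/variant"


def build_variant_url(chrom, pos, ref, alt, genome="hg38", use_ensembl=False):
    """Build the GET URL for a single-variant query (nothing is sent)."""
    params = {
        "chr": chrom,
        "pos": pos,
        "ref": ref,
        "alt": alt,
        "useEnsembl": str(use_ensembl),  # the curl example sends "False"
        "genome": genome,
    }
    return f"{VARIANT_URL}?{urlencode(params)}"


def fetch_json(url):
    """Send the GET request and decode the JSON body (needs network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


def flatten_response(response):
    """Return one flat dict per (variant, RefSeq consequence) pair.

    Only a few consequence fields are copied; extend the list as needed.
    """
    rows = []
    for variant in response.get("variants", []):
        # Variant-level fields: everything except the nested consequence list.
        base = {k: v for k, v in variant.items() if k != "consequences_refseq"}
        for csq in variant.get("consequences_refseq", []):
            row = dict(base)
            row["transcript"] = csq.get("transcript")
            row["hgvs_c"] = csq.get("hgvs_c")
            row["hgvs_p"] = csq.get("hgvs_p")
            row["consequences"] = ",".join(csq.get("consequences", []))
            rows.append(row)
    return rows


# A trimmed copy of the response shown above, so the example runs offline.
example = {
    "variants": [{
        "chr": "6", "pos": 160585140, "ref": "T", "alt": "G",
        "gnomad_exomes_af": 0.13180799782276154,
        "acmg_classification": "Benign",
        "consequences_refseq": [{
            "transcript": "NM_005577.4",
            "hgvs_c": "c.4195A>C",
            "hgvs_p": "p.Thr1399Pro",
            "consequences": ["missense_variant"],
        }],
    }]
}
url = build_variant_url("6", 160585140, "T", "G")
rows = flatten_response(example)
```

Here url reproduces the query string of the curl call, and flatten_response(example) yields a single row whose hgvs_p is p.Thr1399Pro. Calling flatten_response(fetch_json(url)) would do the same against the live API.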

For multiple variants, a single POST request is much faster than querying them one by one. If you have a VCF file, the pygenebe API wrapper (install it with pip install genebe) provides a command-line tool that annotates the whole file in one step:

# querying multiple genetic variants in batch using curl, consider using --netrc
curl -X 'POST' \
'https://api.genebe.net/cloud/api-public/v1/variants' \
-H 'Content-Type: application/json' \
-d '[{"chr":"22", "pos":28695868, "ref":"AG", "alt":"A"}, {"chr":"22", "pos":28695869, "ref":"G", "alt":"T"}]'

# annotating a VCF file
genebe annotate --input input.vcf.gz --output output.vcf.gz
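Outside curl, a batch query is simply a JSON array of chr/pos/ref/alt objects POSTed to the /variants endpoint. The sketch below, using only the Python standard library, splits a long list into batches and prepares one request per batch. The Content-Type header is set explicitly because curl's -d otherwise labels the body as form data. The batch size of 500 is an assumption, not a documented GeneBe limit, and the helper names are hypothetical.

```python
import json
import urllib.request

# Batch endpoint copied from the curl example above.
BATCH_URL = "https://api.genebe.net/cloud/api-public/v1/variants"


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_requests(variants, batch_size=500):
    """Prepare one POST request per batch; nothing is sent yet.

    batch_size=500 is an assumed value, not a documented GeneBe limit.
    """
    prepared = []
    for batch in chunked(variants, batch_size):
        prepared.append(urllib.request.Request(
            BATCH_URL,
            data=json.dumps(batch).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return prepared


def send(request):
    """Send one prepared request and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


variants = [
    {"chr": "22", "pos": 28695868, "ref": "AG", "alt": "A"},
    {"chr": "22", "pos": 28695869, "ref": "G", "alt": "T"},
]
# batch_size=1 only to show the splitting; each request holds one variant.
requests_to_send = build_batch_requests(variants, batch_size=1)
```

Passing each prepared request to send() performs the actual queries. Sending the batches sequentially keeps the load on the public API modest, which matters given the per-IP limits described below.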

For Python users, the pygenebe library provides utility functions for easy API calls, including a helpful function for working with Pandas dataframes. More documentation is available at https://pygenebe.readthedocs.io/en/latest/ , and usage examples are in the examples folder of the repository at https://github.com/pstawinski/pygenebe .

## pip install genebe
>>> import genebe as gnb

# you may ask for hundreds of variants in one query
>>> input_variants = ['7-69599651-A-G']

# output as a list, with all transcripts
>>> annotations = gnb.annotate_variants_list(input_variants, flatten_consequences=False)
>>> annotations
[{'chr': '7', 'pos': 69599651, 'ref': 'A', 'alt': 'G', (...) }]

# output as a pandas dataframe, flat
>>> df = gnb.annotate_variants_list_to_dataframe(input_variants, flatten_consequences=True)
>>> df
   chr       pos  ref  alt   transcript  gene_symbol    dbsnp  ...  acmg_classification                acmg_criteria  clinvar_disease  clinvar_classification  gene_hgnc_id   hgvs_c         consequences
0    7  69599651    A    G  NM_015570.4        AUTS2  3735260  ...               Benign  BP4_Strong,BP6_Moderate,BA1     not provided                  Benign         14262  c.-3A>G  5_prime_UTR_variant
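The list functions above take variants as chr-pos-ref-alt strings such as '7-69599651-A-G', while the batch POST body uses chr/pos/ref/alt objects. If you assemble such lists yourself, a small helper can validate the strings and convert between the two shapes. This is a hypothetical sketch, not part of pygenebe; it assumes chromosome names contain no hyphens and alleles are written with plain nucleotide letters.

```python
import re

# Alleles are written with plain nucleotide letters in the examples above.
_ALLELE = re.compile(r"^[ACGTN]+$")


def parse_variant_id(variant_id):
    """Split 'chr-pos-ref-alt' into a dict like those in the batch POST body."""
    parts = variant_id.strip().split("-")
    if len(parts) != 4:
        raise ValueError(f"expected chr-pos-ref-alt, got {variant_id!r}")
    chrom, pos, ref, alt = parts
    if not pos.isdigit():
        raise ValueError(f"position must be an integer, got {pos!r}")
    ref, alt = ref.upper(), alt.upper()
    for allele in (ref, alt):
        if not _ALLELE.match(allele):
            raise ValueError(f"unexpected allele {allele!r} in {variant_id!r}")
    return {"chr": chrom, "pos": int(pos), "ref": ref, "alt": alt}


def format_variant_id(variant):
    """Inverse of parse_variant_id: dict back to 'chr-pos-ref-alt'."""
    return f"{variant['chr']}-{variant['pos']}-{variant['ref']}-{variant['alt']}"


parsed = parse_variant_id("7-69599651-A-G")
```

Rejecting malformed strings locally gives a clear error message before any query is spent against the rate limit.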

If you have a pandas dataframe with chr, pos, ref, and alt columns, you can annotate it with a single call:

>>> import pandas as pd
>>> import genebe as gnb
>>> df = pd.DataFrame({'chr': ['6', '22'], 'pos': [160585140, 28695868], 'ref': ['T', 'AG'], 'alt': ['G', 'A']})
>>> annotated_df = gnb.annotate_dataframe_variants(df, genome='hg38', use_ensembl=False, use_refseq=True, flatten_consequences=True)

This is probably the easiest way to annotate variants ever.

GeneBe has a few more useful utilities. If your variants are written in HGVS notation and you want to convert them to genomic coordinates, the API makes it easy (and so does the online HGVS converter). The same goes for liftover: the GeneBe API has you covered.

>>> import genebe as gnb
>>> input_hgvs = ['NM_000277.2:c.1A>G']
>>> parsed_variants = gnb.parse_hgvs(input_hgvs)
>>> parsed_variants
['12-102917130-T-C']
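HGVS c. notation describes a variant relative to a transcript, so mapping it to genomic coordinates requires that transcript's alignment to the genome, which is what the API does for you. The hypothetical sketch below only splits simple coding substitutions into their parts (handy for pre-checking inputs) and adds a reverse-complement helper. The genomic T-C in the output above is the reverse complement of the transcript's A>G because PAH (NM_000277) is transcribed from the minus strand. The regular expression deliberately ignores deletions, insertions, and other HGVS forms.

```python
import re

# Only simple coding-DNA substitutions, e.g. "NM_000277.2:c.1A>G",
# or forms such as "c.-3A>G" (5' UTR) and "c.100+1G>A" (intronic offset).
_HGVS_C_SUB = re.compile(
    r"^(?P<transcript>[A-Z]{2}_\d+(?:\.\d+)?)"
    r":c\.(?P<c_pos>[-*]?\d+(?:[+-]\d+)?)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_simple_hgvs(hgvs):
    """Return the parts of a simple c. substitution, or None if it does not match."""
    match = _HGVS_C_SUB.match(hgvs.strip())
    return match.groupdict() if match else None


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (relevant for minus-strand transcripts)."""
    return seq.translate(_COMPLEMENT)[::-1]


parts = parse_simple_hgvs("NM_000277.2:c.1A>G")
```

Anything this pattern rejects, such as NM_000277.2:c.1_2del, can still be sent to parse_hgvs; the sketch is only a local sanity check, not a replacement for the API's conversion.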

GeneBe works primarily with the hg38 build of the human genome. If you query it with hg19 or T2T coordinates, it automatically performs a liftover in the background.

The official API documentation is available in OpenAPI format at https://api.genebe.net/cloud/gb-api-doc/swagger-ui/index.html , so you can generate an API wrapper for your favorite language with any OpenAPI code generator.

Limitations

GeneBe is available for research purposes only, and the number of queries from a single IP address is limited. Without an account it allows thousands of queries per day; if you need tens of thousands, you need an API key. Getting one is easy: create an account at https://genebe.net and request an API key on your profile page. For higher query limits, contact the GeneBe maintainers.
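The curl example above suggests --netrc, which implies that credentials are sent as HTTP Basic authentication. Assuming the login is your GeneBe account name and the password is your API key (check the API documentation to confirm), a standard-library sketch could read them from a netrc file and attach them to a request. The helper names are hypothetical.

```python
import base64
import netrc
import urllib.request

HOST = "api.genebe.net"


def load_credentials(netrc_path=None, host=HOST):
    """Return (login, password) for `host` from a netrc file, or None.

    With netrc_path=None, Python reads ~/.netrc (the file curl --netrc uses).
    """
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


def basic_auth_header(login, api_key):
    """Encode login:api_key as an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{login}:{api_key}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def authorized_request(url, login, api_key):
    """Prepare a GET request that carries the credentials (nothing is sent)."""
    return urllib.request.Request(url, headers=basic_auth_header(login, api_key))
```

A matching ~/.netrc entry would look like machine api.genebe.net login YOUR_LOGIN password YOUR_API_KEY, which keeps the key out of scripts and shell history and lets curl and Python share the same file.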
