BEL and DNA methylation

Wendy Zimmerman, PhD
Published in BioDati
Mar 26, 2021
Photo by Suzanne D. Williams on Unsplash

Epigenetic changes to DNA, like DNA methylation, can alter chromatin structure and DNA accessibility, thereby altering gene expression. This important effect can play a role in growth, development, stress, obesity, and disease. DNA methylation can also be inherited. PubMed lists more than 200,000 articles dealing with DNA methylation.

So, how do we indicate DNA methylation in BEL, to record changes in a shareable, computable, reusable manner?

BEL allows us to use the modifier term var(), or variant(), to indicate a genetic change or mutation in a gene. BEL also allows us to use var() to indicate an epigenetic change, using the nomenclature approved by the Human Genome Variation Society (HGVS). The HGVS nomenclature documentation gives the details for representing methylation.

The Human Genome Variation Society recommends the use of |gom (gain of methylation), |lom (loss of methylation), or |met (methylation) to describe the methylation state at specific sites or in specific regions of the DNA.

But chromosomes are large, and methylation does not occur just within genes. Gene regulatory sequences may be hundreds of thousands of base pairs away from the relevant genes, so a genomic sequence identifier may be required to cover intergenic regions (e.g. NC_000003.11: human chromosome 3, version 11). A reference sequence is also used to express methylation at a specific site on the DNA.

Unlike canonical HGVS variant definitions, BEL separates the gene or reference sequence identifier from the variant definition, rather than combining them in a single string as HGVS does.

So we can express the abundance of methylated DNA on Chromosome 3 at location CpG 15611364 as:

g(refseq:NC_000003.11, var("g.15611364|met"))

Or we can express the hypomethylation of a specific gene (Neurog3), without specifying the exact sites, as:

g(MGI:Neurog3, var(|lom))
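
The two forms above, one site-specific and one gene-level, can be assembled programmatically. The following is a minimal Python sketch, not part of any BEL library: the function names `split_hgvs` and `methylation_term` are hypothetical, and the sketch assumes the var() payload is always quoted, as in the site-specific example.

```python
from typing import Optional, Tuple

# Methylation states named in the HGVS recommendation discussed above.
METHYLATION_STATES = {"gom", "lom", "met"}


def split_hgvs(hgvs: str) -> Tuple[str, str]:
    """Split a single-string HGVS description into (reference, variant)."""
    reference, sep, variant = hgvs.partition(":")
    if not sep or not reference or not variant:
        raise ValueError(f"expected 'reference:variant', got {hgvs!r}")
    return reference, variant


def methylation_term(namespace: str, identifier: str, state: str,
                     position: Optional[str] = None) -> str:
    """Build a BEL g() term carrying a methylation var() modifier.

    `position` is an HGVS-style location such as "g.15611364"; leave it
    as None when the exact sites are not specified.
    """
    if state not in METHYLATION_STATES:
        raise ValueError(f"unknown methylation state: {state!r}")
    payload = f"{position or ''}|{state}"
    # Assumption of this sketch: the var() payload is always quoted.
    return f'g({namespace}:{identifier}, var("{payload}"))'


# Site-specific: start from a single HGVS-style string and separate the parts.
reference, variant = split_hgvs("NC_000003.11:g.15611364|met")
position, _, state = variant.partition("|")
print(methylation_term("refseq", reference, state, position))

# Gene-level: no position given.
print(methylation_term("MGI", "Neurog3", "lom"))
```

The first call reproduces the chromosome 3 term; the second gives the Neurog3 term with a quoted payload.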

Esra Karakose et al., 2020, found that “Despite their similarities, insulinomas clearly differ from beta cells in that they are relatively hypomethylated across the majority of the 11p15.5-p15.4 target sub-region”. We can express this correlation between insulinomas and hypomethylation of CpGs in the region spanning 11p15.5-p15.4, which contains several genes, including HGNC:INS and HGNC:CDKN1C, as:

g(refseq:NC_000011.11, var(p15.5_p15.4|lom)) positiveCorrelation path(MESH:D007340!Insulinoma)

Note: var(p15.5_p15.4|lom) is not the correct HGVS format to describe this Loss of Methylation genomic variation. More on this below in the Gaps section.

Wuping Yang et al., 2021, report that the long noncoding RNA “ZNF582-AS1 was downregulated in ccRCC [clear cell renal cell carcinoma], and decreased ZNF582-AS1 expression was significantly correlated with advanced tumor stage, higher pathological stage, distant metastasis and poor prognosis. Decreased ZNF582-AS1 expression was caused by DNA methylation at the CpG islands within its promoter.” We can express these findings in BEL, where ZNF582-DT is the HGNC symbol used for ZNF582-AS1, as:

r(HGNC:ZNF582-DT) negativeCorrelation path(DO:"clear cell renal cell carcinoma")

g(HGNC:ZNF582-DT, var(|gom)) directlyDecreases r(HGNC:ZNF582-DT)

Gaps and Future Work Needed

One of the challenges that BEL is designed to address is being pragmatic about what is known and how it is shared in the biological literature. Typically, methylation sites are not presented in a way that is easy to capture as an interpretable HGVS string, which is what we recommend in the var() payload. var(p15.5_p15.4|lom) should be presented as var("g.0_10700000|lom"), based on the genomic coordinates for p15.5 and p15.4 from the Cytoband file. It would be a cruel and unusual punishment to force people to look that up.

BioDati Studio exists to make it easier to map the common way people represent chromosomal locations into a reusable, shareable and computable format.

We do not support this mapping between common language and canonical HGVS variants automatically yet — that is a feature planned for the future. What we try to do is to be forgiving in what we accept and strict in what we return to provide the most frictionless experience possible when working with the complexity of modern biology.
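
As a rough illustration of the cytoband lookup described above (a sketch only, not a BioDati Studio feature), the Python below reads rows in the tab-separated layout of a UCSC-style cytoband file and turns a band range into a var() payload. The function names are hypothetical, and the two sample rows are assumptions: only the overall 0 to 10700000 span comes from this article, and the boundary between the two bands is illustrative. The sketch also copies the coordinates as they appear in the file, without any adjustment of the start position.

```python
import csv
import io

# Sample rows: chromosome, start, end, band name, stain.
# ASSUMPTION: the 2800000 boundary is illustrative; only the overall
# 0 to 10700000 span is taken from the article. Use a real file in practice.
SAMPLE_CYTOBANDS = (
    "chr11\t0\t2800000\tp15.5\tgneg\n"
    "chr11\t2800000\t10700000\tp15.4\tgpos50\n"
)


def band_span(cytoband_text, chromosome, first_band, last_band):
    """Return (start, end) covering first_band through last_band."""
    spans = {}
    for row in csv.reader(io.StringIO(cytoband_text), delimiter="\t"):
        if len(row) < 4 or row[0] != chromosome:
            continue  # skip blank or short lines and other chromosomes
        spans[row[3]] = (int(row[1]), int(row[2]))
    missing = [band for band in (first_band, last_band) if band not in spans]
    if missing:
        raise KeyError(f"bands not found on {chromosome}: {missing}")
    start = min(spans[first_band][0], spans[last_band][0])
    end = max(spans[first_band][1], spans[last_band][1])
    return start, end


def band_range_to_var(cytoband_text, chromosome, first_band, last_band, state):
    """Format a band range as a quoted var() payload with a methylation state."""
    start, end = band_span(cytoband_text, chromosome, first_band, last_band)
    return f'var("g.{start}_{end}|{state}")'


print(band_range_to_var(SAMPLE_CYTOBANDS, "chr11", "p15.5", "p15.4", "lom"))
```

With the sample rows, this prints var("g.0_10700000|lom"), matching the form given in this section.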

Finally

BEL has the flexibility to represent most biology.

Since BEL is a living language, if you have a use case that is not handled effectively, please propose an enhancement to the BEL Language Committee.
