Spotlight on Bioinformatics: A Decades-Long Passion for DNA Sequence Analysis

Anne-Katrin Emde, PhD
Published in Variant Bio
May 22, 2023
Melanie Kirsche, an expert in bioinformatics at Variant Bio. Photo credit: Christina Jarvis

Working in the field of bioinformatics was the result of a string of happy accidents. I’d like to be able to say I knew what I was getting myself into when I enrolled in my undergraduate program, but in truth, I had no clue. I just knew that I wanted to do something with math and computer science, and wanted it to have some interesting and meaningful applications.

This is exactly what was in store for me. The field of bioinformatics is all about using and creating computational tools to analyze and store biological data. It leverages a combination of math and computer science tools, and applies them to tackle big biological questions.

At Variant Bio, our bioinformatics efforts go towards analyzing DNA and RNA sequencing data to identify genetic variation and measure gene expression in the studies we conduct all over the world. This means analyzing terabytes (or, by now, petabytes) of sequencing data, and doing so in a way that is computationally (and financially) feasible.

We do this analysis using Amazon Web Services (AWS), a renowned and robust cloud computing platform that provides the computing power we need, lets us scale to the resources needed at any point in time, and complies with the data privacy regulations of the different countries we work in. Since we work with people’s genetic data, which is very personal information not only for individuals but also for their families and entire communities, we need to act with the highest security standards and make sure the data is safe under our stewardship (Variant Bio’s data and sample governance principles and commitments cover this in more detail).

Early inspiration

Back when I was studying bioinformatics at the Freie Universität Berlin, I attended a series of lectures by Prof. Knut Reinert, who was involved in Celera’s efforts to assemble the first human genome and would later become my Ph.D. advisor. This, and the fact that new sequencing technologies were revolutionizing the field, successfully got me hooked on DNA sequence analysis.

Fast forward to today: one of the most exciting parts of my role as Director of Bioinformatics at Variant Bio is that we are starting to actively work with ever newer DNA sequencing technologies that produce so-called “long reads.” Long-read technologies are able to sequence longer stretches of DNA at a time than the currently standard “short-read” technologies. If you imagine reading a book, it’s as if you can now read whole chapters at a time, rather than just individual sentences. This helps make sense of the contents of the book, and, taking it back to genomes, it means that we are able to make more of the genome accessible to analysis.

With this data, we can then detect different types of genetic variation, including large structural variants (SVs) that can affect thousands of bases and that have been somewhat under-explored because they are so hard to detect reliably with standard short-read sequencing technologies. Melanie Kirsche (pictured above), who joined the Bioinformatics team at Variant Bio last summer, is an expert at working with these population-scale SVs. I’m excited and curious about the discoveries we are going to be able to make with this data.

Part of a greater vision

The Bioinformatics team connects directly to Variant Bio’s overarching mission of leveraging the power of human genetic diversity to develop life-saving therapies. We do so by creating the genetic “variant call sets” that are the basis for genome-wide association studies (GWAS). These GWAS are, in turn, at the heart of how we try to link genes to phenotypes in order to understand complex disease biology.

To create these call sets, we use a mix of directly detecting variants from the sequencing data and a method called “imputation,” which allows us to infer variants we couldn’t confidently observe in the sequencing data. Using this approach of mid-pass whole-genome sequencing with imputation, we can lower sequencing costs and include more participants in a study, which increases power to make relevant discoveries. It also frees up budget for doing more studies, increasing our chances of developing life-saving therapeutics.
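
A toy sketch in Python can give a feel for the idea behind imputation. It is an illustration only, not the method used at Variant Bio: it assumes a tiny, made-up reference panel of haplotypes and simply copies missing alleles from the panel haplotype that agrees best with the sites that were observed. Production imputation tools rely on statistical models over very large reference panels. The names `impute`, `panel`, and `sample` are hypothetical.

```python
from typing import List, Optional

Haplotype = List[int]            # 0 = reference allele, 1 = alternate allele
Observed = List[Optional[int]]   # None = site not confidently observed


def impute(sample: Observed, panel: List[Haplotype]) -> Haplotype:
    """Fill missing sites by copying from the best-matching panel haplotype."""

    def matches(hap: Haplotype) -> int:
        # Count agreement only at sites that were actually observed.
        return sum(1 for s, h in zip(sample, hap) if s is not None and s == h)

    # Ties resolve to the earliest haplotype in the panel.
    best = max(panel, key=matches)
    # Keep observed alleles; borrow the rest from the best match.
    return [h if s is None else s for s, h in zip(sample, best)]


# Hypothetical reference panel: three haplotypes over six variant sites.
panel = [
    [0, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0],
]

# Hypothetical sample with two sites missing from the sequencing data.
sample = [1, None, 0, 1, None, 0]

imputed = impute(sample, panel)
print(imputed)  # [1, 0, 0, 1, 1, 0]
```

Here the second panel haplotype agrees with the sample at all four observed sites, so its alleles fill the two gaps. The more diverse and representative the reference panel, the more likely it is that a good match exists.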

While I never imagined where my studies in bioinformatics would lead me, I couldn’t be happier with the result. There is no shortage of exciting work, and our team plays a critical role in making Variant Bio’s vision a not-so-distant reality.
