Introducing a Robust Method to Inexpensively Generate High-Quality Genetic Data for Underrepresented Populations

Anne-Katrin Emde, PhD
Variant Bio
Nov 1, 2021

In 2019, Variant Bio, a genetics company based in Seattle (USA), and Prof. Tony Merriman from the University of Otago, Aotearoa New Zealand, embarked on a partnership to study the genetic causes of kidney disease and related metabolic disorders like gout and diabetes in people of Māori and Pacific ancestry living in Aotearoa New Zealand (see here for more about the project). This is just the first step on a long road to empowering diagnostics and precision health for Māori and Pacific people.

To understand the genetic causes of disease in a specific population, researchers first have to generate genomic data from that population. Although the cost of whole genome sequencing has decreased dramatically over the last decade, at roughly $700 USD (~$1,000 NZD) per person it is still expensive at scale.

To tackle this barrier to genomic research, researchers from Variant Bio and the University of Otago have developed a cost-saving approach to generating high-quality genetic data for underrepresented populations. Because people of non-European ancestry, including those of Māori and Pacific ancestry, are generally underrepresented in existing genetic datasets,¹ we could not rely on external genetic reference datasets. Instead, we worked with individual DNA samples from Māori and Pacific people who consented to participate in the study, using what we call “mid-pass” whole genome sequencing with “imputation.” This technique is cheaper than high-pass sequencing (~$150 USD/~$210 NZD per individual) and more sensitive than similarly priced alternatives, especially when it comes to identifying genetic variation that is unique to a particular population.

So why does the mid-pass whole genome sequencing with imputation approach matter? First, because this method costs significantly less than high-pass sequencing, it allows more individuals to be included in studies. Second, it helps identify genetic variation that occurs in the Māori and Pacific populations, which is vital for understanding the genetic contribution to diseases such as gout and kidney disease. This would not be possible with earlier low-cost approaches (e.g. “genotyping arrays”). Lastly, our work paves the way for other researchers to replicate this approach in order to better understand how disease affects underrepresented populations and how treatment may address it.
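To make the cost argument concrete, here is a minimal back-of-the-envelope sketch in Python. The two per-person prices are the approximate figures quoted above; the budget is a hypothetical number chosen only to keep the arithmetic simple, and the sketch ignores the extra cost of high-pass sequencing a subset of participants to support imputation.

```python
# Approximate per-person costs quoted in this article (USD).
HIGH_PASS_COST_USD = 700   # ~30x whole genome sequencing
MID_PASS_COST_USD = 150    # mid-pass sequencing with imputation


def individuals_within_budget(budget_usd: int, cost_per_person_usd: int) -> int:
    """Return how many whole individuals can be sequenced for a given budget."""
    return budget_usd // cost_per_person_usd


# Hypothetical study budget (an assumption for illustration, not a real figure).
budget = 105_000

high = individuals_within_budget(budget, HIGH_PASS_COST_USD)
mid = individuals_within_budget(budget, MID_PASS_COST_USD)

print(f"high-pass: {high} individuals")
print(f"mid-pass:  {mid} individuals")
print(f"ratio:     {mid / high:.1f}x")
```

Under these assumptions, the same budget covers roughly four to five times as many participants with the mid-pass approach.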

What is whole-genome sequencing, and what is mid-pass sequencing versus high- or low-pass sequencing?

Whole genome sequencing means using DNA sequencing technology to read the genome of an individual. To get a confident readout, each position in an individual’s genome is read several times. The gold standard is to have, on average, 30 “reads” of each position. This is called high-pass or 30x sequencing. Mid-pass sequencing is the term we use to describe sequencing in the range of 1–8x, that is, reading each position 1 to 8 times on average. Low-pass sequencing typically refers to sequencing each position ≤1 time on average. With mid-pass sequencing we have enough data to be quite confident about individual genotypes just by looking across all sequenced individuals in the cohort, but we are still less confident than with 30x sequencing, particularly when it comes to genetic variation that is less common in the population. That is why we combine mid-pass sequencing with imputation, in which we use statistical methods to fill in, or estimate, the genotypes in which we are less confident. The imputation benefits from high-pass sequencing of a subset of people from the same population.
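The depth categories above can be sketched in a few lines of Python. This is only an illustration, not code from the study: the genome size, the handling of the boundary at exactly 1x, the treatment of anything at or above 30x as high-pass, and the idealized Poisson model of read placement are all assumptions made here. The function names are hypothetical.

```python
import math

# Approximate size of the human genome in base pairs (assumption).
HUMAN_GENOME_BP = 3.1e9


def mean_depth(n_reads: int, read_length_bp: int,
               genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Average number of times each position is read."""
    return n_reads * read_length_bp / genome_bp


def classify_depth(depth: float) -> str:
    """Map a mean depth to the categories used in this article.

    Assumptions: exactly 1x counts as low-pass, and anything at or
    above 30x counts as high-pass. Depths between 8x and 30x are not
    named in the article.
    """
    if depth <= 1:
        return "low-pass"
    if depth <= 8:
        return "mid-pass"
    if depth >= 30:
        return "high-pass"
    return "unnamed (between mid-pass and high-pass)"


def prob_position_unread(depth: float) -> float:
    """Chance that a position gets zero reads, under an idealized
    Poisson model in which reads land uniformly at random."""
    return math.exp(-depth)


# Hypothetical run: 80 million reads of 150 bp each.
depth = mean_depth(80_000_000, 150)
print(f"mean depth: {depth:.1f}x -> {classify_depth(depth)}")

for d in (1, 4, 30):
    print(f"{d}x: P(position unread) = {prob_position_unread(d):.2e}")
```

Under this simplified model, a noticeable fraction of positions have few or no reads at mid-pass depths, which is one way to see why imputation is used to fill in the less certain genotypes.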

Illustration of different approaches for identifying genome-wide genetic variation in a population that is not well-represented in existing reference panels. Image credit: Anne-Katrin Emde

Who controls the data and findings from this study?

Details of the method and approach we adopted for this study have recently been published in the open-access academic journal BMC Genomics, for other researchers to learn from and, hopefully, build upon.² It is important to note that the Māori and Pacific genetic data generated by the study are protected under the custodianship of Prof. Tony Merriman and cannot be made publicly available, as participants have not provided their consent for this. Importantly, participants can withdraw from the study at any point. Researchers from Variant Bio and the University of Otago are committed to designing studies with the highest ethical standards in mind. To ensure that appropriate standards were put in place for this project, Variant Bio worked closely with its Ethics Advisory Board and visited New Zealand multiple times, attending conferences and consulting with key opinion leaders to learn about potential sensitivities around genetic data collection and use. The project was also approved by all relevant local IRB agencies in New Zealand, as is always required for studies of this nature.

We want to make sure that the mid-pass method is widely available and that researchers working with underrepresented populations understand exactly how to implement it. Because we want others to be able to use them, none of the mid-pass methods that we built in this work are or will be protected by any form of IP. To facilitate the use of the methods, we have made scripts and documentation publicly available to all researchers.

Māui Hudson introducing Kaja Wasik, Variant Bio CSO and co-author of the publication, at SING 2020 at the University of Waikato, Aotearoa (New Zealand). Photo credit: Keolu Fox

What are the risks of this study?

The primary risk in any genetic study relates to data security being compromised. However, we have taken extensive precautions to protect against this. Most importantly, Variant Bio does not receive or store personally identifiable information on study participants, ensuring their privacy is maintained. Additionally, after being generated, the genetic data are not shared with any outside parties, and access is strictly limited to only authorized individuals within Variant Bio and the University of Otago.

What are the benefits?

Providing a method for generating data that can be used for precision health delivery is one benefit of this work, but it is important to stress that it is only a small piece of the puzzle of enabling personalized medicine for underrepresented populations. Precision medicine relies on evaluating individual disease predisposition, aspects of disease progression, and responsiveness to therapeutics. Genomics is an important aspect of this because certain genetic variants that contribute to disease often differ dramatically in frequency between ethnic groups and in some cases are found only in certain ethnic groups. Without good-quality data that is representative of the Māori and Pacific communities, precision medicine efforts will lag behind for those groups.

More immediately, as part of this study, we are committed to maximizing benefits for the participating Māori and Pacific communities via a range of initiatives. Overall, Variant Bio has committed a total of $100,000 USD/$140,000 NZD towards initiatives around research, healthcare, education, and environment that benefit Māori and Pacific communities in Aotearoa New Zealand (more details here). For example, Variant Bio sponsored SING 2020, the very first Indigenous Genomics Conference of its kind, which aims to promote Indigenous partnerships in genomic science.

In the longer term, once Variant Bio begins to generate revenue (it is currently an early-stage startup), the company has committed to sharing 4% of its future revenue every year with all of its partner communities, including Māori and Pacific communities in Aotearoa New Zealand. Revenue here means income before expenses, as opposed to profit, which refers to income after expenses. This commitment continues until the company is acquired or goes public, at which point partner communities will receive 4% of the company’s equity value (for more on Variant Bio’s benefit-sharing program, see here).

SING 2020 conference poster. Image credit: The SING Consortium Aotearoa
  1. https://www.nature.com/articles/d41586-019-01166-x
  2. Emde, AK., Phipps-Green, A., Cadzow, M. et al. Mid-pass whole genome sequencing enables biomedical genetic studies of diverse populations. BMC Genomics 22, 666 (2021).
