Bringing Genomes to Life: Going Beyond Sequencing to Realize the Value of Genetic and Genomic Information

Clinical genetic and genomic tests today are non-quantitative and fraught with uncertainty. The machine learning solutions that do support results delivered to patients today have low accuracies, are based on small fractions of data that are in the public domain, and offer no mechanistic insights to advance care. We need to change all of that.

By Carlos L. Araya (@claraya), co-founder and CEO, Jungla Inc (@junglainc).

Summary

Understanding how variation in our genomes relates to disease and treatment plays a foundational role in advancing precision medicine (i.e. developing care that is tailored to each patient). Spurred by the remarkable advances in our ability to sequence and detect variations in our genomes, tests that seek to leverage this information in clinical settings have rapidly grown. In the US alone, over 10 new tests enter the market each day. Yet, our ability to comprehend the significance of such variations and inform our actions in the clinic lags behind. Modern genetic tests find as many as 95 variants of unknown clinical significance for every known disease-associated variant. This paucity of knowledge results in significant gaps in both the fraction of patients who derive benefits from genetic testing and the fraction of information used to deliver benefits to each patient. These deficiencies ultimately limit our ability to find answers in our genomes when we need them the most.

To overcome these limitations and accelerate progress, Jungla has developed a cloud-based Molecular Evidence Platform (MEP) that marries clinical knowledge with advances in functional genomics, biophysics, cellular engineering, machine learning, and distributed systems to help clinicians and patients understand the results of genetic and genomic tests. By weaving interdisciplinary knowledge and methods, the MEP leverages state-of-the-art computational and experimental approaches to systematically generate, quality-control, validate, and distribute support to the precision medicine industry that is scalable, robust, and transparent. This has allowed us to provide high-quality support for millions of variations in the human genome. We are working closely with leading commercial providers, clinical and scientific advisors, and investors to bring these benefits to patients.

Our goal is to enable precision medicine by driving and placing science and technology directly in support of clinical genetics teams and at the service of patients. Read on to learn more about the current issues in interpreting genetic and genomic data and the work Jungla is doing to address those issues.


It’s A Remarkable Time

It is quite a remarkable achievement, really, for a species to sequence its own genome. A genome defines much of the molecular machinery that the cells in our bodies rely on for communication, expansion, and survival. This information is not static but evolving; as we grow and reproduce, each individual cell accumulates small (and sometimes large) variations. Thus, our bodies are composed of an amalgam of trillions of closely related genomes, some of which we pass to our children. Passed between generations over millennia, our genomes provide a record of our shared human history and directly connect our biology with that of all living organisms. We should celebrate this progress.

Will We Make the Most of Our Genomes? (The Value of Genetic and Genomic Information Is Not the Sequence)

Yet, genomes can offer us much more than a view of our ancestry and common origins. While it is no doubt a landmark achievement, accessing our genomes to review our history is not the same as understanding how our genomes affect our health and wellbeing. A lot of biology happens between our genome and our physical, living bodies. This biology happens not at the level of genome sequences but at many scales, from chemical and molecular biology, through cellular and developmental biology, to evolutionary biology. While most variations in our genome are easily detected with modern technologies, each one can alter the functionality of the machinery that translates genomes into living cells, tissues, organs, and systems to impact biology at these varied scales. To understand the significance of the information within our genomes, we must therefore consider each variation not in the context of a genome sequence but rather in the context of the molecular and cellular machinery where this information literally comes to life.

Whether through effects on molecular machines made of DNA, RNA, proteins, or combinations thereof, variants alter machinery in ways that affect diverse biophysical processes, molecular functions, and cellular functions, resulting in varied clinical and non-clinical manifestations. For example, as you read these words, hundreds of millions of proteins, termed rhodopsins, stacked within the most light-sensitive cells in your retina convert energy from single photons into kinetic energy, within ~200 femtoseconds (200 × 10⁻¹⁵ seconds), to mechanically initiate the signals our brains perceive as vision. By altering the chemical biology of these proteins, variations in the rhodopsin gene, Rho, can lead to progressive and stationary forms of blindness. Often, the mechanisms responsible for clinical symptoms are controlled by networks of chemical groups across collections of molecular machines. Consider that human genomes encode tens to hundreds of thousands of molecular machines and it’s easy to appreciate just how far a genome sequence is from describing which conditions we are likely to develop or how we may respond to a therapy.

Daunting as it may be, science has a way of pulling us forward. As scientists, engineers, practitioners, patients, and individuals, it is a privilege to live in a time where we can begin to leverage genomic information for the health and wellbeing of ourselves, our families, and our communities. As you’ll find below, we will have lots of genomes. The question is: will we know how to make the most of them?

Genetic and genomic information plays a foundational role for precision medicine in a growing array of clinical settings.

What We Know Today (It’s Just the Tip of the Iceberg)

Advances in sequencing technologies, some of the fastest technological developments in history, have brought down the cost of acquiring genomic information by more than five orders of magnitude over the past fifteen years. Fueled by these technologies, more than 100 million human genomes are estimated to be sequenced within the next ten years, and most market research suggests the ubiquity of personal genomes will rival that of cell phones by 2050. Estimates from manufacturers indicate that this year the volume of clinical genomes will exceed the number of academic genomes. Yet, fewer than 1% of the molecular variants observed in just a small sample of the human population (~0.003%) are clinically understood in terms of their relationship to disease.

We owe our current understanding of the clinical significance and associations of variants to the remarkable efforts of clinical geneticists, healthcare providers, and patients and their families. Not long ago, acquiring genetic information was expensive and time-consuming. Owing either to inherent limitations in sequencing technologies or to testing in specific genes and conditions being patent-eligible matter that could be held under monopoly, simply identifying genetic or genomic variations in patients was, until recently, a challenge. Discovered on a case-by-case basis, established in clinical and family studies, and carefully characterized in low-throughput biochemical and cellular assays (or even more laborious animal models), the compendium of knowledge we have today has been, and continues to be, painstakingly acquired.

Our precision medicine infrastructure today relies heavily on an observational framework that aims to piece together clinical knowledge from patients with equivalent genomic alterations.

This compendium forms a network connecting molecular machinery encoded in our genomes to clinical conditions through the observation of variants in affected individuals. While we can establish these connections by statistically associating individual variants with conditions, we are only effective at doing so for the variants that are common in the population and for which observations in health and disease are easily accumulated.
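To see why association statistics favor common variants, consider a minimal sketch in Python. It applies a one-sided Fisher exact test to two hypothetical case-control comparisons in which carriers show the same 1.5-fold enrichment among cases. All counts, and the function name fisher_exact_one_sided, are illustrative assumptions rather than data or methods from any real study.

```python
from math import comb


def fisher_exact_one_sided(carriers_in_cases, n_cases, carriers_in_controls, n_controls):
    """One-sided Fisher exact p-value for carrier enrichment among cases.

    Sums hypergeometric probabilities over all 2x2 tables (with the observed
    margins) that place at least as many carriers among the cases.
    """
    total = n_cases + n_controls
    carriers = carriers_in_cases + carriers_in_controls
    max_in_cases = min(carriers, n_cases)
    numerator = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(carriers_in_cases, max_in_cases + 1)
    )
    return numerator / comb(total, carriers)


# Hypothetical cohorts: 10,000 cases and 10,000 controls.
# Both variants show the same 1.5-fold carrier enrichment among cases.
p_common = fisher_exact_one_sided(300, 10_000, 200, 10_000)  # common variant
p_rare = fisher_exact_one_sided(3, 10_000, 2, 10_000)        # rare variant

print(f"common variant: p = {p_common:.2e}")
print(f"rare variant:   p = {p_rare:.2f}")
```

With these made-up counts, the common variant yields a p-value far below conventional significance thresholds, while the rare variant's p-value is 0.5, indistinguishable from chance, even though the observed enrichment is identical.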

This current compendium of knowledge, therefore, represents just the tip of the iceberg. The vast majority of genetic variants in the human population are rare, are present in only small numbers of individuals, and are of unknown clinical significance. Albeit crude, estimates suggest that every possible genetic variant is present in ~51 individuals in the human population. Considering that we all harbor unique combinations of variants, the futility of relying exclusively on observational approaches is clear. How, then, can we hope to make sense of the information in our genomes?

Building Collaborative Intelligence (Enhancing Clinical Testing with Distributed, Quantitative, Model-Driven Support)

While far from complete or perfect, the existing knowledge opens the road for a quantitative, scalable, collaborative intelligence approach: the generation, quality-control, and integration of models that can accurately explain the clinical significance of well-studied variants and predict that of newly observed variants. This strategy can allow us to support clinicians with models that accurately capture the relationship between sequence variation and functional variation of the molecular components that relate to each disease. These models can be dynamically refined and quality-controlled with data from each patient, thereby providing a robust, scalable, and transparent collaborative infrastructure for quantitative clinical testing that can be continuously updated, reviewed, and audited.

This is not an entirely novel concept. While current practices for the clinical interpretation of genetic findings are largely manual and non-quantitative, already ~50% of the variants interpreted in clinical tests today are classified with the support of machine learning models. It just so happens these solutions have a historical balanced accuracy of ~58% (MCC) for clinically relevant variants. We believe patients can be served better.

A series of issues are at play here. First, modeling sequence-function relationships at scale is a relatively new field, and the expertise, systems, and technologies required have, to date, been largely confined to development within academia. As a consequence, such solutions are able to access only the limited fraction of clinical data that is shared in the public domain, which has been estimated to be ~15% of the total. We believe these fields are ripe for innovation, optimization, and scaling to support patient care leveraging the much larger volume and detail of data generated by clinical practice that is not available in the public domain.

Second (and this is a big one), the challenge is inherently difficult. This is not a cop-out; biological systems are complex. While understanding large-scale clinical genetics and genomics data has been predominantly approached as a sequence information challenge, such solutions largely ignore the vast diversity and complexity of mechanisms of biological function. Because genetic and genomic variation can affect our biology at many different scales, leveraging this information across system scales (e.g. residue, molecule, complex, or pathway) and time scales (e.g. molecular, cellular, developmental, or evolutionary) requires bringing together knowledge and techniques from a multitude of disciplines.

Third, whether computational (drylab) or experimental (wetlab), we still lack technologies for modeling or measuring the functional consequences of thousands of variants acting at the levels of distinct molecule types, biophysical properties, molecular functions, and cellular impacts. While powerful, first-generation large-scale functional assays (such as Deep Mutational Scanning, which we demonstrated in 2010 and 2012) rely on a proxy of molecular function (i.e. selection) and have substantial limitations for addressing the needs of clinical testing. Second-generation solutions permitting quantitative measurements of specific molecular functions (such as RNA-MaP, which we demonstrated in 2014, and Pro-MaP) are restricted to capturing a narrow sliver of activity and can be even more challenging to scale.

Taken together, there is ample room for improvement when it comes to helping patients and physicians who require additional, and currently labor- and cost-intensive, insights to inform care. Difficult as these challenges may seem, we believe they can be overcome by bringing together the right mix of expertise, innovation, and collaboration.

Whether physically in cellular models or digitally in computational models, functional modeling allows scientists to introduce and study the effects of variants on biological systems.

Introducing Jungla (Who We Are & What We Do)

With more than 10 new genetic or genomic tests entering the US market each day, solutions that can continuously optimize and guarantee the quality of tests could improve the quality of life for the millions of us who come to rely on genomic information each year, often at the most fragile moments of our lives. Given the critical role of genetic and genomic information for precision medicine, empowering patients and teams practicing clinical diagnostics with synergistic, state-of-the-art solutions that maximize the value of this information is the premise that drove Jason Reuter, Alexandre Colavin, and me to leave Stanford and co-found Jungla. To accomplish this goal, Jungla brings together leading expertise in functional genomics, computational biophysics, cellular engineering, machine learning, and technology development into a cloud-distributed Molecular Evidence Platform (MEP). Leveraging state-of-the-art computational and experimental approaches, the MEP systematically generates, quality-controls, validates, and distributes support to the precision medicine industry that is scalable, robust, and transparent. The scalability and robustness of the technologies in the MEP have allowed us to generate high-quality support for more than four million clinically relevant variations in the human genome, with prospectively validated balanced accuracies above 90% (MCC) and predictive values above 95% (PPV, NPV).
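For readers less familiar with the performance measures quoted above, here is a minimal sketch in Python of how balanced accuracy, the Matthews correlation coefficient (MCC), and the positive and negative predictive values (PPV, NPV) are computed from a confusion matrix. Balanced accuracy and MCC are distinct measures, so the sketch reports both. The function name classification_metrics and the example counts are illustrative assumptions, not figures from the MEP.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: pathogenic variants called pathogenic/benign.
    tn/fp: benign variants called benign/pathogenic.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "ppv": ratio(tp, tp + fp),  # fraction of pathogenic calls that are correct
        "npv": ratio(tn, tn + fn),  # fraction of benign calls that are correct
    }


# Hypothetical evaluation set: 100 pathogenic and 100 benign variants.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy ranges from 0 to 1 with 0.5 corresponding to chance, whereas MCC ranges from -1 to 1 with 0 corresponding to chance, so the two are not interchangeable when comparing reported figures.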

As the scope and performance of the MEP continue to expand, we are pioneering increasingly mechanistic approaches, including third-generation experimental technologies. Designed to overcome the limitations of existing high-throughput functional assays, these new technologies enable pathway-scale mutagenesis while providing mechanistic insights into the impacts of variants on molecular and cellular function.

A collaborative intelligence infrastructure brings clinical knowledge from patient findings (top left) together with computational (bottom left) and experimental (bottom right) functional modeling capabilities in an artificial intelligence framework (center) that provides insights to diagnostics and therapeutics professionals.

We believe clinical genetic interpretations should be quantitative, clear, auditable, and always up-to-date. With this perspective in mind, our platform has been designed for continual improvement and thorough external review. To guarantee the quality of the support supplied to clinical test providers, Jungla leverages cryptographic technologies to continuously measure and track the prospective performance of each line of evidence within the MEP. We further ensure transparency by making our solutions available to third parties and have developed methods for secure performance assessments.
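This post does not detail the cryptographic scheme involved, so the following is only a generic sketch, in Python, of one way prospective predictions can be made auditable: a hash commitment. A digest of the predictions is published before outcomes are known, and the predictions are revealed later so that anyone can check they were not altered. The function names commit and verify, the variant identifiers, and the choice of SHA-256 are all assumptions made for illustration, not a description of the MEP.

```python
import hashlib
import hmac
import json
import secrets


def _digest(predictions, nonce):
    # Serialize deterministically so the same predictions always hash the same.
    payload = json.dumps(predictions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(nonce + payload.encode("utf-8")).hexdigest()


def commit(predictions):
    """Return (digest, nonce). Publish the digest now; keep the nonce private."""
    nonce = secrets.token_bytes(16)  # random salt so predictions cannot be guessed
    return _digest(predictions, nonce), nonce


def verify(digest, predictions, nonce):
    """Check that revealed predictions match the previously published digest."""
    return hmac.compare_digest(_digest(predictions, nonce), digest)


# Hypothetical predictions, committed before clinical outcomes are known.
predictions = {"variant-A": "pathogenic", "variant-B": "benign"}
published_digest, private_nonce = commit(predictions)

# Later: reveal predictions and nonce so that a third party can audit them.
print(verify(published_digest, predictions, private_nonce))
```

Because any change to the revealed predictions produces a different digest, a commitment of this kind lets an external reviewer confirm that performance was measured on predictions made in advance.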

Coming Your Way (Where We Are)

We’re proud to count on the backing of Andreessen Horowitz (a16z), whose $2.5m Series Seed investment led by Vijay Pande brings deep expertise in healthcare, information technology, and biological innovation to our team. In addition to Vijay, a renowned expert in computational biophysics, and the talented teams behind Andreessen Horowitz and SOS Ventures (including Arvind Gupta from IndieBio), we are privileged to have the support and guidance of Birgit H. Funke, Douglas M. Fowler, and Thomas Schoenherr, leaders in clinical genetics and genomics, molecular technologies, and diagnostics business development, respectively. Jungla Inc. is a resident company at Johnson & Johnson Innovation, JLABS at South San Francisco (JLABS @ SSF). JLABS is a 30,000 square-foot life science innovation center, located in South San Francisco.

In the coming months, we will be announcing partnerships with some of the world’s leading genetic and genomic test providers, demonstrating both parties’ commitment to optimizing the quality of clinical tests. We will be providing updates and further insights ‘under the hood’ as we work across the layers of team, technology, corporate, and partner development to bring you quantitative clinical genomics. Whether you are someone who wants to share your story of how clinical tests have touched your life, or a technologist, scientist, or clinician who wishes to further the mission of understanding and addressing human disease through genetics and genomics, we look forward to hearing from you.

We’re excited and invite you to be part of our journey.

–––––