How DNA Sequencing Started the Microbiome Revolution

… and What It Means for the Future.

Carly Anderson
Prime Movers Lab
Jun 9, 2020


Key Takeaways:

  • Huge reductions in the cost of DNA sequencing are responsible for the exponential growth of microbiome-related companies. The cost of sequencing DNA today is less than 0.1% of what it was in the year 2000.
  • Lab automation, improved microfluidics and public funding for science (including the Human Genome Project) helped drive these cost reductions and continue to enable microbiome research.
  • Today, microbiome companies use a broad set of DNA sequencing technologies called Next Generation Sequencing (NGS) technologies.
  • Short-read approaches such as “16S” sequencing are relatively cheap and give high-level information about which microbes are present. Long-read sequencing methods are more expensive but give more detailed information.
  • Advances in computing power, data science, and machine learning are enabling a new wave of microbiome companies. Since these advances have the potential to “lift all boats”, my current view is that luck, together with a team’s ability to execute on its business model and to access data, will drive success in this space.

In our first post on the Human Microbiome, we learned that our health is intertwined with the trillions of microbes which also inhabit our bodies. Today, there are over 200 microbiome startups attempting to turn these hitchhikers into beneficial and high-value products, according to an industry news site. Why are we experiencing this microbial renaissance, and what does it mean for the future? To predict where microbiome products and therapies are headed, it’s important to understand how we got here.

Why Now? Science.

In this case, the explosion in research, products and companies targeting the microbiome was unlocked by massive advances in several areas:

  • DNA (and RNA) sequencing
  • Lab automation
  • Cost of computing
  • Data science
  • Large collaborative research efforts

DNA sequencing technologies are so critical to microbiome research and the broader biotechnology industry that it’s worth appreciating the magnitude of the advances in this space.

In the past, identifying a single microbe was an arduous process. Bacteria from a sample would be collected, transferred to a petri dish, and allowed to multiply for several days, a method called “culturing”. [1] The doctor, scientist, or researcher would then look at the bacteria under a microscope, stain them with chemicals, and use other techniques to try to figure out what they were, effectively playing a game of “Guess Who”.

This changed when we learned to quickly and cheaply read an organism’s genome, its DNA blueprint. All living things, from human cells to microbes, use the same four DNA bases (A, T, G, and C), arranged in different sequences, to encode the instructions for reproducing, making proteins, and everything else needed for life.
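Because there are only four letters, each one can be stored in just two bits. The minimal sketch below packs a DNA string into an integer and unpacks it again. The particular letter-to-bits mapping is an arbitrary choice for illustration, ambiguous letters (such as N) are not handled, and real sequencer output (for example FASTQ text files, which also carry a quality score for every letter) takes far more space than this.

```python
# Arbitrary 2-bit code for each DNA letter (an assumption for illustration).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(seq: str) -> int:
    """Pack a DNA string into a single integer, 2 bits per letter."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover `length` letters from an integer produced by pack()."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])  # lowest 2 bits = last letter
        value >>= 2
    return "".join(reversed(bases))


seq = "GATTACA"
assert unpack(pack(seq), len(seq)) == seq

# At 2 bits per letter, 3 billion letters need about 750 million bytes.
print(f"{3_000_000_000 * 2 / 8 / 1e6:,.0f} MB")
```

The length has to be passed to unpack() because leading A’s are encoded as zero bits and would otherwise be lost.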

The first DNA sequencing technology was developed in the 1970s by Frederick Sanger and is still known as Sanger sequencing. From the 1980s through the mid-2000s, DNA sequencers based on Sanger’s method became better, faster, and cheaper as laboratory automation increased and new ways to identify the different letters were found. [2] Incredibly, costs for DNA sequencing in the early 2000s were already falling at a rate comparable to Moore’s Law for computing costs (Mardis 2013).

However, Sanger sequencing has limitations: it reads a single piece of DNA at a time. In the mid-2000s, new DNA sequencing techniques were developed that read hundreds of thousands to millions of DNA segments simultaneously. These “Next Generation Sequencing” (NGS) technologies allowed genomes to be read in a fraction of the time and created an inflection point in the price of sequencing DNA (Hayden 2014). In 2007, the cost of sequencing the equivalent of a human’s DNA (3 billion letters) was roughly $10 million. By 2019, it had dropped below $1,000, and sequencing powerhouse Illumina has claimed that the “$100 genome” is in sight.

Adapted from Hayden 2014 (Nature) and Chui et al. 2020 (McKinsey Global Institute)

From 2000 to 2015, the cost to sequence the amount of DNA making up the human genome decreased by OVER FOUR ORDERS OF MAGNITUDE.
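To put the 2007 and 2019 figures in perspective, here is the arithmetic as a small Python sketch. It assumes a steady exponential decline over those 12 years and, as a rough comparison, takes Moore’s Law to mean computing costs halve about every two years. Because the 2019 cost was below $1,000, using exactly $1,000 slightly understates the drop.

```python
import math


def orders_of_magnitude(old_cost: float, new_cost: float) -> float:
    """Number of factors of 10 by which the cost fell (log10 of the fold change)."""
    return math.log10(old_cost / new_cost)


def halving_time_years(old_cost: float, new_cost: float, years: float) -> float:
    """Average time for the cost to halve, assuming a steady exponential decline."""
    halvings = math.log2(old_cost / new_cost)
    return years / halvings


# Figures from the text: roughly $10 million in 2007, about $1,000 in 2019.
OLD_COST, NEW_COST, YEARS = 10_000_000, 1_000, 2019 - 2007

print(f"Fold reduction: {OLD_COST / NEW_COST:,.0f}x")
print(f"Orders of magnitude: {orders_of_magnitude(OLD_COST, NEW_COST):.1f}")
print(f"Cost halved every {halving_time_years(OLD_COST, NEW_COST, YEARS):.2f} years")

# Assumption: Moore's Law as a halving of cost roughly every 2 years.
moore_fold = 2 ** (YEARS / 2)
print(f"Moore's Law alone over the same span: ~{moore_fold:.0f}x")
```

Under these assumptions the cost of sequencing halved roughly every 11 months, producing a 10,000-fold drop where a two-year halving time alone would have produced about a 64-fold drop.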

Sanger sequencing was a single-lane country road through the uncharted genomic landscape; NGS effectively created a high-speed 12-lane freeway. The low cost of sequencing let scientists sequence and identify an incredible number of microbes from all sorts of habitats, including the human body. It wasn’t long before scientists found that the human microbial ecosystem was far larger than we’d thought, and that the mix of microbes present tended to differ between people with different health conditions.

These breakthroughs were spurred by large collaborative and publicly funded research efforts. The Human Genome Project ran from 1990 until 2003, when victory was declared: the human genome had been sequenced for the first time, revealing essentially the entire DNA code specifying a human. This incredible international research effort, which made all data and findings publicly available, accelerated genomics research and development and helped make DNA sequencing accessible. I can’t help but point out that this investment by government in “big science” ($3.5 billion over the life of the project) also generated a massive return on investment in US economic output, jobs, and federal tax revenue. [3]

The output from an early automated DNA sequencing machine. Each vertical column shows the sequence of letters (A,T,G,C) in a given stretch of DNA. Each of the four letters (DNA bases) is labelled with one of four corresponding colored dyes. This sequence is part of human chromosome 1. (Image Credit: Sanger Institute)

Inspired by the Human Genome Project, the National Institutes of Health (NIH) launched the Human Microbiome Project (HMP) in 2007. It was specifically designed as a community resource program to develop broadly available computing and statistics tools, measurement protocols, and reference datasets to help advance the emerging microbiome field. When gut health test kits started appearing in the early 2010s, they referenced the HMP and other NIH-reviewed procedures. By establishing the validity of, and initial best practices for, identifying which microbial species are present, these public efforts created confidence in startups’ ability to provide meaningful information and products in this space.

Today’s Tools

DNA sequencing technologies launched the microbiome revolution in the early 2000s. Advances in computational power, data science and machine learning are enabling further progress. This is partly because of how DNA sequences are read.

Sequencers can only “read” a limited number of letters before stopping, so the microbial DNA must be chopped into smaller pieces before sequencing (this itself is an art). To get the complete sequence, the fragments need to be reassembled by looking for stretches of letters that overlap and stitching them together. Today, the many types of NGS technologies fall into two categories: short-read sequencing and long-read sequencing. Short-read sequencing reads DNA in 50–700 letter chunks, while long-read sequencing generates “reads” of 15,000 letters or more.
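The overlap-and-stitch idea can be sketched as a toy “greedy” assembler that repeatedly merges the two fragments with the longest overlap. This is an illustration under strong assumptions (error-free reads, no repeats, a single organism); production assemblers rely on more robust graph-based methods, such as overlap graphs or de Bruijn graphs. The reads below are made up.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop fragments that sit entirely inside another fragment.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps: return the separate pieces
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]  # made-up reads
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can stitch the wrong pieces together when the same stretch of letters appears in more than one place, which is one reason real tools use more careful methods.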

Short-read sequencing is less expensive and can be more accurate within each chunk, but the information gained is limited. A short-read technique called 16S rRNA sequencing has become the standard for large studies and initial characterizations. [4] Techniques like 16S sequencing can identify the genus, and sometimes the species, of bacteria (e.g. that Salmonella or Listeria is present). However, they can’t give you information about individual strains. By analogy, it is like knowing that a coronavirus is present without being able to tell whether it is SARS-CoV-1, SARS-CoV-2, or something else entirely (16S sequencing itself detects bacteria, not viruses).
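A simplified way to picture how 16S reads become a list of “who is there” is to compare each read against reference sequences and count the matches. The sketch below assigns each read to the reference that shares the most k-mers (stretches of k letters) with it. Real 16S pipelines use curated reference databases and more sophisticated classifiers; the reference sequences and the names Genus_A and Genus_B here are hypothetical placeholders, not real 16S data.

```python
from collections import Counter

# Hypothetical stand-ins for the variable region of two bacterial reference sequences.
REFERENCES = {
    "Genus_A": "ACGTTGCAAGCTTACGGATCCAGT",
    "Genus_B": "TTGACCGTAGGCATCAAGTCGGAC",
}


def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(read: str, references: dict[str, str], k: int = 8) -> str:
    """Assign a read to the reference sharing the most k-mers with it."""
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & kmers(ref, k)) for name, ref in references.items()}
    best = max(scores, key=scores.get)  # ties go to whichever reference is listed first
    return best if scores[best] > 0 else "unclassified"


# Toy "reads": slices of each reference plus one sequence that matches neither.
reads = [REFERENCES["Genus_A"][2:18], REFERENCES["Genus_B"][4:20], "GGGGGGGGGGGG"]
counts = Counter(classify(read, REFERENCES) for read in reads)
print(counts)
```

If two references had identical letters in the region that was read, as closely related strains often do, their scores would tie and this sketch would silently pick the first one. That is the strain-level blind spot described above, in miniature.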

Long-read sequencing techniques are significantly more expensive, but are better suited for sequencing new microbes, identifying rare ones, and getting deeper information about the genome. In research settings, experts design their sequencing experiments to extract the most useful information from the fewest experiments at the lowest overall cost, a hairy optimization problem. [5]
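One ingredient of that optimization is deciding how many reads are needed. A classic back-of-the-envelope tool is the Lander-Waterman model, sketched below. The genome size, the read lengths, and the 30x coverage target are assumptions chosen for illustration. The model also ignores sequencing errors, uneven coverage, and the fact that, in a mixed microbiome sample, rare microbes receive only a small share of the reads.

```python
import math


def reads_needed(genome_length: int, read_length: int, coverage: float) -> int:
    """Reads required for each letter to be read `coverage` times on average.

    Average coverage C = N * L / G, so N = C * G / L, rounded up.
    """
    return math.ceil(coverage * genome_length / read_length)


def fraction_unread(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of the genome never read: e^(-C).

    Assumes reads start at uniformly random positions (a Poisson model).
    """
    return math.exp(-coverage)


GENOME_LENGTH = 5_000_000  # assumption: a bacterial genome of ~5 million letters

for label, read_length in [("short-read", 150), ("long-read", 15_000)]:
    n = reads_needed(GENOME_LENGTH, read_length, coverage=30)
    print(f"{label}: {n:,} reads for 30x average coverage")

for c in (1, 3, 5):
    print(f"{c}x coverage: about {fraction_unread(c):.1%} of the genome never read")
```

The long-read run needs 100 times fewer reads for the same average coverage, but each of those reads costs more, which is the kind of trade-off experts weigh when designing an experiment.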

Data science and analysis methods like machine learning are critical because, as samples become more complex and NGS methods produce larger data sets, reassembling the fragments gets harder. Because DNA sequences are made of only four letters (A, T, C, and G), the same pattern of letters can show up in many places, so a fragment may appear to fit in more than one position. For DNA fragments of an unknown microbe, or a cocktail of many different microbes, the method used to analyze the data often determines how much value that data provides. Lastly, cloud computing has become essential to manage the incredible amounts of data generated.
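The repeat problem can be made concrete by counting k-mers (stretches of k letters) that appear more than once. The sketch below also estimates how often a given pattern turns up purely by chance, under the simple and admittedly unrealistic assumption that all four letters are equally likely at every position. Real genomes additionally contain long genuine repeats, which is one reason longer reads help. The example sequence is made up.

```python
from collections import Counter


def repeated_kmers(seq: str, k: int) -> dict[str, int]:
    """k-letter patterns that occur more than once in seq, with their counts."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


def expected_chance_hits(genome_length: int, k: int) -> float:
    """Expected occurrences of one specific k-letter pattern in a random sequence
    where each of the four letters is equally likely at every position."""
    return (genome_length - k + 1) / 4 ** k


# A made-up sequence in which the pattern GATTACA occurs twice.
seq = "CCGATTACATTTGGGATTACAGG"
print(repeated_kmers(seq, 7))  # a 7-letter read "GATTACA" could come from either spot

# For a genome the size of a human's (3 billion letters), short patterns recur by chance.
for k in (10, 16, 20):
    print(f"k={k}: ~{expected_chance_hits(3_000_000_000, k):,.3f} expected chance matches")
```

A 10-letter pattern is expected to occur thousands of times in 3 billion random letters, while a 20-letter pattern is expected well under once, so longer stretches pin a fragment down far more reliably.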

Uncharted Territory

Today we are fairly good at cataloging microbes, that is, answering the question “Who is there?”. Microbiome companies were universally enabled by the availability of low-cost DNA sequencing. Some microbiome companies are still focused on this question for the time being, targeting less-studied regions including varied skin environments, the respiratory system, and even the microbiome associated with tumors. Despite its established importance for women’s and infants’ health, few startups appear to be focusing on the vaginal microbiome. The “per letter” cost of DNA sequencing will continue to decrease as the biotech industry grows and requires ever more DNA to be sequenced.

Many feel that in addition to “who is there”, we need to understand “what are the microbes doing” to develop effective health treatments. To answer this question, researchers and companies are studying which genes are turned on (transcriptomics), and which proteins (proteomics) and other small molecules (metabolomics) the microbes make and consume. These approaches are highly equipment- and data-intensive, and benefit from both data science and analytical expertise. The alternative is a time-consuming and expensive game of guess-and-check: the presence of a microbe does not necessarily mean it causes the problem, and establishing causation through clinical trial and error would be a massive waste of resources.

Looking forward, there are several ways for microbiome companies to gain an edge:

  1. Build large, private databases to gain access to unique insights. Anonymized medical data is particularly valuable and hard to obtain, so partnering with medical centers is key to this approach.
  2. Develop advanced AI/ML or other algorithms to find connections that others are missing.
  3. Run high-throughput screening (HTS) pipelines that use a combination of -omics to test many candidate bugs or molecules, in the hope of getting lucky.
  4. Supply a new instrument or diagnostic that will enable the research of others.

Most of these approaches benefit from larger tech trends: advances in computing (cloud) and data science, AI/ML, and continued improvement in DNA sequencing (genomics) and other -omics. The factors that enabled the explosion of microbiome companies (DNA sequencing, lab automation, and advances in computing) are broadly available enough to enable competition. At present there is no clear winner. The first-mover advantage appears to have been lost, or mitigated by further advances in technology and reductions in cost. Since further enabling technologies (DNA sequencing and other -omics, data science, machine learning, and AI) will lift all boats, my current view is that luck and the team’s ability to execute are likely to determine success in this space.

Thank you to the scientists, researchers, and friends whose knowledge helped inform this post. As always any mistakes are my own, and I would welcome the opportunity to correct them!

Stay tuned for the next post, where we will explore the use of microbes in a clinical (medical) setting both currently and in the future. Addressing medical needs is both the strongest motivator and largest market area for microbiome-related products. The first post on the Human Microbiome introduced this fascinating space.

Notes

  1. While bacterial cultures are still an important medical tool, culturing is fairly time-consuming and expensive, and has limited sensitivity: samples often don’t initially contain enough bacterial cells to detect, so the bacteria must first be allowed to multiply. Most disease-causing bacteria will grow enough to be seen within one to two days, but some organisms can take five days or longer. (Source)
  2. Examples of improvements to the original Sanger DNA sequencing technology: the ability to fluorescently label and detect DNA bases (the letters A, T, G, and C); capillary electrophoresis, a way to separate DNA fragments by size inside very thin tubes; and general automation.
  3. A full report describing how the economic value of the Human Genome Project was assessed is available from the Battelle Memorial Institute.
  4. 16S sequencing targets a single bacterial gene (a section of DNA). This 16S rRNA gene is about 1,500 letters long, but to reduce cost, often only part of it is read. It contains stretches of letters that are nearly identical in almost all bacteria, allowing specific parts to be targeted for reading. Other regions of this gene vary in a known way, allowing the type of bacteria to be quickly identified in many cases.
  5. I’ve glossed over a great deal of complexity. For example, one of the trickiest parts of DNA sequencing is getting the DNA into a format that the sequencer will read, or “library preparation”. Library prep often represents the bulk of the sequencing cost, and is specific to a DNA sequencing technology (e.g. Illumina’s MiSeq, or PacBio’s SMRT Sequencing).

Prime Movers Lab invests in breakthrough scientific startups founded by Prime Movers, the inventors who transform billions of lives. We invest in seed-stage companies reinventing energy, transportation, infrastructure, manufacturing, human augmentation and computing.

