Scientific Facts About 16S rRNA Gene Sequencing

May 8, 2018


by uBiome Scientists

As the leader in microbial genomics, we know a lot about microbiome sequencing. We use a range of different sequencing approaches, including 16S rRNA gene sequencing, full metagenomics, and our patented precision sequencing™.

One of our earliest advisors was Dr. Joe DeRisi, Professor at UCSF, MacArthur Genius award winner, sequencing pioneer, and inventor of numerous sequencing techniques. uBiome has filed patents on over 15 new sequencing methods, including precision sequencing™, CRISPR-based library preparation, combinations of RNA and DNA, as well as optimizing current methods for the microbiome.

We have a team of over 60 scientists working with molecular as well as computational techniques for understanding the human microbiome. Each month, we generate terabytes of sequencing data. Our dataset, which contains over 250,000 samples and is projected to exceed 1 million by the end of next year, is the largest human microbiome dataset in the world.

Even though 16S sequencing is just part of what we do, it is an important tool in the toolbox of anyone trying to understand the microbiome. It is one of the best techniques for high-throughput analysis of thousands of samples. The 16S gene is present in every bacterium and archaeon. Because so many labs all over the world have been and are using this approach, 16S sequence databases are unparalleled in size. So almost every 16S sequence read can tell you which bacteria and archaea are present in a sample.

There was a recent interesting discussion on Medium and Twitter about the usefulness of 16S sequencing. Eran Segal and Jonathan Eisen, two scientists and pioneers in the microbiome research world, both agreed that 16S sequencing is a great approach for microbial community analysis.

We wanted to take a closer look at some of these recent claims made about 16S sequencing and see if we could help shed some light. What is true about 16S sequencing, and what is just “fake news”?

Fake News:

“16S sequencing is useless. It is a complete waste of your money.”

Since the birth of microbiome research, the 16S rRNA gene (“16S”) has been recognized as a powerful tool with which to classify microorganisms. 16S is a gene that is present in all bacteria and archaea (another type of microorganism). 16S sequencing can be used to identify these microorganisms and determine their relative abundance in a biological sample, such as your gut.
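To make "relative abundance" concrete, here is a minimal sketch in Python. It is an illustration only, not uBiome's pipeline: the function name `relative_abundance` and the ten read labels are hypothetical, and the reads are assumed to have already been classified to genus level.

```python
from collections import Counter

def relative_abundance(read_labels):
    """Return the fraction of classified reads assigned to each taxon."""
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical genus labels for ten classified 16S reads (made-up counts).
reads = ["Bacteroides"] * 5 + ["Faecalibacterium"] * 3 + ["Methanobrevibacter"] * 2
profile = relative_abundance(reads)

for genus, fraction in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {fraction:.0%}")
```

The fractions always sum to 1, so they describe the composition of the sample and say nothing about how many cells were present in absolute terms.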

16S sequencing was the technique of choice for the National Institutes of Health’s Human Microbiome Project, in addition to thousands of laboratories worldwide. Each year, hundreds of scientific studies based on the 16S gene are published. Focusing on the same gene has allowed researchers all over the world to compare results with each other and build databases that contain millions of 16S sequences. The Ribosomal Database Project, for example, has over 3 million different 16S rRNA sequences, and the SILVA Database has over 2 million.

These extensive databases are an advantage of using 16S instead of whole genome DNA or transcriptomic (RNA) sequencing. The number of bacterial and archaeal genomes that have been sequenced to (near) completion is much smaller; NCBI’s Genome Database contains only 135,000 different genomes so far. Other widely utilized databases, such as KEGG, only contain information for around 5,300 organisms.

Simply put: if you use 16S sequencing, there is a good chance that your sequence will be present in the 16S database, making it easy to identify which bacterium or archaeon the gene belongs to. If you use metagenomic or metatranscriptomic analysis, on the other hand, your chance of finding a sequence in the genomic databases is much smaller, and the sequence could simply be reported as an “unknown gene from an unknown bacterium”. Not so useful.

At uBiome, we have developed our own curated 16S database from our dataset of human microbiomes, which is the largest in the world. For our products, we use a version of this 16S database to report genus or species-level taxa. In addition, our team of bioinformaticians and engineers has developed automated pipelines in which every read is compared to this database.

Fake News:

“16S can only identify bacteria.”

This is misleading, at best; over 99% of the genes in our gut are bacterial, so focusing on bacteria is not a bad thing. Moreover, the method we use at uBiome to amplify and sequence the 16S gene can identify both bacteria and archaea, a group of microorganisms discovered in 1977 by Carl Woese using (you guessed it!) 16S rRNA gene sequencing. So whoever said this may not have heard of archaea, which also happen to be the third domain of life. It is true that fungi, including yeasts, cannot be identified with this method. However, they can be identified with some of the other methods we use in our products: full metagenomic sequencing and precision sequencing™.

Fake News:

“16S is just one gene. Metagenomics or metatranscriptomics will identify all living organisms.”

Let’s say that your sample contains 1,000 different bacterial species, and each species contains, in general, between 2,000 and 5,000 different genes. That is between two and five million different genes!

Put differently, imagine you have thousands of different puzzles, each with a different design, and all the puzzle pieces are mixed together in one big box. Undoubtedly, there are many fewer corner pieces than center puzzle pieces, and it would be much easier to match 100 corner pieces to the different designs than 100 middle pieces. Similarly, it is much easier to match 10,000 16S reads to the species that they belong to than 10,000 random gene reads. Because 16S analysis focuses on just one gene, all 10,000 or more sequencing reads are of the 16S gene. The extensive databases we mentioned earlier allow us to easily tell which bacteria are present in your sample. It is also very likely that we will be able to find 16S reads from all of these 1,000 species.

With a minimum of 10,000 sequencing reads spread across those 1,000 species, each species will be covered, on average, 10 times.

With metagenomic or metatranscriptomic analysis, the same 10,000 sequencing reads will not be enough to cover all two to five million different genes in the sample. Many of these cannot be matched to a known organism because the genomic databases are not large enough. If you want to go really deep into the analysis of your sample, you will need to sequence millions of reads, which can easily cost 100 times as much as a 16S analysis.
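The arithmetic behind this comparison can be written out as a short Python sketch. It uses only the example numbers from this section (1,000 species, 2,000 to 5,000 genes per species, 10,000 reads) and makes two simplifying assumptions that are ours, not measurements: reads are spread evenly over all targets, and each species contributes a single 16S target.

```python
SPECIES = 1_000                     # species in the example sample
GENES_PER_SPECIES = (2_000, 5_000)  # low and high estimate per species
READS = 10_000                      # sequencing reads per sample

def total_genes(species, genes_per_species):
    """Number of distinct genes if every species carries this many genes."""
    return species * genes_per_species

def mean_reads_per_target(reads, targets):
    """Average reads per target, assuming reads are spread evenly."""
    return reads / targets

genes_low = total_genes(SPECIES, GENES_PER_SPECIES[0])    # 2,000,000 genes
genes_high = total_genes(SPECIES, GENES_PER_SPECIES[1])   # 5,000,000 genes

# 16S: one target gene per species, so targets == species.
reads_per_species_16s = mean_reads_per_target(READS, SPECIES)

# Shotgun approaches: the same reads are divided over every gene.
reads_per_gene_best = mean_reads_per_target(READS, genes_low)
reads_per_gene_worst = mean_reads_per_target(READS, genes_high)

print(f"16S: {reads_per_species_16s:.0f} reads per species on average")
print(f"Shotgun: {reads_per_gene_worst:.3f} to {reads_per_gene_best:.3f} reads per gene on average")
```

Under these assumptions, 16S gives about 10 reads per species, while the same 10,000 reads average far less than one read per gene when spread over the whole gene pool. Real samples are uneven, so rare species get fewer reads than this average suggests.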

That is why we also developed our patented precision sequencing™ platform, a technique that combines 16S sequencing with enhanced features. We are very excited about this, and we hope to tell you more about that in a future blog.

Partially Fake News:

“In some recent scientific publications, the 16S technology has been shown to produce lots of false results. A peer-reviewed study by Edgar determined that 16S sequencing of known bacterial communities resulted in a 56% to 88% false positive rate of predicted genus names.”

This is partially correct, but it’s not applicable to uBiome data. The study mentioned above (Edgar) was investigating a very specific bioinformatics analysis pipeline (QIIME) and a very specific 16S rRNA gene reference database (Greengenes). One of the problems identified in this study was that, in the Greengenes database, certain genera were placed under multiple families, thus creating unreliable taxonomic lineages.

As we wrote above, at uBiome we use a proprietary bioinformatics pipeline and a different, manually curated sequence database that does not have these taxonomic overlaps. We have made sure that there are no genera that fall under different taxonomic lineages. So the problem described above does not apply to our bioinformatics analysis. If we label a 16S sequence with a name, you can rest assured that we got the taxonomy right.
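The kind of inconsistency described here, a genus filed under more than one family, is easy to check for in any reference taxonomy. The Python sketch below is a generic illustration, not uBiome's curation code: the function name and the placeholder taxa (`FamilyX`, `GenusA`, and so on) are hypothetical, and the reference is assumed to be a simple list of (family, genus) pairs.

```python
from collections import defaultdict

def genera_with_conflicting_families(lineages):
    """Map each genus that appears under more than one family to those families."""
    families_by_genus = defaultdict(set)
    for family, genus in lineages:
        families_by_genus[genus].add(family)
    return {
        genus: sorted(families)
        for genus, families in families_by_genus.items()
        if len(families) > 1
    }

# Placeholder (family, genus) pairs standing in for reference database entries.
reference = [
    ("FamilyX", "GenusA"),
    ("FamilyX", "GenusB"),
    ("FamilyY", "GenusB"),  # GenusB is filed under two families
    ("FamilyY", "GenusC"),
]

conflicts = genera_with_conflicting_families(reference)
print(conflicts)  # {'GenusB': ['FamilyX', 'FamilyY']}
```

A database free of these overlaps would return an empty dictionary from this check.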

Partially Fake News:

“Both 16S and metagenomic methods have another drawback: they analyze DNA, not live microorganisms. DNA is very stable, so even DNA from the food we consume and from dead microorganisms finds its way into stool samples, thus wasting sequencing data and confounding the analyses.”

This is partially true. DNA is indeed very stable, but the DNA from the food we eat is already chemically or enzymatically degraded in the stomach and in the intestines. About 99% of the genes in our stool come from bacteria, not from our food, so in metagenomic sequencing hardly any data is wasted at all. In addition, RNA is less stable than DNA, so the inverse problem could be true for metatranscriptomics samples: since RNA has a very short lifespan, the estimation of microbial activity from metatranscriptomics will always underestimate the actual activity of the microbes in a sample.

Fake News:

“16S sequencing is unreproducible and unreliable. If you sequence the same sample twice, you will get very different results.”

This is false.

The source of this claim is likely a post on the website Science News, where the same sample was analyzed by 2 different groups: uBiome and American Gut (a nonprofit university project). uBiome’s sampling method contains a proprietary stabilization buffer, which preserves your sample immediately after sampling. American Gut does not provide a stabilization buffer, so some bacteria can keep on growing as the sample gets shipped to the laboratory.

Since these 2 assays use very different shipping conditions and DNA extraction methods, it is not surprising that the same sample sent to 2 different companies can give different results. We responded officially to this in 2014 on the uBiome Blog.

Sequencing of the same sample using uBiome’s technologies is extremely reproducible. In the graph below, you can see that analyzing the same sample 50 times leads to remarkably similar results! (In fact, we are submitting an article for peer review on this very subject.)

Genus-level uBiome Explorer data of the same stool sample, which was extracted, amplified, and sequenced 50 times, give highly reproducible microbial profiles. Samples were all analyzed in different, independent runs. Source: uBiome.
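One common way to put a number on how similar two profiles are is the Bray-Curtis dissimilarity, which is 0 for identical profiles and 1 for profiles with no taxa in common. The Python sketch below is an illustration under our own assumptions: the post does not say which metric was used for the figure, and the genus names and abundances are made-up placeholders.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two abundance profiles given as dicts."""
    taxa = set(profile_a) | set(profile_b)
    difference = sum(
        abs(profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) for t in taxa
    )
    total = sum(profile_a.values()) + sum(profile_b.values())
    return difference / total if total else 0.0

# Made-up relative abundances: two replicates of one sample, plus another sample.
replicate_1 = {"GenusA": 0.50, "GenusB": 0.30, "GenusC": 0.20}
replicate_2 = {"GenusA": 0.48, "GenusB": 0.32, "GenusC": 0.20}
other_sample = {"GenusA": 0.10, "GenusC": 0.30, "GenusD": 0.60}

within = bray_curtis(replicate_1, replicate_2)
between = bray_curtis(replicate_1, other_sample)
print(f"replicate vs replicate: {within:.2f}")
print(f"replicate vs other sample: {between:.2f}")
```

In a reproducible assay, the within-sample value stays close to 0 across repeated runs, while comparisons between different people give much larger values.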

Fake News:

“16S sequencing will only provide you with genus-level data. On genus level, our microbiomes are 95% identical.”

This simply isn’t true.

Each person has their own unique microbiome. Thanks to our microbiomes, we look as different from each other on the inside as we do on the outside!

Still not convinced? Below you’ll find a graph of genus-level gut microbiome data from 50 different people, analyzed using uBiome’s testing kits. As you can see, we’re all very different!

Genus-level uBiome Explorer data from 50 different stool samples, each from a different person. The plot shows the 50 most abundant genera in this data set. All other genera are grouped together and shown as “Other genera”. Each person’s stool has a unique microbial fingerprint that changes over time. Source: uBiome.
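The grouping described in this caption (keep the most abundant genera across the data set and lump everything else into “Other genera”) can be sketched in a few lines of Python. This is an illustration, not the code behind the figure: the function names, placeholder genera, and abundances are hypothetical, and the example keeps the top 2 genera instead of the top 50 to stay small.

```python
from collections import Counter

def top_genera(samples, n):
    """Return the n genera with the highest summed abundance across all samples."""
    totals = Counter()
    for profile in samples:
        totals.update(profile)  # adds each genus's abundance to its running total
    return [genus for genus, _ in totals.most_common(n)]

def collapse(profile, keep, other_label="Other genera"):
    """Keep the selected genera and sum the rest into a single category."""
    keep = set(keep)
    collapsed = {genus: profile.get(genus, 0.0) for genus in keep}
    collapsed[other_label] = sum(
        value for genus, value in profile.items() if genus not in keep
    )
    return collapsed

# Made-up relative abundances for two people.
samples = [
    {"GenusA": 0.6, "GenusB": 0.3, "GenusC": 0.1},
    {"GenusA": 0.2, "GenusB": 0.1, "GenusD": 0.7},
]

keep = top_genera(samples, n=2)
plot_data = [collapse(profile, keep) for profile in samples]
print(keep)
print(plot_data)
```

Ranking on the whole data set, not per sample, gives every bar in the plot the same set of categories, which makes the differences between people easy to see.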

Fake News:

“Genus level is not accurate enough. The resolution is too low. Humans, dogs, and rats all belong to the same genus. At the genus level, we are all mammals — so genus level analysis is useless.”

This is completely wrong! Whoever said this perhaps didn’t pay attention during science class.

Mammals are a class, not a genus. Humans are Homo sapiens, belonging to the genus Homo. Even our closest relatives, chimpanzees, belong to a different genus, Pan. Genus-level analysis is pretty good at telling all of us mammals apart, and, for bacteria and archaea, genus-level analysis has equally good resolution.

Scheme showing classification on different taxonomic levels. All living organisms, from mammals, to plants, to bacteria, are classified using this scheme. For example, at the Class level all mammals are grouped together. Foxes, wolves, and coyotes belong to the same family, but foxes belong to a different genus than wolves and coyotes, which share the genus Canis. Genus-level analysis can clearly tell humans, dogs, and rats apart. Source: Wikimedia Commons (Author: Annina Breen).

uBiome’s analysis often goes even deeper than genus-level analysis. For example, the probiotics panel on our Explorer product reports to the species level. Our clinical products, SmartGut and SmartJane, use precision sequencing™ to identify a panel of gut microbes at the species level, with high specificity and sensitivity. SmartGut identifies 13 species and 13 genera, while SmartJane identifies 17 species and 15 genera. The science behind the selection of each of these targets, and the validation of the methods to make sure that we can detect them with high precision, is available online, so you can read more about that if you like. For SmartGut, that information has been published in a peer-reviewed scientific paper in PLOS ONE, while a preprint on the development of the SmartJane assay is available as well.

Two of our current products offer species-level precision sequencing, but even the resolution of our Explorer product, where genus level is used, is high enough to distinguish all of us from each other.

Edit Note: @AlexJProbst correctly pointed out on Twitter that 16S rRNA gene sequencing cannot be used to determine the absolute abundance of microorganisms. We changed the sentence to clarify that relative abundance was meant. We thank him for his comment.

This post originally appeared on the uBiome blog



