New Species of Plant Bacteria Reshapes Our Views on DNA Organization

The unorthodox way one bacterium organizes its genome blurs the line between “essential” chromosomes and “accessory” plasmids.

Kevin Blake, PhD
The Quantastic Journal
Sep 3, 2024


by Kevin Blake, PhD | @kevinsblake

Bacterial genome.

“To know me is to fly with me,” says Ryan Bingham, George Clooney’s character in the 2009 film Up in the Air. After flying 270 days a year and accumulating 10 million frequent flier miles by the movie’s end, Bingham has turned getting through airport security into an art. Every turn of his black roller bag is precise and practiced. He skips crowded lines using the priority access lane; his ticket is held out before TSA asks; and he, of course, never checks a bag. While traveling, Bingham is efficient, organized, streamlined, without a gram of excess baggage or wasted space.

Bacteria pack their genomes much the same way Bingham packs for his flights: all the essentials in one bag. In most bacterial species, the genome is a single circular chromosome that encodes all the essential genes the cell needs to survive. While a measly 2% of the human genome codes for proteins (the remaining 98% being a clutter of noncoding and so-called “junk” DNA), bacterial chromosomes are packed tight, with an average protein-coding density of 87%.
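Coding density is simply the share of a genome’s length that lies inside protein-coding genes. Here is a minimal Python sketch. It assumes hypothetical gene annotations given as 1-based, inclusive (start, end) coordinates on a single toy replicon, and it merges overlapping genes so that no base is counted twice. The coordinates are invented for illustration.

```python
def coding_density(genome_length, genes):
    """Fraction of a replicon's bases that fall inside at least one gene.

    genes: iterable of (start, end) pairs, 1-based and inclusive
    (an assumed convention for this sketch).
    """
    covered = 0
    run_start = run_end = None
    for start, end in sorted(genes):
        if run_end is not None and start <= run_end + 1:
            # Overlaps or abuts the current run of coding sequence: extend it.
            run_end = max(run_end, end)
        else:
            # A gap separates this gene from the previous run: bank the old run.
            if run_end is not None:
                covered += run_end - run_start + 1
            run_start, run_end = start, end
    if run_end is not None:
        covered += run_end - run_start + 1
    return covered / genome_length


# Hypothetical 1,000 bp toy replicon with three genes, two of which overlap.
toy_genes = [(1, 300), (250, 600), (701, 900)]
print(f"coding density: {coding_density(1000, toy_genes):.0%}")  # coding density: 80%
```

A real analysis would read coordinates from an annotation file. This sketch ignores strand and gene type.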

In addition to the chromosome, many bacterial species also carry small supplementary DNA molecules called plasmids. Plasmids encode accessory genes that aren’t necessary for survival but can provide beneficial functions in special circumstances, such as antibiotic resistance. Further, plasmids can be transferred “horizontally” between neighboring, often unrelated, bacteria (in contrast to “vertical” transmission, when a parent cell passes a copy to its offspring). Like the mind-jack in The Matrix, where a needle goes into the back of Neo’s head and he immediately knows kung fu, plasmids can bestow new traits on their host without the fuss of millions of years of evolution. Plasmids give bacteria unrivaled versatility with their genomes, allowing them to acquire new genes when they’re needed and eject them when they’re no longer useful, all while keeping the chromosome untouched.

The organization of essential genes on chromosomes and accessory genes on plasmids has led to the conclusion that natural selection has favored organized, trim genomes in bacteria. Humans and other eukaryotes may pack their chromosomes like an 18-year-old moving into a freshman dorm, but bacteria have no room for such luxuries. However, the haphazard way one newly discovered species, Aureimonas ureilytica, packs its genome frustrates this narrative and challenges the usual thinking about how organisms in general organize their DNA.

Sequencing the bacterium’s genome

In most regards, A. ureilytica is an unassuming species. First isolated in 2011 from the stem of a soybean plant, it belongs to the class Alphaproteobacteria, one of the most abundant and diverse groups of bacteria. It also belongs to the order Rhizobiales, which includes well-known plant symbionts. A. ureilytica was initially studied because its numbers shrink and grow in response to nutritional changes in the host plant, which suggests some kind of symbiotic relationship. To investigate the genes that might underpin this plant-microbe interaction, Mizue Anda and colleagues from Tohoku University in Japan set out to sequence its genome.

Whole-genome sequencing revealed that A. ureilytica’s genome comprises 5.2 million base pairs of DNA organized into nine circular replicons. Genome lengths are measured in base pairs, the pairs of complementary nucleotide bases (A with T, and G with C) that form the rungs of the DNA ladder. In the jargon of bacterial genomics, any independently replicating DNA molecule is a replicon, and replicons are further classified as either chromosomes or plasmids. The largest replicon in A. ureilytica’s genome was 3.7 million base pairs. The remaining DNA was split among three replicons of 300,000 to 500,000 base pairs each and five replicons under 100,000 base pairs.

Schematic of A. ureilytica’s genome, with nine circular replicons shown at the same relative size. The plasmid pAU20rrn, the smallest replicon, encodes the genome’s sole rRNA operon, which includes the 16S rRNA gene. Source

Biologists, always excited to create classification systems, have devised criteria to differentiate between chromosomes and plasmids. At first the task seems simple: chromosomes are large and vertically transmitted, while plasmids are small and horizontally spread. But evolution has a knack for producing oddballs and outliers that blur seemingly clear lines. The average bacterial chromosome is 50x larger than the average plasmid, but that average masks a nearly 100-fold range in chromosome sizes. It also hides a sub-class of plasmids, dubbed “megaplasmids,” that are 10x larger than the average plasmid. Within a single species’ genome, the chromosome is typically the largest replicon. But any general size rule that spans all species runs into an uncomfortable overlap between the smallest chromosomes and the largest plasmids.

Well, then what about horizontal transfer? Unfortunately, this criterion is similarly mired in exceptions. Plasmids can sometimes lose the ability to be horizontally transmitted, for example when the genes encoding the necessary transfer machinery are deleted. Vertical transmission is then their only way to spread. Nobody would argue that a single mutation should suddenly change a replicon’s label from plasmid to chromosome.

Instead, the characteristic which has served as a practical and robust distinction between chromosomes and plasmids has been the physical separation between essential and accessory genes. The replicon that encodes all essential genes needed to survive — DNA replication, cell membrane production and maintenance, central metabolism — is the chromosome. Any other replicon — which, by definition, only encodes accessory genes — is a plasmid. True, additional copies of an essential gene may be found on a plasmid, but the chromosome will never lack it.

Size and horizontal transfer are fine shorthand criteria, but the real rule is the essential-accessory gene split. At least, that’s what’s been long thought.

Based on this rule, it was intuitive to designate the largest of A. ureilytica’s replicons, which holds about 70% of the genome, as the chromosome. However, the smallest and most unlikely of the other replicons, a ring of DNA just 9,000 base pairs long, would defy easy classification. In the process, it would upend the textbook distinction between chromosomes and plasmids. That is because this “plasmid,” dubbed pAU20rrn, carries the sole copy of A. ureilytica’s rRNA operon (the rrn operon), including the essential 16S, 23S, and 5S rRNA genes.
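The essential-versus-accessory rule is effectively a subset test: the chromosome is the one replicon whose gene set contains every essential gene. The toy Python sketch below shows why a pAU20rrn-style plasmid breaks that test. Once the rRNA genes sit on a separate replicon, no single replicon passes it. The function names are hypothetical, and the gene labels are placeholders, not the real annotation of A. ureilytica or any other genome.

```python
def chromosome_by_essential_rule(replicons, essential):
    """Return the one replicon holding every essential gene, or None.

    `replicons` maps a replicon name to the set of gene labels it carries.
    Under the textbook rule, exactly one replicon (the chromosome) should
    contain the whole `essential` set.
    """
    for name, genes in replicons.items():
        if essential <= genes:  # subset test: every essential gene present
            return name
    return None


def where_essential_genes_live(replicons, essential):
    """Map each replicon to the essential genes it carries (omitting empties)."""
    return {
        name: sorted(genes & essential)
        for name, genes in replicons.items()
        if genes & essential
    }


# Placeholder gene labels, not real annotations of any genome.
ESSENTIAL = {"dnaA", "rpoB", "ftsZ", "16S", "23S", "5S"}

textbook_genome = {
    "chromosome": ESSENTIAL | {"housekeeping_x"},
    "plasmid_1": {"antibiotic_resistance"},
}
aureimonas_like = {
    "largest_replicon": {"dnaA", "rpoB", "ftsZ", "housekeeping_x"},
    "pAU20rrn": {"16S", "23S", "5S"},
    "other_plasmid": {"antibiotic_resistance"},
}

print(chromosome_by_essential_rule(textbook_genome, ESSENTIAL))  # chromosome
print(chromosome_by_essential_rule(aureimonas_like, ESSENTIAL))  # None
print(where_essential_genes_live(aureimonas_like, ESSENTIAL))
```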

Top: geneplot of the rRNA operon, including the 16S, 23S, and 5S rRNA genes. Bottom: View of the 16S rRNA gene, with the nine hypervariable regions (V1-V9) highlighted.

What is the 16S rRNA gene?

Discovering that the sole copy of one of this obscure species’ essential genes is on a small plasmid would have been a surprise in and of itself. But the fact that it includes the 16S rRNA gene raises the finding to a higher level of significance. To fully appreciate why, it’s necessary to take a brief detour into the history of the 16S rRNA gene.

Functionally, the gene encodes the RNA component of the small (30S) subunit of the bacterial ribosome, the molecular machine that translates mRNA transcripts into proteins. However, its importance in microbiology goes beyond its immediate function. Far from being just another housekeeping gene, the 16S rRNA gene is the gene for studying bacterial phylogeny and taxonomy.

With the advent of DNA sequencing, it became possible to determine organisms’ evolutionary relatedness not by comparing second- or third-order phenotypic traits, like the shape of teeth or number of toes, but by directly comparing the gene sequences that encode these traits. No field was more impacted by this molecular revolution than microbiology. Unlike animals and plants, which are rich in complex morphological traits that can be compared, the morphologies of bacteria are so simple that comparisons based on appearance alone are of little to no use. The best that can often be done is noting whether they’re shaped like a sphere or a rod — not exactly high-resolution data. Physiological traits, such as the nutrients required to grow or the presence of a peptidoglycan cell wall, can correctly group related bacterial species, but they can also miss relatives lacking the feature in question.

Because of these limitations, early attempts at phylogenetic classification of bacteria more often than not created flawed schemes that confused rather than clarified. As a result, microbiologists of the 1800s and early 1900s wisely avoided the subject. It’s not that they were unaware of how insightful the evolutionary relationships between species could be. Their colleagues down the hall in the botany and zoology departments were making huge progress investigating exactly these relationships. Microbiologists just didn’t have the technology to determine those relationships experimentally. The field continued to characterize species, particularly disease-causing bacteria like Escherichia coli and Mycobacterium tuberculosis, but it had no phylogenetic framework to connect them. The Linnaean names given to these taxa suggested knowledge about evolutionary relationships that simply wasn’t there.

That’s where the 16S rRNA gene comes in. With DNA sequencing, genotypes — that is, gene sequences — could be used in lieu of phenotypes to compare and classify any organism. For microbiology, which up until that point had no comparable phenotypes, this was a massive boon.

But first, microbiologists needed a gene that could serve as a “chronometer,” a biological stopwatch for measuring evolutionary time. Radiometric dating calculates the age of rocks by measuring the decay of radioactive isotopes. In the same way, a chronometer gene lets us estimate the time since two species diverged by counting the mutations that have accumulated between their sequences.

A good chronometer meets three criteria. First, it must be found in all species. Living organisms are incredibly diverse, from redwoods to rhinoceroses, portobello mushrooms to peacocks, and even A. ureilytica to Homo sapiens. Yet all life on Earth evolved from a common ancestor, so even distantly related organisms share some genes. For example, the genes encoding human hemoglobin and myoglobin share a common ancestry with other globins found in sharks, insects, and even plants, all of which bind oxygen. But even something as seemingly fundamental as handling oxygen makes a poor chronometer, because many organisms cannot survive in environments with oxygen and therefore lack those genes. Other organisms may encode genes that are technically related, but whose sequences and functions are so different it’s difficult to say they actually are the “same” gene. All organisms have nucleic acid-based genomes that must be replicated and expressed. As a result, most universally conserved genes encode components of that machinery: DNA and RNA polymerases, elongation factors, and the ribosome’s proteins and RNAs (including 16S rRNA).

Second, its sequence must change randomly over time at a roughly steady rate, so that the number of differences between two species reflects how long ago they separated. Radioactive nuclei decay at a characteristic half-life. Similarly, genes acquire random mutations at a more or less constant rate, a phenomenon known as the molecular clock. Counting the differences between two sequences is therefore a stand-in for counting years. If Species A and Species B differ by two mutations, while Species A and Species C differ by five, then Species A is more closely related to B than to C. One might think the best chronometers are sequences that change at the average mutation rate. But such sequences would change so rapidly, evolutionarily speaking, that they would be of little use for comparing distantly related species separated by billions of years. Instead, the best chronometers are essential genes whose functions are so tightly maintained by natural selection that they change very slowly relative to other genes.
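Counting differences between aligned sequences is the core of the molecular-clock comparison described above. The minimal Python sketch below reproduces the Species A/B/C example. B differs from A at two positions and C at five, so B comes out as A’s closer relative. It assumes the sequences are already aligned to equal length, and the fragments are made up, not real 16S rRNA sequences.

```python
def count_differences(seq1, seq2):
    """Number of positions at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


# Made-up aligned fragments, for illustration only.
species_a = "ACGTACGTAC"
species_b = "ACGTTCGTAA"  # differs from A at 2 positions
species_c = "TCGAACCTGA"  # differs from A at 5 positions

distances = {
    "B": count_differences(species_a, species_b),
    "C": count_differences(species_a, species_c),
}
closest = min(distances, key=distances.get)
print(distances, "closest to A:", closest)  # {'B': 2, 'C': 5} closest to A: B
```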

Third, the chronometer must be long enough to record plenty of change. A longer gene offers more sites to compare, which gives better statistics and spreads mutations more thinly, so they are less likely to overwrite one another. Perhaps more important, the gene should have independent functional regions. That way, a large, nonrandom change in one region doesn’t affect the others, and the clock can keep running smoothly.
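Two of the statistical points above can be made concrete. First, when a site can mutate more than once, the raw fraction of differing sites (p) undercounts the true number of changes. The Jukes-Cantor (1969) model is one standard correction. It is used here as an illustrative choice, not as the method of any study discussed in this article. Second, if sites are treated as independent, the sampling error of p shrinks as gene length grows. The function names below are hypothetical. The 1,500-site length is a round figure close to the length of a typical bacterial 16S rRNA gene.

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site, given the observed fraction p of
    differing sites, under the Jukes-Cantor model (all substitutions equally
    likely). The formula breaks down as p approaches 0.75, the level of
    difference expected between unrelated random sequences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def p_distance_standard_error(p, sites):
    """Binomial standard error of p, assuming sites change independently."""
    return math.sqrt(p * (1 - p) / sites)


# The correction matters little for close relatives and a lot for distant ones.
for p in (0.01, 0.10, 0.30, 0.60):
    print(f"observed {p:.2f} -> corrected {jukes_cantor_distance(p):.3f}")

# Same observed difference (10%), measured on a short fragment vs. a full-length gene.
for sites in (150, 1500):
    print(f"{sites} sites: standard error {p_distance_standard_error(0.10, sites):.4f}")
```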

The 16S rRNA gene meets all these criteria. It’s found in all species, mutates very slowly, and is made up of distinct, independent regions. Carl Woese and George Fox of the University of Illinois showed the power of this approach in their foundational 1977 paper. They used 16S rRNA comparisons to show that the microbes previously lumped together as prokaryotes actually belong to two distinct groups, now called the domains Bacteria and Archaea. Even with advancing technologies and the ability to sequence whole genomes, 16S rRNA sequencing remains a powerful tool. This is particularly true for microbiologists studying bacteria that cannot be cultured in the lab. Many such species are known only by their 16S rRNA sequences.

A phylogenetic tree based on 16S rRNA sequence comparisons, emphasizing the three-domain system proposed by Woese et al. Source

This unusual genome structure is actually common

The unexpected discovery that A. ureilytica’s sole rRNA operon sits on a plasmid was amplified by the importance of the 16S rRNA gene. A species that kept all its essential genes on the chromosome except one would be a notable discovery in and of itself. But it’s as if the rebellious A. ureilytica didn’t want any academic hemming and hawing over its exceptionalism, and so deliberately stuck the most iconic, recognizable essential gene it had on a plasmid. And not just any plasmid, either. At just 9,000 base pairs, or 0.2% the size of the chromosome, this plasmid is hardly more than a circularized rRNA operon (6,000 base pairs long).
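The percentages quoted in this article are simple ratios of replicon sizes. Here is a quick Python check using the rounded sizes given in the text: a 5.2 million bp genome, a 3.7 million bp largest replicon, the 9,000 bp pAU20rrn, and the 6,000 bp operon. The helper name is hypothetical. The unrounded ratios (about 71% and 0.24%) are consistent with the “70%” and “0.2%” figures, and the operon fills roughly two-thirds of the plasmid.

```python
# Replicon sizes in base pairs, rounded as quoted in the article.
GENOME_BP = 5_200_000
LARGEST_REPLICON_BP = 3_700_000
PAU20RRN_BP = 9_000
RRN_OPERON_BP = 6_000


def share(part_bp, whole_bp):
    """Fraction of `whole_bp` accounted for by `part_bp`."""
    return part_bp / whole_bp


print(f"largest replicon / whole genome: {share(LARGEST_REPLICON_BP, GENOME_BP):.0%}")
print(f"pAU20rrn / largest replicon:     {share(PAU20RRN_BP, LARGEST_REPLICON_BP):.2%}")
print(f"rRNA operon / pAU20rrn:          {share(RRN_OPERON_BP, PAU20RRN_BP):.0%}")
```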

This discovery has numerous implications for bacterial genomics and evolution. It’s a clear exception to the textbook rule that the chromosome encodes all essential genes while plasmids carry accessory genes. That rule can no longer serve as a defining distinction between the two kinds of replicon, because it doesn’t hold in A. ureilytica. But, as with many interesting questions in biology, the importance of a new discovery depends on how common it is. Suppose A. ureilytica is a unique oddball with an interesting but unstable genome, doomed to extinction by natural selection. Then it can be dismissed as a minor exception to an otherwise robust rule. On the other hand, suppose other species share this unusual genome arrangement and it can be stably maintained over long evolutionary periods. Then that would have major implications for our understanding of bacterial genomics.

To investigate these questions, Anda and colleagues carried out a follow-up study, published in 2023. Their goal was to find more species that, like A. ureilytica, lack chromosomal rRNA operons. They searched more than 80,000 bacterial genomes for those whose rRNA operons were located on replicons that, by all other metrics, appeared to be plasmids.
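At its simplest, a screen like this is a filter over genome records. The Python sketch below is a toy version, not the study’s pipeline. It uses one crude stand-in criterion: flag a genome if it has rRNA operons but none on its largest replicon. The record layout, class names, and function name are all hypothetical, and the two example genomes are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Replicon:
    name: str
    length_bp: int
    rrn_operons: int = 0  # rRNA operons annotated on this replicon


@dataclass
class Genome:
    accession: str
    replicons: list = field(default_factory=list)


def lacks_chromosomal_rrn(genome):
    """Flag genomes that carry rRNA operons, but none on the largest replicon.

    Simplification: the largest replicon stands in for "the chromosome."
    The study judged plasmid-likeness on several metrics, not size alone.
    """
    if not genome.replicons:
        return False
    largest = max(genome.replicons, key=lambda r: r.length_bp)
    total_rrn = sum(r.rrn_operons for r in genome.replicons)
    return total_rrn > 0 and largest.rrn_operons == 0


# Invented records for illustration only.
genomes = [
    Genome("typical_example", [Replicon("chr", 4_000_000, 3), Replicon("p1", 50_000)]),
    Genome("aureimonas_like_example", [Replicon("chr", 3_700_000), Replicon("pAU20rrn", 9_000, 1)]),
]
hits = [g.accession for g in genomes if lacks_chromosomal_rrn(g)]
print(hits)  # ['aureimonas_like_example']
```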

Anda found three genomes that matched these criteria: two from Persicobacter species and one from Treponema saccharophilum. Both Persicobacter genomes had three rRNA operons in tandem on a plasmid roughly 30,000 base pairs long. The T. saccharophilum genome’s rRNA operon was located on a plasmid 8,400 base pairs long. The team then sequenced the genomes of related type strains (P. diffluens and T. saccharophilum JCM 32279). These strains turned out to have the same arrangement of rRNA operons as the genomes from the search. The team therefore concluded that P. diffluens and T. saccharophilum are two additional species that have lost rRNA operons from their chromosomes. Together, this meant that species from at least three phyla had independently lost chromosomal rRNA operons: first Pseudomonadota with A. ureilytica, and now Bacteroidota and Spirochaetota.

Having established that A. ureilytica isn’t a lone oddball but one of several species that have lost chromosomal rRNA operons, Anda then investigated how stable this genome organization is. Are these species random evolutionary one-offs, or is this an arrangement that can be maintained for an evolutionarily long time? The genomes of other Treponema species carry their rRNA operons on the chromosome. This suggests T. saccharophilum lost its chromosomal rRNA operon fairly recently.

On the other hand, related Persicobacter species had not yet had their genomes sequenced, raising the exciting possibility that the whole Persicobacteraceae family lacks chromosomal rRNA operons. To investigate this, Anda sequenced the genomes of three species in the family: P. psychrovividus, Aureibacter tunicatorum, and Fulvitalea axinellae. All three lack chromosomal rRNA operons. Instead, their rRNA operons sit in the same arrangement on the same plasmid. This strongly suggests that the chromosomal rRNA operon was lost in the common ancestor of these Persicobacteraceae species. That ancestor was then subject to the classic forces of selection, divergence, and speciation, all the while passing its special genome organization on to its descendant species.

How long can this unusual genome arrangement be maintained? The answer is the same as the answer to how long ago this common ancestor lived. Unfortunately, most bacteria leave little to no trace in the fossil record, so it’s not possible to determine this directly. But the same principles used to calculate evolutionary distances with the 16S rRNA gene can help. Count the differences between two genomes and divide by the rate at which such differences accumulate, and you get an approximate age for their common ancestor. Using this approach, Anda estimated that the common ancestor of these Persicobacteraceae species lived approximately 500 million years ago. At that time, there were no animals with bones and no grasses or trees. Rather than being short-lived oddballs doomed to extinction, it seems bacteria can maintain rRNA operons solely on plasmids for hundreds of millions of years.
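The clock arithmetic described above can be written out. After two lineages split, both accumulate changes. So the observed distance d (differences per site) is roughly 2rt for a per-lineage rate r, which gives t = d / (2r). This minimal Python sketch assumes a strict, constant clock. The function name and the numbers are placeholders, not the data or method of Anda and colleagues.

```python
def divergence_time_years(differences, sites, rate_per_site_per_year):
    """Strict molecular-clock estimate of the time since two lineages split.

    Both lineages change independently after the split, so the observed
    distance d = differences / sites is about 2 * rate * t, giving
    t = d / (2 * rate).
    """
    d = differences / sites
    return d / (2 * rate_per_site_per_year)


# Placeholder numbers chosen only to show the arithmetic; they are not
# values reported by Anda and colleagues.
t = divergence_time_years(differences=30, sites=1500, rate_per_site_per_year=1e-10)
print(f"{t / 1e6:.0f} million years")  # 100 million years
```

Real analyses also correct d for repeated mutations at the same site and calibrate r against independent evidence. Those steps are left out here to keep the core division visible.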

Evolutionary model of bacteria without chromosomal rrn operons. Source

Nature doesn’t read our textbooks

Living organisms are not — can never be — perfectly optimized. We’re a product of history, a jury-rigged amalgamation of imperfect parts assembled over billions of years for different purposes. Darwin understood this to be the strongest proof of evolution. Organisms could only be perfectly designed in a world without history, and a world without history might as well have been created as we find it. Though history precludes perfection, fortunately, natural selection doesn’t demand it. The only requirement is that organisms work well enough to survive and reproduce.

The sloppy genome organization of A. ureilytica and its peers is another in a long list of features that highlight this principle. There’s some suggestion that having rRNA operons exclusively on plasmids could provide evolutionary advantages. Cells often carry multiple copies of a plasmid, so putting the rRNA operon on one could increase its copy number. That could raise the rate of rRNA synthesis and, in theory, enable rapid adaptation to environmental changes. However, the species lacking chromosomal rRNA operons come from very different environments (marine, soil, rumen, plant stems, air). That makes it unlikely they’re all converging on the same unusual adaptation to the same environmental stressor. And even if the arrangement was initially favored in a common ancestor, that doesn’t mean all species today still rely on that benefit. Regardless of any potential benefits, this unusual arrangement most likely arose through some chance event. Because it doesn’t seem to carry costs large enough for selection to act against it, it was left as-is for half a billion years.

This story also highlights a second principle of evolution. Though we might seek structure and order in the universe, Nature isn’t under any obligation to provide it. There are, to be sure, different kinds of things: different species and different DNA molecules. But the boundaries between kinds are rarely clear. They’re fuzzy, with one kind grading into the other. Objects at these boundaries confuse and frustrate us only so long as we insist that everything fits unambiguously into the neat categories we’ve devised for ourselves. The wall dividing chromosomes and plasmids, built upon genuine facts known at the time, has been toppled. These categories have had, and certainly will continue to have, practical importance in the science of bacterial genomics. But it’s a mistake to confuse our labels with reality. When it comes to the study of living organisms, sometimes we, like A. ureilytica, have to accept that a little disorganization isn’t always a bad thing.

References

  • Anda M, Ohtsubo Y, Okubo T, Sugawara M, Nagata Y, Tsuda M, Minamisawa K, Mitsui H. Bacterial clade with the ribosomal RNA operon on a small plasmid rather than the chromosome. Proc Natl Acad Sci USA. 2015 Nov 17;112(46):14343–7. doi: 10.1073/pnas.1514326112. Epub 2015 Nov 3. PMID: 26534993; PMCID: PMC4655564.
  • Anda M, Yamanouchi S, Cosentino S, Sakamoto M, Ohkuma M, Takashima M, Toyoda A, Iwasaki W. Bacteria can maintain rRNA operons solely on plasmids for hundreds of millions of years. Nat Commun. 2023 Nov 14;14(1):7232. doi: 10.1038/s41467-023-42681-w. PMID: 37963895; PMCID: PMC10645730.

Scientist + Writer. PhD microbiology. Freelance science writer. Articles about bacteria, genomics, and evolution.