Background image: Julia Kouzenkov

Build-a-Synthetic-Genome is the new Build-a-Bear

Genome design, assembly, and activation — the whole shebang

When I was a “yung” Asian child, Build-a-Bear was a mere urban myth to me. The extent of my experience with it was craning my neck to glimpse a vague blur of the store’s cheery logo before my mom would tug me supersonically to the food court on the other side of the mall.

And, of course, I do remember the twinge of jealousy as my friends presented their custom hot pink unicorn plushies during Grade 1 Show-and-Tell.

Joke’s on you, Karen — now we’re moving on to bigger, better things.

Like de novo genome synthesis.

Now isn’t that the glowup from my childhood design-your-own-teddy dreams?

Mycoplasma laboratorium, the first step into the future

Okay, so playing God isn’t that easy. It was only in 2000 that we barely managed a first draft of the human genome sequence, and now we want to write our own? Sounds like some cheesy science fiction, except it’s not.

In 2008, the first ever synthetic bacterial genome, a man-made version of the Mycoplasma genitalium genome, was built. You can think of M. genitalium as the cheese pizza of unicellular organisms. He’s a little plain, but that’s what makes him the perfect template to build cooler things from.

To differentiate it from the initial M. genitalium genome template, the researchers added synthetic watermarks to the DNA that encoded biological information that can’t be found in nature. (And yes, they did in fact cheekily name it Mycoplasma laboratorium.)
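To get a feel for how a message can hide inside DNA, here’s a tiny Python sketch. Big caveat: this is not the cipher the Venter team used. The two-bits-per-base mapping and the names `encode_watermark` and `decode_watermark` are made up purely for illustration.

```python
# Hypothetical 2-bits-per-base code, NOT the scheme used for the real watermarks.
BASES = "ACGT"  # A=00, C=01, G=10, T=11


def encode_watermark(text: str) -> str:
    """Turn ASCII text into a DNA string, four bases per byte."""
    dna = []
    for byte in text.encode("ascii"):
        # Read the byte two bits at a time, most significant pair first.
        for shift in (6, 4, 2, 0):
            dna.append(BASES[(byte >> shift) & 0b11])
    return "".join(dna)


def decode_watermark(dna: str) -> str:
    """Invert encode_watermark: every four bases become one byte again."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return out.decode("ascii")


message = "HELLO"
watermark = encode_watermark(message)
print(watermark)
print(decode_watermark(watermark))
```

A real watermark also has to avoid accidentally spelling out something the cell would try to read as a gene, which this toy version completely ignores.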

Later, the same group at the J. Craig Venter Institute ran a similar experiment with DNA modeled after the genome of Mycoplasma mycoides (a goat parasite). After they inserted it into another bacterium that had its own DNA removed, it successfully reproduced itself billions of times.

Essentially, we managed to create organisms that are entirely man-made, born from a computer and conceived in a lab.

What. The. Heck.

Imagine the implications of this. Even right now, the Venter Institute has already accepted deals to use their technology to build bacteria that can produce all kinds of funky stuff, like hydrogen and biofuels and living carbon dioxide gobblers that can be way more powerful than plants. One day, we might even be able to produce artificial complex multicellular organisms, and push the boundaries of what it means to be a living creature.

But you didn’t click on this article to read me frothing at the mouth about surreal extrapolations of the future for 20 minutes, so let me tell you a bit about how those madmen at the Venter Institute actually managed to pull all of this off.

A little bit of this, a little bit of that

The hardest part of building a genome isn’t actually designing it. Though we still don’t completely understand what every single nitrogenous base pair means in the grand scheme of things, we’ve gotten good enough at dumbing things down that some smart people have already come up with a Minimum Viable Genome model that shows us what the bare minimum requirement of life is.

What we struggle with right now is whole genome assembly and activation. DNA is a huge molecule, made up of billions of distinct atoms. DNA fragments have been readily made in labs for years now, but stringing up those fragments to make a complete genome is what’s difficult. The molecule is too brittle, and breaks easily because we’re handling it in a way that’s different from what it’s used to in its natural biological environment.

Lovely diagram of the genome synthesis process. You get a template, sequence it, design it, build it, stick it together, pop it, lock it, drop it — okay, I’m done. Note that this is what in vivo genome assembly looks like, because you’re putting everything together inside a living yeast cell. Image from May 2014 Nature Methods.

But since 2008, lots of things have changed. There are dozens of genome assembly methods out there today that are each suited for different scenarios, and we can group them into in vitro methods and in vivo methods.

In vitro genome assembly: Through the looking-glass

In vitro means “in glass” in Latin. Basically, it means building the molecules inside a test tube. Remember, our goal here is to string together DNA fragments in a way that we can produce a complete DNA molecule without anything breaking.

One of the simpler versions of in vitro genome assembly is a method called Gibson Assembly. It uses a solution containing three enzymes that also show up in natural DNA replication: exonuclease, DNA polymerase and DNA ligase, the three genomic musketeers!

When you’re replicating DNA, an enzyme called helicase unzips the double helix like a zipper so it’s forked into two individual strands.

Zzzzzzzip. Image Source

There’s the leading strand that goes in the 5'-3' direction (the numbers refer to carbon positions on each nucleotide’s sugar, which give a DNA strand its direction) and a lagging strand that goes in the 3'-5' direction.

DNA is replicated by creating new complementary strands to each forked strand so that you end up with two complete DNA sequences rather than just one.
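If you like seeing things in code, here’s a minimal Python sketch of what “complementary” means. It only models base pairing (A with T, G with C) and the fact that the two strands run in opposite directions. The function name `complementary_strand` is my own, and no enzymes are simulated.

```python
# Standard Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template: str) -> str:
    """Return the partner strand, written in its own 5'->3' direction.

    The two strands run antiparallel, so each base is paired and the
    result is read in reverse order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template))


template = "ATGCGT"
partner = complementary_strand(template)
print(partner)
# Pairing the partner again gives back the original strand.
print(complementary_strand(partner) == template)
```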

Now, our queen bee DNA polymerase is the one who decides how things go around here, because she’s the one with the power of DNA synthesis in her hands.

She loves the leading strand, and only ever builds new DNA in the 5'-3' direction. She hates going in the 3'-5' direction so much that she stubbornly builds the lagging strand in short 5'-3' fragments.

The fragments are called Okazaki fragments. Image Source

But first, before the DNA polymerase starts DNA replication, she needs someone to mark her spot to know where to start on the leading strand, and of course to mark the beginnings of the lagging strand fragments. That’s where RNA primers come in. You can pick them out in the image above by the orange pylons.

As the DNA polymerase starts laying down new nucleotides from the RNA primers, she has an error detection system built in where she can sense when she lays down the incorrect base, and correct it immediately by cutting out the erroneous base thanks to her right-hand man, exonuclease.

Finally, after all the new base pairs are complete, DNA ligase, the final member of the three musketeers, seals up the gaps between the Okazaki fragments on the lagging strand.

So you can see how these three enzymes can be put to work in artificial DNA assembly as well. In Gibson Assembly, exonuclease chews back the ends of each fragment so that neighbouring fragments with matching ends can stick together, DNA polymerase fills in the missing nucleotides, and DNA ligase seals up the remaining gaps to create a complete genome.
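Gibson Assembly depends on neighbouring fragments being designed with matching end sequences, so the joining logic can be sketched in a few lines of Python. This is a toy model under loud assumptions: the fragments are short made-up strings, they are already listed in the right order, and the enzymes are replaced by plain string matching. The names `join_by_overlap` and `assemble` are hypothetical.

```python
def join_by_overlap(left: str, right: str, min_overlap: int = 4) -> str:
    """Merge two fragments if the end of `left` matches the start of `right`."""
    # Try the longest possible overlap first, then shrink.
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("fragments share no overlap of the required length")


def assemble(fragments: list[str], min_overlap: int = 4) -> str:
    """Join fragments that are already listed in their intended order."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        assembled = join_by_overlap(assembled, fragment, min_overlap)
    return assembled


# Made-up fragments whose ends overlap by four bases (AGCT, then GACC).
fragments = ["ATGGCTAGCT", "AGCTTTGACC", "GACCGGTTAA"]
print(assemble(fragments))
```

In a real reaction the overlaps are tens of bases long rather than four, which is what makes accidental matches unlikely.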

This is all done in a way that’s biologically “familiar” to the DNA, and not jarring or shocking the way PCR (polymerase chain reaction) can be. PCR is another form of genome assembly that requires multiple temperature changes to accommodate the different enzymes needed in the reaction. Since DNA molecules are so brittle, it’s hard for them to stay intact throughout this process.

Unfortunately, most methods of in vitro genome assembly cannot produce enough genetic material to be very useful in larger-scale operations, though they do avoid the biocompatibility problems that occur during in vivo assembly.

In vivo genome assembly ft. yeast!

Instead of preparing a special solution with the required enzymes like in Gibson Assembly, you can actually hijack a host’s natural homologous recombination tools by placing the synthetic DNA fragments into, for example, a yeast cell, letting nature do its work instead.

But why yeast? It’s preferred because it’s super efficient, and can assemble multiple overlapping DNA fragments at once. One study even showed it could string 25 fragments together in just a single event! Also, because a lot of studies have been done with yeast already, the tools we use for it are a lot more developed.

Beyond just genome assembly, you can even clone entire genomes inside a yeast cell!

Yeeting this yeast. Image from May 2014 Nature Methods.

To do this, first we introduce a yeast vector into the bacterial genome that we’d like to clone. Yeast is a eukaryote, which basically means it has a lot of complex cellular machinery that doesn’t exist inside bacterial cells. So, we need to “prime” the bacterial genome with a yeast expression plasmid containing all the components that’ll allow the yeast’s DNA transformation system to do its magic on the bacterial genome.

Vectors have to contain things like a yeast-specific origin of replication (ORI) and a means of selection. Those are all just a bunch of fancy words that basically mean the yeast vector tells the yeast cell’s DNA transformation machinery where to start working, and forces it to accept the bacterial genome instead of rejecting it.

The yeast vector can be inserted in multiple ways:

  1. Directly transposing it into the bacterial cell, and then isolating the combined genome to move into the yeast
  2. Performing cell fusion between the bacterial cell and the yeast, which is exactly what it sounds like
  3. Isolating the bacterial genome from its cell and cotransforming it (introducing two pieces of DNA into a cell at once) together with the specialized yeast vector, so the two recombine into a complete genome in vivo inside the yeast cell
  4. You can also do a modified version of the third method by reconstructing the genome in yeast with overlapping PCR (thermocycling, as I’ve mentioned before) products or TAR (transformation-associated recombination) products

The TAR approach is one of the most precise methods, with a success rate that’s much higher than using PCR products. With TAR, you break down the bacterial genome mechanically, and co-transform each fragment with yeast vectors that are all engineered to specifically hook onto the selected fragment.

Then, you string the pieces together into bigger and bigger chunks until you have a complete, fully assembled genome that’s been cloned in the yeast cell. In this way, you can build genomes that are a lot more substantial than what you can do with PCR, which has its limitations because of the effects of thermocycling on the delicate DNA molecules.

Because the TAR approach assembles DNA in segments, it’s perfect when you have a target genome that’s made of both natural and synthetic chunks, because you can easily divide them at those junctions to re-assemble them more naturally, with a lower likelihood of running into errors.

While in vivo genome assembly has its own advantages, it’s a super delicate and finicky approach because of biocompatibility issues between the two systems. The bacterial (or synthetic) genome cannot:

  1. Express any toxic gene products that can kill the yeast
  2. Be too large; while in vivo genome assembly is better for bigger DNA molecules, the bigger the genome is, the higher the probability of toxic gene product expression and host cell weakening
  3. Have too much GC (guanine-cytosine) content; stretches rich in this pair of nitrogenous bases are difficult to keep stable when performing genome assembly in yeast.
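GC content is easy to check before you ever pick up a pipette. Here’s a minimal Python sketch that computes it and flags GC-heavy fragments. The 0.6 cutoff, the fragment names and the function names are all placeholders I invented; the right threshold depends on your host and your data.

```python
def gc_content(sequence: str) -> float:
    """Fraction of bases that are G or C, between 0.0 and 1.0."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Hypothetical cutoff for flagging a fragment, not a published value.
GC_WARNING_THRESHOLD = 0.6


def flag_gc_rich(fragments: dict[str, str],
                 threshold: float = GC_WARNING_THRESHOLD) -> list[str]:
    """Names of fragments whose GC fraction exceeds the threshold."""
    return [name for name, seq in fragments.items()
            if gc_content(seq) > threshold]


# Made-up example fragments.
fragments = {"frag_a": "ATATATGC", "frag_b": "GGCCGCGA"}
for name, seq in fragments.items():
    print(name, gc_content(seq))
print(flag_gc_rich(fragments))
```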

So, there are definitely a lot more moving parts to take into account when doing in vivo genome assembly, but the higher chance of success makes the work worth it.

KAMEHAMEHAAAAA! (Read: genome activation)

What I think synthetic genomics researchers look like in the lab. GIF source

Genome activation is what ties it all together. Basically, it answers the question of how we get the assembled genome to really start taking the lead in a cell, replicate itself, and give out instructions to produce whatever crazy chemicals you decided to write into the DNA.

Sometimes, the new genes you write into a synthetic genome can stay silent after all the hard work you did to assemble the complete DNA molecule. We need to find ways to initiate that transcription and translation process. That’s where techniques like genome transplantation and cell-free genome activation come in.

The ultimate vision in synthetic biology is to have the capacity to design and build DNA that produces a biological cell with a predictable outcome.
— Daniel Gibson

Genome transplantation, nuclear transplantation’s cooler cousin

In theory, activating DNA with genome transplantation is rather simple. You purify the chromosomes (in other words, isolate them) and transfer them into a host cell. Then, you keep those cells in an environment that selects for characteristics that can only be expressed by the donor genome, like a cellular Hunger Games where only the fittest can survive, until eventually the host cell’s DNA gets weeded out.

In practice, though, it’s a lot harder.

Here are just a few of the things that make it so tricky:

Restriction enzymes: most prokaryotes (which include bacteria) have a defence system against bacteriophages (the viruses straight out of a horror film that basically take over bacteria by exploding DNA bombs inside them to replicate themselves). This defence immediately cleaves and destroys any foreign DNA that enters the cell, and it’s called the restriction modification system. The bacterium uses nucleases to cut up foreign DNA.

In a host cell, we’re obviously going to want to stop this from happening, so we can either methylate the donor genome (adding special tags) so that it isn’t recognized as foreign by the cell or take out the restriction modification system from the host cell altogether.

Cell wall: if the recipient prokaryote has a cell wall, we’ll have to remove it or weaken it so the donor genome can be slipped in more easily.

Size of the genome: as usual, the bigger the genome, the more prone it’ll be to fragmentation. Since it’s a super delicate molecule, we’ll want to encase it in something called agarose, which is a special sugar made from red seaweed. This’ll help protect it from breakage!

Similar machinery: host species and donor species work better together the more similar they are. The host cell needs to have the appropriate tools to replicate the foreign DNA properly when the time comes. Its machinery has to recognize things like the donor genome’s origin of replication, promoters and terminators, which are all needed to replicate, transcribe and translate DNA properly. If the donor genome doesn’t match those expectations, nothing will end up happening.
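To make the restriction enzyme problem from the list above a bit more concrete, here’s a toy Python sketch. It uses the recognition sequence of EcoRI (GAATTC, cut between the G and the first A) as the example enzyme. Everything else is an assumption for illustration: the donor sequence is made up, methylation is modelled as a simple set of protected site positions, and `find_sites` and `surviving_fragments` are hypothetical names.

```python
# EcoRI's recognition sequence; the methylation bookkeeping is a toy model.
RECOGNITION_SITE = "GAATTC"


def find_sites(genome: str, site: str = RECOGNITION_SITE) -> list[int]:
    """Start positions of every occurrence of the recognition site."""
    positions = []
    start = genome.find(site)
    while start != -1:
        positions.append(start)
        start = genome.find(site, start + 1)
    return positions


def surviving_fragments(genome: str, methylated: set[int]) -> list[str]:
    """Cut at every unmethylated site; methylated sites are left alone."""
    fragments, previous = [], 0
    for position in find_sites(genome):
        if position in methylated:
            continue  # tagged as "self", so the nuclease skips it
        cut = position + 1  # EcoRI cuts between the G and the first A
        fragments.append(genome[previous:cut])
        previous = cut
    fragments.append(genome[previous:])
    return fragments


donor = "TTGAATTCAAGGGAATTCTT"  # made-up donor DNA with two sites
print(surviving_fragments(donor, methylated=set()))
print(surviving_fragments(donor, methylated=set(find_sites(donor))))
```

With no methylation the donor DNA gets chopped into pieces, and with every site tagged it comes through in one piece, which is the whole point of methylating the donor genome first.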

Something really exciting would be to consolidate all of this knowledge so we can create an all-powerful generalized recipient cell for genome transplantation. A blank slate, or a cheese pizza, if you will, that can accommodate all kinds of wild toppings to create amazing things (I’m on team pineapple pizza, thanks very much.) Eventually, as more studies are done, we’ll definitely be able to create a standardized protocol for genome-building that’ll allow these two elements to integrate with each other seamlessly.

Genome activation don’t need no cell

Cell-free genome activation is truly the next step for synthetic biology. Creating a synthetic cell with all the matching machinery to the synthetic genome, building the Bonnie to the genomic Clyde, is something that all researchers are looking towards. That’ll definitely be the true superpower. Think about it — it’s almost like a brain building its own body!

Heck, this cell might not even actually look like a cell. We could create an artificial chemical system that fulfills all the functions and needs of a biological cell (DNA replication, mitosis, metabolic function, etc.) and that’s way more optimized than the current unicellular organisms. The ultimate goal of creating a self-replicating biosystem would be fulfilled regardless.

Exciting stuff, for sure.

What is Life, exactly?

Digging our teeth into how genome design and assembly and activation work is super fascinating, but what’s also really important is to constantly contextualize what we’re doing.

From hairless apes to insanely socially organized Homo sapiens, our capacity for abstract thinking and communication has brought us a super long way. The more we’ve culturally evolved, the more control we’ve taken over our environment. At first, we succumbed to it and could do nothing but fight or flee to survive; but today, we’re manipulating and shaping the world around us as we see fit. Soon, we’ll be able to literally decide what life, what being a living organism, means. And that’s something so huge, so colossal, that it can be a little scary to think about.

On the other hand, though — synthetic genomics will allow us to solve a lot of really big problems we have right now when it comes to chemical manufacturing. Using bacteria to produce different compounds that require horribly polluting processes will be a huge game-changer. We’ll be able to basically obliterate all hereditary and genetic diseases. The more we do it, the cheaper it’ll get; one day these genetic therapies will just cost us a couple hundred, rather than millions of dollars.

It’s a bright future, and it’s happening now.

This article is a review of Daniel G. Gibson’s work called Programming biological systems: genome design, assembly and activation.