An Introduction to Synthetic Biology

Life Is an Empty Canvas; Synthetic Biology Is the Paintbrush

Claiming control over natural systems and organisms.

Selin Filiz
Visionary Hub

--

It’s 2010.

2010, the year of the first iPad. 2010, the year the final Hunger Games book was released. 2010, the year of the BP oil spill. 2010, the year of Haiti’s 7.0 earthquake.

2010, the year the first truly synthetic organism was produced.

Yep: artificial organisms aren’t a thing of the future anymore. They’ve already existed for over ten years! How?

Synthetic biology enables us to innovate with the gears and guts of life to create natural pathways and systems.

The emergence of synthetic biology, or “synbio,” marks the next phase in technological development. It represents the merging of human creativity with the ingenuity of nature:

  • First, humanity harnessed natural forces for survival. We learned to cook, shelter, and protect ourselves using simple machines.
  • Then, we designed systems and products to mimic and utilize natural elements. These are stoves, apartments, governments, computers, and the like.
  • Now, we’re combining the two: designing machines from nature. We’re building artificial anabolic pathways to produce starch from CO2 and embedding enzymes in plastics to biodegrade them.

Of course, there are still huge limitations to this technology. Biology, and especially synthetic biology, is still an emerging field; significant discoveries are made quite frequently compared to better-established fields like physics and chemistry. Ten years later, we still don’t have great control over organism design, and besides, there is a plethora of ethical questions and biosecurity concerns over the creation of life.

Nevertheless, synbio is a giant field. This read intends to explain the various elements of synthetic biology. Feel free to jump around — you don’t necessarily need to read sequentially to understand everything.

Table of Contents

The Basics

More specifically …

Synthetic Biology: What Is It, Exactly?

Good question! Synbio doesn’t actually have a widely accepted definition yet, since it’s a brand-new field. For now, we can define it as

The design and fabrication of biological components and systems that do not exist in the natural world, and the redesign and fabrication of existing biological components and systems.

It can also be defined as the design and construction of biological systems for useful purposes, or the engineering of nature for human purposes.

Engineering + Genomics = Synthetic Biology

Synthetic biology is a combination of both engineering (the science concerned with the design, building, and use of machines) and genomics (a part of molecular biology concerned with the characteristics and mapping of genomes — think DNA).

Example: The Repressilator

Synbio is concerned with the construction of biological circuits and pathways. For example, take the repressilator. It’s a biological system consisting of at least one feedback loop with at least three genes. Each gene, when “activated,” expresses a protein that prevents the transcription of the next gene, which expresses a protein that prevents the transcription of the next gene, and so forth, until the last gene represses the first. In one experiment back in 2000, an artificial repressilator was inserted into E. coli. One of the three repressors also controlled a gene for green fluorescent protein (GFP), so the cells’ fluorescence rose and fell as the loop cycled. That glow was the intended output that confirmed that the system functioned. It’s like the domino effect, except that the system loops continuously and produces a rhythmic pulse of product.

LacI represses TetR, TetR represses λ cI, and λ cI represses LacI; TetR also represses the production of GFP.

This experiment showed that genetic networks can be designed and implemented to produce human-intended outcomes like chemicals and reactions.
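To make the looping behavior concrete, here is a minimal sketch of a three-gene ring in Python. It assumes the simplified dimensionless model commonly used to describe the repressilator (one mRNA and one protein per gene, Hill-type repression) and integrates it with a basic Euler step. The parameter values and the function name `simulate_repressilator` are illustrative assumptions, not measurements from the experiment.

```python
def simulate_repressilator(alpha=216.0, alpha0=0.216, beta=0.2, n=2.0,
                           dt=0.01, steps=20000):
    """Euler-integrate a ring of three genes where each protein represses the next gene."""
    m = [1.0, 2.0, 3.0]   # mRNA levels (unequal start so the three genes are not in lockstep)
    p = [1.0, 2.0, 3.0]   # protein levels
    history = []
    for _ in range(steps):
        new_m, new_p = [], []
        for i in range(3):
            repressor = p[(i - 1) % 3]                        # protein made by the previous gene
            production = alpha / (1.0 + repressor ** n) + alpha0
            new_m.append(m[i] + dt * (production - m[i]))     # mRNA is made and decays
            new_p.append(p[i] + dt * (-beta * (p[i] - m[i]))) # protein tracks its mRNA
        m, p = new_m, new_p
        history.append(tuple(p))
    return history

history = simulate_repressilator()
late = [state[0] for state in history[len(history) // 2:]]
print("protein 1 keeps swinging between", round(min(late), 2), "and", round(max(late), 2))
```

Under these assumed parameters the protein levels should keep rising and falling instead of settling down, which is the rhythmic behavior the fluorescent reporter made visible.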

Basic Cellular Structure and Biomolecules

There are a lot of terms that come with understanding the functions of cells. Before getting too deep into synbio, let’s clarify a few things.

DNA and RNA (Source)

DNA

DNA, or deoxyribonucleic acid, is a double-stranded molecule made of a sugar-phosphate backbone and the complementary bases adenine, thymine, cytosine, and guanine. It contains all the genetic instructions needed for organisms to develop and reproduce.

RNA

RNA, ribonucleic acid, is used to convey genetic information that enables the synthesis of specific proteins. It regulates the expression of genes. Unlike DNA, it is found as a single strand folded in on itself and is composed of bases adenine, uracil, cytosine, and guanine.

Base Pairs and Nucleotides

Base pairs, when bonded together, form a “rung” of the DNA ladder. In DNA, there are four nucleotides (adenine, thymine, cytosine, guanine). Adenine bonds to thymine to form a base pair and cytosine binds to guanine to form a base pair.

Nucleic Acids and Macromolecules

Nucleic acids are one of the four classes of macromolecules found in cells; they store and express genetic information. Deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) are the two types. The other three macromolecules are carbohydrates, proteins, and lipids (fats).

Genes

Genes are sequences of nucleotides (A, C, T, G) found on sections of a chromosome that determine an organism’s characteristics. They are made of strands of DNA. Gene length varies widely: human genes range from a few hundred base pairs to more than two million.

Proteins

Proteins are made of amino acids joined by peptide bonds. They provide the structural elements of a cell, bind cells into tissues, and control the activity of genes.

More on proteins later.

Enzymes

Enzymes are a type of protein and the catalysts of all metabolic reactions. They enable an organism to build up the four macromolecules and to convert them into other substances and degrade them.

Gene Expression

Gene expression is made up of transcription and translation. Transcription (the first step) involves copying a gene’s DNA to make a messenger RNA (mRNA) molecule. This mRNA copy carries the information necessary to build a polypeptide (protein). Translation translates the information encoded in the mRNA into a sequence of amino acids during the process of protein synthesis.
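The two steps can be sketched in a few lines of Python. This is a toy illustration only: the codon table below is deliberately partial (just the codons used in the demo), the helper names are made up for this example, and real transcription works from the template strand with far more machinery involved.

```python
CODON_TABLE = {  # partial table: only the codons needed for the demo below
    "AUG": "Met", "GCU": "Ala", "UGG": "Trp", "AAA": "Lys",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

def transcribe(coding_strand):
    """Return the mRNA matching the DNA coding strand (every T becomes a U)."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read three-letter codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]   # raises KeyError for codons not in the toy table
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide

mrna = transcribe("ATGGCTTGGAAATAA")
print(mrna)             # AUGGCUUGGAAAUAA
print(translate(mrna))  # ['Met', 'Ala', 'Trp', 'Lys']
```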

Chromosomes

Chromosomes are made up of a single molecule of DNA tightly wrapped around proteins called histones. Humans have 23 pairs of chromosomes.

Genomes

A genome is the complete set of genetic material (all of the DNA, including every gene) present in an organism.

An illustration of a cell and its various organelles

Cell Organelles

Organs are to bodies as cell organelles are to cells. Genetic information is housed in the nucleus of a cell.

Cells

Cells are the smallest structural and functional unit of an organism. The number of cells an organism has can range from one (unicellular organisms) to trillions.

Genotype and Phenotype

A genotype is the particular set of gene variants an organism carries. A phenotype is the observable expression of that material (physical traits like eye color and face shape).

Alright! With this in mind, let’s move on to the first step in any synthetic biology procedure: sequencing.

Gene and Genomic Sequencing vs Genotyping

‘Recording’ genomes is essential to building upon and modifying genetic code — after all, you can’t revise a book without having read it first.

Gene and Genomic Sequencing

Gene sequencing is the process of determining the nucleotide (A, C, T, G) sequence in a single strand of DNA, whereas genomic sequencing determines the nucleotide sequences for the entirety of the DNA in an organism.

There are three processes that can be used for DNA sequencing: Maxam-Gilbert Sequencing, Sanger Sequencing, and Next Generation Sequencing (NGS).

Maxam-Gilbert Sequencing

This process uses purified DNA directly. First, DNA is denatured into single-stranded chains and labeled with radioactive phosphorus-32 at the 5′ end, that is, on the fifth carbon atom of the sugar at the end of the sugar-phosphate backbone.

  • Denature: To destroy the characteristic properties of a macromolecule by heat, acidity, or other effects that disrupt its molecular conformation.

The DNA is then split across four reaction tubes using piperidine, dimethyl sulfate, and hydrazine, with each tube cleaving at specific bases: at C only, at C or T (C+T), at G only, and at G or A (G+A). These reactions are then loaded into a high-percentage polyacrylamide gel, a neutral gel with tiny pores that helps separate molecules. This process is known as electrophoresis. It separates fragments by size, which can be visualized using the radioactive phosphorus tag attached previously (autoradiography = photos produced by radiation).

Maxam-Gilbert Sequencing

Starting with the smallest fragments, each base is “called,” or interpreted relative to the four chemical reactions that have taken place. If a band of DNA shows up in both the G and G+A reactions, the ending nucleotide is a G. If it shows up only in the G+A reaction, the nucleotide must be an A. The same reasoning is applied to reactions C and C+T. By reading and reassembling each fragment of DNA, the whole strand is sequenced.
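The calling logic above is simple enough to write down directly. The sketch below assumes an idealized gel in which each band is reported as an integer fragment length per lane; the `lanes` dictionary and the `call_bases` helper are hypothetical names for illustration.

```python
def call_bases(lanes):
    """lanes maps 'G', 'G+A', 'C', 'C+T' to the set of fragment lengths seen in that lane."""
    lengths = sorted(set().union(*lanes.values()))
    sequence = []
    for length in lengths:                 # smallest fragment first
        if length in lanes["G"]:           # band in G (and therefore also in G+A)
            sequence.append("G")
        elif length in lanes["G+A"]:       # band only in G+A
            sequence.append("A")
        elif length in lanes["C"]:         # band in C (and therefore also in C+T)
            sequence.append("C")
        elif length in lanes["C+T"]:       # band only in C+T
            sequence.append("T")
    return "".join(sequence)

lanes = {"G": {2, 5}, "G+A": {1, 2, 5}, "C": {3}, "C+T": {3, 4}}
print(call_bases(lanes))  # AGCTG
```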

Maxam-Gilbert Sequencing has fallen out of use due to its time-consuming nature and low throughput (it confirms only about 200–300 bases every few days), its large room for error, and its use of radioactive materials and hydrazine, which is a neurotoxin.

Sanger Sequencing

Sanger sequencing is the “gold standard” sequencing technology with a 99.99% base accuracy and is best for analyzing small numbers of genes, as it can only sequence one DNA strand at a time. It is used to assemble larger DNA fragments and, eventually, entire chromosomes. It has five steps:

  1. DNA is denatured into two single-stranded DNA pieces.
  2. A primer (short strand of DNA) that is complementary to one end of the sequence is attached. So, if your single-strand DNA piece has nucleotides ACCATCGT, the primer will have the complementary nucleotides TGGTAGCA (A pairs with T, and C pairs with G). The primer “recreates” the original double-stranded DNA, in a way. Primers tell DNA polymerase where to start synthesizing DNA.
  3. The DNA synthesis reaction initiates, extending the chain until a termination nucleotide is randomly incorporated. Termination nucleotides are like alternate versions of each of the nucleotides A, C, T, and G that force DNA polymerase (which drives the synthesis reaction) to stop extending the DNA. Each of these synthetic termination nucleotides is labeled with a distinct fluorescent dye for identification. (They’re called dideoxynucleotide triphosphates and are missing an oxygen atom: the hydroxyl (OH) group on the 3′ carbon is replaced by a hydrogen. This ensures they cannot link with the next nucleotide, thereby terminating the chain.)
  4. The resultant DNA fragments are denatured once more into single-strand DNA.
  5. Denatured fragments are loaded into four lanes of a gel (depending on the terminating nucleotide of the chain) and subjected to gel electrophoresis, the movement of charged particles in a gel under the influence of an electric field. Since nucleic acids are negatively charged because of their phosphate backbone, they migrate to the positive electrode. The gel restricts the movement of larger molecules, thereby separating the DNA fragments by size. From their sizes, the sequence of the DNA is determined.
Sanger Sequencing

Next Generation Sequencing (NGS)

NGS sequences millions of small fragments of DNA simultaneously, considerably speeding up genomic sequencing. While the Sanger method needed a whole decade to sequence the entire human genome, NGS can do it in a day (the Sanger method is sometimes used to double check NGS). Here, there are four steps:

  1. The DNA must be processed into a library. Using high-frequency sound waves, DNA is chopped into short double-stranded fragments. Adaptor sequences, which contain molecular “barcodes,” are attached to their ends. This allows multiple DNA samples to be mixed together and sequenced at the same time, which is known as “pooling” or “multiplexing.” In the end, these tags are used to sort the reads back into their original samples before the fragments are reassembled into complete stretches of DNA.

You can also create

  • Paired-end libraries: Add adaptor sequences to both ends of a DNA fragment, making it possible to sequence the DNA from both ends.
  • Mate-pair libraries: Using larger DNA inserts, you can take reads from opposite orientations (ex. start from the middle and read to both ends at the same time).
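Sorting pooled reads back into their samples by barcode (often called demultiplexing) can be sketched like this. The 4-base barcodes, sample names, and the `demultiplex` helper are invented for illustration; real barcodes are longer, and real pipelines tolerate sequencing errors.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Group reads by the sample whose barcode they start with, trimming the barcode off."""
    by_sample = defaultdict(list)
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            by_sample["unassigned"].append(read)   # no barcode matched
    return dict(by_sample)

barcodes = {"sample_A": "ACGT", "sample_B": "TGCA"}   # hypothetical barcodes
reads = ["ACGTGGGTTT", "TGCAAAACCC", "ACGTCCCGGG", "NNNNACGTAC"]
print(demultiplex(reads, barcodes))
# {'sample_A': ['GGGTTT', 'CCCGGG'], 'sample_B': ['AAACCC'], 'unassigned': ['NNNNACGTAC']}
```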

2. With your newly created library, you must clonally amplify the DNA to increase its signal. Otherwise, your sample is too tiny to be read. Each DNA molecule is attached to the surface of a bead and PCR amplified to create a set of identical clones.

  • PCR: Polymerase Chain Reaction involves heating a sample so that the DNA denatures, separating into two strands, and then using Taq polymerase to build two new strands of DNA using the two original strands as templates, similar to step 2 of Sanger sequencing. This is done by a thermocycler, exponentially increasing your DNA sample.
Polymerase Chain Reaction (Source)
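The “exponential” part is easy to quantify. Here is a small sketch assuming ideal doubling every cycle; the `efficiency` parameter is an added assumption to show what happens when doubling is imperfect.

```python
def copies_after_pcr(initial_copies, cycles, efficiency=1.0):
    """Ideal PCR: each cycle multiplies the copy number by (1 + efficiency)."""
    return initial_copies * (1 + efficiency) ** cycles

print(copies_after_pcr(1, 30))   # 1073741824.0
```

In other words, a single DNA molecule becomes roughly a billion copies after 30 perfect cycles.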

3. Now, you sequence your library with “sequencing by synthesis.” Synthesized bases carrying distinct fluorescent colors are bound to their complements on single-stranded DNA. The newly incorporated bases are optically detected and read according to the colors in the photograph, and their dyes are then removed so the next base can be added. The reads are then sorted with the help of the adaptor sequences (tags) that were added in the beginning and stitched back together.

4. There are three different levels of analysis for the resulting data. Primary analysis includes processing raw signals into digitized base calls. Secondary analysis involves filtering and trimming the data and aligning it to a pre-established reference genome. Lastly, tertiary analysis interprets the results and extracts meaningful data.

NGS (Source)

A last note: genomic sequencing is valuable because it identifies an organism’s unique DNA fingerprint. This helps us learn more about genetic traits, diseases, and the inner workings of bacteria and viruses, as sequenced genomes can be searched like a database with algorithms. Unfortunately, the cost of genomic sequencing still has to compete with the perceived value of cheaper genotyping.

In April of 2003, the full sequence for the human genome was completed and published as a result of the Human Genome Project. The cost of genome sequencing is also decreasing — while genome sequencing used to cost around $2.7 billion, there is now a goal for a $1,000 genome.

Genotyping

This is where companies like 23andMe and AncestryDNA come in. Genotyping is like taking a million snapshots of your genome — instead of reading the entire book, you’re just skimming through a few chapters. It detects small genetic differences between populations that lead to major changes in phenotype. While it gives you the general idea of a genome, it is much less valuable and cannot be searched when new genetic discoveries are made. Genotyping records less than 1% of a genome.

Gene Editing and Gene Engineering

It’s important to differentiate between gene editing and gene engineering. While similar, the two are different.

Gene Editing

Gene editing is when you make a tiny, controlled change in the DNA of a living organism. It’s when you snip sections of DNA and replace them with your own snippets, like changing a single word of a sentence in a book. Gene editing is behind the production of GMOs, or Genetically Modified Organisms. Although it tends to get a negative connotation, there is nothing wrong with GMOs: they are simply organisms that have been tweaked to allow for benefits like improved animal welfare, increased productivity, and less required input.

The name “CRISPR-Cas9” is often tossed around in conversations around gene editing, and it’s currently the fastest, cheapest, and most reliable technology for editing genes. It uses two molecules:

  • Cas9: an enzyme that cuts both strands of DNA at a specific location in the genome, and
  • Guide RNA (gRNA): a pre-designed RNA sequence around 20 bases long located within a longer RNA scaffold. The bases are complementary to the section of DNA intended to be edited.

The gRNA’s scaffold binds to Cas9, and its 20-base sequence guides Cas9 to the matching target sequence in the DNA. Cas9 makes a cut across both strands of DNA. Consequently, the cell notices that its DNA is damaged and repairs it. The repair either introduces small errors that disrupt the gene or, if a pre-designed DNA template is supplied, copies in the new sequence, sealing in the mutation.

CRISPR-Cas9 (Source)
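The “guiding” step is, at heart, a sequence search. The sketch below looks for exact matches of a 20-base guide (written as DNA letters) on either strand of a toy genome. It is a simplification under stated assumptions: real targeting involves additional sequence requirements and tolerates some mismatches, and `find_target_sites` is a hypothetical helper, not part of any CRISPR design tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]

def find_target_sites(genome, guide_dna):
    """Return (strand, index) for every exact match of the guide on either strand.

    Indices for the '-' strand are positions within the reverse-complemented genome.
    """
    sites = []
    for strand, text in (("+", genome), ("-", reverse_complement(genome))):
        start = text.find(guide_dna)
        while start != -1:
            sites.append((strand, start))
            start = text.find(guide_dna, start + 1)
    return sites

guide = "GATTACAGATTACAGATTAC"            # made-up 20-base guide
genome = "AAAA" + guide + "CCCC"          # toy "genome" containing one target
print(find_target_sites(genome, guide))   # [('+', 4)]
```

More than one hit in the returned list would be a hint that the guide could also direct Cas9 to an unintended site.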

Previous methods used certain chemicals or radiation to cause mutations, but scientists had no way to control where those mutations landed. There are similar (but less extreme) issues with CRISPR around off-target effects, where Cas9 cuts at unintended sites. While not yet used routinely in humans, gene editing technologies can help treat genetic conditions.

Gene Engineering

On the other hand, gene engineering is concerned with making new types of cells by directly manipulating and making an organism’s DNA. This is about writing a new chapter, maybe even an entire book in itself!

Recombinant DNA is an important factor in the business of synthesizing life, and one of the first accomplishments in the field of genetic engineering. To clone DNA inside a host cell, you need a recombinant molecule. This is made up of a fragment of DNA inserted into a DNA molecule called a vector, which can independently replicate in a host cell.

Recombinant DNA (Source)

Restriction endonucleases are enzymes that cleave DNA at specific sequences to defend against the entry of foreign DNA into a cell. Here, they are used to cleave DNA at staggered sites, creating an “overhang” that can help attach fragments and vectors through complementary base pairing.

Once the fragment and vector are matched, they are sealed together with DNA ligase. So, two different fragments of DNA (the DNA insert and the vector) prepared by the same restriction endonuclease can be joined to create a recombinant molecule.

DNA “linkers” containing restriction endonuclease sites can also be added to the end of any DNA fragment, even if their overhangs are not complementary. This means that any fragment can be ligated to any vector.
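Here is a rough sketch of cutting and pasting with sticky ends, using EcoRI as the example enzyme (it recognizes GAATTC and cuts after the G, leaving an AATT overhang). Only the top strand is modeled, the vector is treated as a linear string, and the sequences and helper names are invented for illustration.

```python
SITE = "GAATTC"      # EcoRI recognition sequence
CUT_OFFSET = 1       # EcoRI cuts the top strand between G and AATTC

def digest(top_strand):
    """Cut the top strand at every EcoRI site and return the fragments in order."""
    fragments, start = [], 0
    pos = top_strand.find(SITE)
    while pos != -1:
        fragments.append(top_strand[start:pos + CUT_OFFSET])
        start = pos + CUT_OFFSET
        pos = top_strand.find(SITE, pos + 1)
    fragments.append(top_strand[start:])
    return fragments

def ligate(left, right):
    """Join two fragments only if the junction re-forms an EcoRI site."""
    if not (left.endswith("G") and right.startswith("AATTC")):
        raise ValueError("ends are not compatible EcoRI sticky ends")
    return left + right

vector = "CCCGAATTCTTT"                 # toy vector with one site
insert = "AAGAATTCGGGGGAATTCAA"         # toy insert flanked by two sites
v_left, v_right = digest(vector)
_, payload, _ = digest(insert)          # keep the middle fragment
recombinant = ligate(ligate(v_left, payload), v_right)
print(recombinant)                      # CCCGAATTCGGGGGAATTCTTT
```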

Similarly, RNA can also be cloned. A DNA copy of the RNA (called cDNA, because it is complementary) is synthesized using the enzyme reverse transcriptase. It can then be ligated to a vector DNA.

  • Side note: DNA has bases A, T, C, and G. RNA has bases A, U, C, and G. How can you make a complementary copy when there is no U in DNA and no T in RNA? Uracil is a demethylated form of thymine, which means that it is missing one methyl group (one carbon and three hydrogens; that methyl group is replaced with a hydrogen atom, leading to a net loss of a carbon and two hydrogens). Uracil binds to adenine in RNA and corresponds to thymine in DNA.

Recombinant DNA is used to transplant synthesized DNA into host cells, where the DNA then takes over and repurposes cell organelles for new functions. More on this in Designer Genomes and Artificial Life.

DNA = Computer Code?

Wait a minute — if DNA is the biological code that dictates all life, can’t we manipulate and write it just like computer code?

Yep!

As mentioned, the bases adenine, cytosine, thymine, and guanine make up the base pairs that make up the DNA double helix. They also code for the 20 standard amino acids, which make up many thousands of proteins (see the section on Protein Design!). Instead of the 1s and 0s used in binary computer code, we can use A, C, T, and G in a type of quaternary code: biological programming. With the DNA sequences that encode organisms, we could recreate existing organisms, alter them, and create novel ones.
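To see why four letters amount to a quaternary code, note that each base can carry two bits. The sketch below converts bytes to bases and back; the assignment A=00, C=01, G=10, T=11 is an arbitrary choice made for this example, not a biological convention.

```python
BASES = "ACGT"   # arbitrary assignment: A=00, C=01, G=10, T=11

def bytes_to_dna(data):
    """Encode each byte as four bases, two bits per base, most significant bits first."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))

def dna_to_bytes(dna):
    """Inverse of bytes_to_dna."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        value = 0
        for base in dna[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)

encoded = bytes_to_dna(b"Hi")
print(encoded)                 # CAGACGGC
print(dna_to_bytes(encoded))   # b'Hi'
```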

Presently, you can write DNA code in the same way a software engineer writes a computer program. Then, using a DNA synthesizer or DNA from a commercial vendor, you can use precision editing tools like CRISPR-Cas9 to “run” it in already existing organisms.

Biotech companies are developing new gene therapies and are even considering the implications of code that will make changes to the human genome that can be passed down generations.

Synthetic biology is being marketed as a form of computer code because this style is so much easier for human design. ‘Life is messy,’ and biological design is inherently chaotic. However, there are risks with this ideology. In computer coding, there are always flaws — and developers are constantly fixing them through updates, even after the software has been released. This model tends to not work in biology, as flawed organisms, if released into the wild, could destroy fragile ecological balances. Pathogens with no cure might wreak havoc on life, and no containment system provides a 0% chance of risk.

DNA Synthesis

To store information in DNA, you first need to make it! DNA sequencing and DNA synthesis are the two foundational technologies driving synthetic biology. Here’s the process for the latter:

  1. Typically, you’ll first design your synbio circuit or pathway with computer design tools.
  2. Then, the DNA strands you planned are divided into smaller overlapping pieces (synthons), around 200–1500 base pairs long. This makes them easier to synthesize.
  3. Now you’ll synthesize the DNA from your set of overlapping single-stranded oligonucleotides, either yourself or through a commercial vendor.
  • Oligonucleotides: Short DNA/RNA molecules (aka oligomers) that have a range of applications in genetic testing, research, and forensics.

The resulting overlapping synthons from step 2 are assembled into larger pieces of DNA and then cloned into an expression vector, which is a plasmid or virus that can commandeer a cell to produce materials encoded by the DNA. A synthon is simply the part of a DNA molecule that is the basis for synthetic procedure. Altogether, the synthons make up a “device,” your synbio circuit. The sequence of your cloned DNA is then verified. The expression vector is inserted into a cell and ‘assayed’ or tested to determine its biochemical activity — whether your designed system is performing its function. Changes can be made depending on the results.
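The divide-and-reassemble idea can be sketched as follows. The tiny piece size and overlap are chosen only to keep the demo readable (real synthons are hundreds of base pairs long), and both helper functions are hypothetical illustrations, not a real design tool.

```python
def split_into_synthons(sequence, size, overlap):
    """Cut a sequence into pieces of at most `size` bases that share `overlap` bases with the next piece."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    pieces = []
    for start in range(0, len(sequence), step):
        pieces.append(sequence[start:start + size])
        if start + size >= len(sequence):   # this piece already reaches the end
            break
    return pieces

def assemble(pieces, overlap):
    """Rejoin pieces, checking that each overlap matches before dropping the duplicated bases."""
    result = pieces[0]
    for piece in pieces[1:]:
        if overlap and result[-overlap:] != piece[:overlap]:
            raise ValueError("overlap mismatch")
        result += piece[overlap:]
    return result

design = "ACGTACGTTTGGCCAA"
pieces = split_into_synthons(design, size=8, overlap=3)
print(pieces)                          # ['ACGTACGT', 'CGTTTGGC', 'GGCCAA']
print(assemble(pieces, 3) == design)   # True
```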

The process of DNA synthesis (Source)

Essentially: design, build, test, learn, and repeat.

This is why automated processes and methods that shorten the development cycle and increase throughput are valued: presently, DNA synthesis is very much at a guess-and-check phase.

There are two main types of DNA synthesis right now: through phosphoramidite chemistry and de novo enzymatic DNA synthesis.

Phosphoramidite Chemistry

Phosphoramidite chemistry is the traditional way to synthesize DNA. Single bases are added onto a growing oligonucleotide chain, which is attached to a controlled pore glass (glass that includes tiny pores). On a large scale, individual oligonucleotides are printed on slides at a high density.

This occurs on electrodes within a silicon semiconductor chip. The electronic activation of these individual electrodes creates a transient (impermanent) acidic environment, deprotecting the end of a DNA oligonucleotide.

  • Deprotecting: Removing a protecting group from a molecule. Once an oligonucleotide is deprotected, additional pieces can be added on.

Deprotection ensures that nucleotides are added only to the intended oligonucleotides.

However, phosphoramidite chemistry fails to deliver high quantity and quality DNA over 150 nucleotides long. In many cases, it cannot meet the needs of emerging applications in synbio and biopharmaceuticals. It also requires harsh reaction conditions and reagents for chemical analysis. This is where enzymatic DNA synthesis comes in!

Enzymatic DNA Synthesis

Found first as part of the vertebrate immune system response, terminal deoxynucleotidyl transferase (TdT) polymerases prompt the random polymerization of deoxyribonucleoside triphosphate building blocks.

… What?

  • Transferase: An enzyme that catalyzes the transfer of a particular group from one molecule to another. Here, it’s moving bases to build the DNA strands.
  • Polymerase: An enzyme that brings about the formation of a particular polymer, especially DNA or RNA.
  • Polymerization: Combining, or causing to combine, to form a polymer.
  • Deoxyribonucleoside triphosphate (dNTP): One of the four bases A, G, T, and C, with the base connected to the first carbon of a deoxyribose sugar and three phosphate groups attached to the fifth carbon of the deoxyribose.

In other words, TdT enzymes randomly combine dNTPs, creating DNA strands.

These same TdTs are used with four reversible terminator dNTPs. The reversible terminator dNTPs enable a controlled, step-by-step extension of the initiating primer, base by base. In each cycle, TdT attaches a single reversible terminator dNTP carrying the desired base; its blocking group prevents any other dNTPs from attaching. The block is then removed (“reversed”), turning the end of the chain back into a “regular” nucleotide and allowing the next dNTP to attach in the following cycle.

  • Primer: A short single stranded nucleic acid used in the initiation of DNA synthesis.

This method does not require a template strand and is capable of de novo, or from the beginning, synthesis.

After multiple cycles of extension, the newly synthesized single-strand polydeoxynucleotide is cleaved from its solid support and isolated for use. A polydeoxynucleotide can be any polymer that is part of DNA. This polydeoxynucleotide is a fully natural and biologically active molecule, since it was synthesized using a “natural” process; the bases were not added on manually through human design. Note that the molecule is only single-stranded, whereas DNA is double-stranded.

All in all, enzymatic synthesis eliminates the time consuming and modification inducing chemical manipulation required with the phosphoramidite method. TdT driven synthesis accelerates the assembly process, so much that the direct synthesis of entire genes is starting to become a possibility.

Enzymatic DNA synthesis relies upon effective polymerization biocatalysts and the use of the right reversible dNTPs. TdTs are therefore good to use, as they can extend lots of primers, adding on thousands of nucleotides and also accepting a wide variety of human-modified dNTP molecules. The challenge here is having a level of control where only one nucleotide is added per cycle. Usually, a reversible blocking group is attached to the OH group on the third carbon of a dNTP’s sugar to control the addition, but TdT doesn’t work well with modifications in the sugar portion of reversible terminator dNTPs. Scientists are working on developing new combinations of modified dNTPs and TdTs to solve this problem.

Designer Genomes and Artificial Life

Genomes can now be designed in a computer, chemically made in a lab (see DNA Synthesis), and transplanted into recipient cells to produce new self-replicating cells that are controlled only by the synthetic genome. With this, we could potentially engineer bacteria for specific purposes like producing drugs, biofuels, and other useful chemicals. Crazy!

M. mycoides (Source)

Back in 2010, a team led by J. Craig Venter created the first functioning and self-replicating cell using a synthesized genome. They decided to recreate Mycoplasma mycoides, a species of bacteria that is among the smallest forms of self-replicating life. They chose to try and replicate a small genome because, as discussed above, there are still significant challenges with manufacturing large pieces of DNA. They started with a digitized version of M. mycoides’ million-base-pair genome and designed 1,078 specific cassettes of DNA, each 1,080 base pairs long.

  • Gene cassette: A manipulable fragment of DNA that carries or is able to express one or more genes of interest.

They designed these cassettes so that each one overlapped its neighbors by about 80 base pairs (see recombinant DNA in the section on gene engineering!). These cassettes were made ‘on the lab bench’. Then, they initiated a three-stage process to build the entire genome using Escherichia coli and Saccharomyces cerevisiae (brewer’s yeast).

  1. By taking ten cassettes of DNA at a time, they built about 110 segments, each roughly 10,000 base pairs long. (Remember, since each cassette overlaps with its neighbors, corresponding DNA strands will match together and be sealed with DNA ligase.) These segments were cloned in E. coli and then sequenced for accuracy.
  2. Those 10,000-base-pair segments were then taken ten at a time to produce eleven 100,000-base-pair segments. Again, they were cloned in E. coli and sequenced for accuracy.
  3. E. coli isn’t particularly fond of DNA sequences over 100,000 letters of code, so to stitch together the remaining eleven segments brewer’s yeast was used. Through the process of homologous recombination, DNA repair enzymes used the overlapping DNA sequences of the eleven segments to link them all together.
  • Homologous recombination: a type of rearrangement of genetic material in which genetic information is exchanged between two similar molecules of nucleic acids.
Homologous recombination (Source)
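As a quick sanity check on the numbers in this build, the overlap arithmetic can be worked out in a few lines. This sketch assumes every cassette is exactly 1,080 base pairs with exactly 80 base pairs of overlap, which is an idealization of the figures quoted above; the `circular` flag is an added assumption for a genome that closes into a ring.

```python
def assembled_length(pieces, piece_length, overlap, circular=False):
    """Total length when equal-sized pieces are joined and each join shares `overlap` bases."""
    joins = pieces if circular else pieces - 1   # a ring has one extra join
    return pieces * piece_length - joins * overlap

print(assembled_length(1078, 1080, 80))                  # 1078080
print(assembled_length(1078, 1080, 80, circular=True))   # 1078000
```

Either way, the result lands just over a million base pairs, consistent with the genome size described in this section.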

The strings were then transferred back to E. coli, then back to yeast three more times. At the end, the researchers had produced a complete synthetic genome over a million base pairs long, named JCVI-Syn 1.0.

After isolating the desired chromosomes from the yeast, the researchers added a few extra gene cassettes to the DNA so they could select for it. These cassettes would turn the cells bright blue when they were activated. The researchers also removed any and all proteins floating around in the cell to make sure the naked DNA functioned. The DNA was then transplanted into Mycoplasma capricolum recipient cells. Why? DNA isn’t enough: you still need the actual structure of the cell, and structures like cell membranes are currently too complex to make in a lab. So instead, the team used a relative of M. mycoides, M. capricolum, as a shell to host the synthetic genome.

Now inside of an actual cell, the synthetic genome instructed the cell to transcribe DNA into messenger RNA, which is translated into new proteins. (mRNA is a normal part of protein creation; it wasn’t something new added in by the research team.) Some of the earliest proteins produced were restriction enzymes. These recognized the original chromosomes in the M. capricolum cell as foreign DNA and chewed them up, leaving only synthetic DNA in the cell. Soon, all the cells were bright blue. When the cells were sequenced two days later, all features of the original M. capricolum cell were gone.

Bright blue synthetic M. mycoides cells! (Source)

Essentially, by changing the “software” of an organism, you can change the species. The software builds its own hardware.

Wait — but how were viable transplants even recovered from the E. coli and yeast cells? What if they accidentally transferred E. coli and yeast cells into M. capricolum instead of the synthetic DNA?

This is where genetic watermarks come in. A watermark is a faint design made on an image or piece of paper, like this:

The watermarks on this photo tell you that the image is from “iStock,” which is “by Getty Images.”

Genetic watermarks are exactly what they sound like: watermarks in the genome. In this case, genes and proteins are used to spell out words and phrases that prove the genome is synthetic. They also help identify the lab of origin, along with whatever else you’d like to include.

In the genes of Mycoplasma mycoides JCVI-Syn 1.0, Craig Venter’s team embedded a new code for writing words, sentences, and numbers in DNA. Within the DNA itself, they wrote the names of 46 authors and key contributors to the project, as well as three quotes from Richard Feynman, James Joyce, and the book American Prometheus. They also embedded a web address in the DNA so that people who cracked the DNA code could send emails to the team.

This essentially makes JCVI-Syn 1.0 the first living thing on the planet with a computer for a parent.

These messages were written using codons, which are groups of three letters (A, C, T, G) that code for amino acids. The system for the code includes all letters and forms of punctuation in the English language, and each watermark in JCVI-Syn 1.0 is well over 1,000 base pairs long. The precision needed to encode such messages is extraordinary. And it’s not just writing them in — the DNA sequences that spell out these watermarks had to be sandwiched by other DNA sequences to make sure the proteins encoded for by the watermarks weren’t actually built by the cell.

TL;DR: If you download a genome, put it into a hollowed-out cell, and convince it to stay alive with your synthetic instructions, you can create life.

The Minimal Genome

Building on the synthetic Mycoplasma mycoides, Craig Venter’s team went on to create a minimal genome for bacteria: one containing only the 473 genes necessary for life.

They started by slicing away genes from JCVI-Syn 1.0 and created two possible genomes. Both of these failed when transplanted into M. capricolum cells. The second time around, they divided the 901 genes of Syn 1.0 into eight segments. They removed chunks, reassembled the remaining DNA, and transplanted it into the cells. If the cell died, they must have removed something crucial. This led to Syn 2.0, which had fewer genes than any independent organism, and later to Syn 3.0.

Syn 3.0 (Source)
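The remove-and-test loop described above can be sketched as a greedy search. This is a toy illustration, not the team’s actual protocol: `minimize_genome` and `is_viable` are hypothetical names, and the viability check (which in the lab means transplanting the genome and seeing whether the cell lives) is replaced by a simple function.

```python
def minimize_genome(genes, is_viable, n_chunks=8):
    """Greedily delete chunks of genes, keeping any deletion the cell survives."""
    genome = list(genes)
    chunk_size = max(1, len(genome) // n_chunks)
    while chunk_size >= 1:
        i = 0
        while i < len(genome):
            candidate = genome[:i] + genome[i + chunk_size:]
            if is_viable(candidate):
                genome = candidate   # the cell survived, so keep the deletion
            else:
                i += chunk_size      # something crucial lives here, move on
        chunk_size //= 2             # retry with smaller and smaller chunks
    return genome


# Toy stand-in for the lab test: the cell lives only if every essential gene remains.
genes = [f"gene{i}" for i in range(40)]
essential = {"gene3", "gene17", "gene18", "gene39"}
print(minimize_genome(genes, lambda g: essential <= set(g)))
```

Starting with large chunks and shrinking them keeps the number of "transplants" low while still narrowing in on individual genes.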

Of course, Syn 3.0 isn’t a true minimal genome. Every genome depends on the resources available to it and on environmental factors, so you can’t build a genome without context. For example, Syn 3.0 is dependent on a stable lab environment: the medium it grows in is uniform, so it needs fewer genes to cope with temperature fluctuations and changing food sources.

This simple, modular, and more organized system helps us understand how nature is shaped by evolution. We don’t even know the functions of a third of all the essential genes in the minimal genome!

BioBricks

BioBricks are standardized DNA sequences with defined structures and functions that can be used to design synthetic biology circuits in different organisms. Their standard assembly process is based on cloning techniques using restriction enzymes, purification, ligation, and transformation, as discussed in previous sections. Promoters, coding sequences, inverters, ribosomal binding sites, terminators, and plasmid backbones are all examples of parts available as BioBricks. The Registry of Standard Biological Parts has over 20,000 documented BioBricks.

There are three levels when it comes to BioBricks. Parts are the actual building blocks that encode basic biological functions. Parts combined to carry out human-defined functions make up devices. When incorporated into cells, devices make up new biological systems.
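The parts, devices, and systems hierarchy maps naturally onto nested data structures. Here is a minimal sketch, assuming made-up part names and placeholder sequences (none of these are real Registry entries). It simply joins sequences end to end and ignores the prefix, suffix, and scar sequences that real BioBrick assembly involves.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Part:
    name: str       # hypothetical identifier, not a real Registry ID
    kind: str       # e.g. "promoter", "rbs", "cds", "terminator"
    sequence: str   # placeholder DNA sequence


@dataclass
class Device:
    function: str   # the human-defined job this device performs
    parts: list

    def sequence(self) -> str:
        # Simplification: parts are joined directly, with no assembly scars.
        return "".join(part.sequence for part in self.parts)


@dataclass
class System:
    host: str       # the organism the devices are placed in
    devices: list = field(default_factory=list)


reporter = Device(
    function="make the cell glow",
    parts=[
        Part("demo_promoter", "promoter", "TTGACA"),
        Part("demo_rbs", "rbs", "AGGAGG"),
        Part("demo_cds", "cds", "ATGAAATAA"),
        Part("demo_terminator", "terminator", "GCGCTTTT"),
    ],
)
glowing_cell = System(host="E. coli", devices=[reporter])
print(reporter.sequence())
```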

Protein Design

The area of protein design is MASSIVE. Proteins carry out the majority of functions within a cell (catalyzing metabolic reactions, replicating DNA, transporting molecules, etc.), and being able to understand how they work allows us to design new proteins for specific functions.

Proteins — A Crash Course

Put simply, proteins are large macromolecules made of long chains of amino acids. We get amino acids from our diet — when we eat proteins, our bodies break them down into individual amino acids so that our cells can weave them back into new, more useful proteins.

There are 20 amino acids used in life:

Phenylalanine, tryptophan, lysine, methionine, threonine, isoleucine, leucine, valine, glycine, alanine, serine, cysteine, aspartic acid, glutamic acid, asparagine, glutamine, arginine, tyrosine, proline, and histidine.

(Is it important to know all these names? Not really. Still cool though.)

Every amino acid has a central carbon atom, called the alpha carbon, with a hydrogen atom attached. On the left side of the alpha carbon is an amino group: a nitrogen atom attached to two hydrogen atoms. On the right side is a carboxyl group: a carbon atom attached to an oxygen atom and a hydroxyl group. These three sections (alpha carbon, amino group, carboxyl group) are found on all amino acids in these same spots.

The alpha carbon, amino group, and carboxyl group are identical on all amino acids. Only the R group is different.

The only thing that is different for each amino acid is the part coming off the bottom on the alpha carbon, known as the R group. The R group is what gives each amino acid its own unique properties.

To build the proteins required for our bodies, our cells use a process called dehydration synthesis. When two amino acids are positioned together, the hydroxyl group (oxygen + hydrogen) from the carboxyl group of one amino acid and a hydrogen atom from the amino group of the other combine to make a water molecule (H₂O). This water molecule can be discarded. Each time a water molecule is released, a covalent bond (where atoms share electrons) forms between the two amino acids; this is called a peptide bond. By repeating this process with each amino acid to be added, a polypeptide is formed.

The hydroxyl group and hydrogen don’t actually fall in love … but you get the point.
  • Polypeptide: A peptide is a short chain of amino acids joined by peptide bonds. A polypeptide is simply a longer chain of many amino acids.
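A quick worked example shows the bookkeeping behind dehydration synthesis: each peptide bond costs one water molecule, so a chain of n amino acids weighs the sum of its free amino acids minus (n − 1) waters. The sketch below uses approximate molar masses for three amino acids; the function name `polypeptide_mass` is an assumption for illustration.

```python
WATER = 18.015  # g/mol, lost once per peptide bond

# Approximate molar masses of the free amino acids, in g/mol.
MOLAR_MASS = {"Gly": 75.07, "Ala": 89.09, "Ser": 105.09}


def polypeptide_mass(residues):
    """Mass of a chain after dehydration synthesis has joined the residues."""
    if not residues:
        return 0.0
    peptide_bonds = len(residues) - 1          # n amino acids -> n - 1 bonds
    free_mass = sum(MOLAR_MASS[r] for r in residues)
    return free_mass - peptide_bonds * WATER   # one water released per bond


chain = ["Gly", "Ala", "Ser"]
print(f"{'-'.join(chain)}: {polypeptide_mass(chain):.2f} g/mol")
```

For glycine, alanine, and serine, that is 269.25 g/mol of free amino acids minus two waters.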

This entire process takes place inside a ribosome, a type of cell organelle. Transfer RNAs (tRNA) transport amino acids into the ribosome, where they are attached via dehydration synthesis to create polypeptides. These polypeptides later fold into proteins. Conceptually, the process of translation ‘expands’ the 2D genetic code of DNA and RNA into the 3D code of amino acids in proteins.

  • Translation: converting the information in nucleotides into amino acids.
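Translation can be sketched in a few lines of Python: read the message three letters at a time and look each codon up in the standard genetic code. The function name `translate` and the example sequences are assumptions for illustration; a real ribosome also depends on tRNAs, initiation factors, and much more.

```python
from itertools import product

# Standard genetic code, written compactly: codons are ordered with the
# bases T, C, A, G, and "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first start codon (AUG) until a stop codon."""
    dna = mrna.upper().replace("U", "T")   # work in the DNA alphabet
    start = dna.find("ATG")
    if start == -1:
        return ""                          # no start codon, no protein
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":              # stop codon ends the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCAAAUAA"))  # one-letter amino acid codes
```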

Each amino acid has different chemical properties. Some are hydrophilic, some are hydrophobic, some have negative charges, some have positive charges, etc. The interactions between these characteristics cause a 2D polypeptide to fold over into a 3D structure — for example, negative and positive charges attract, so a polypeptide will fold in order to place those two charges next to each other. Once a polypeptide has folded into a specific shape, it is called a protein.

A protein! Source

Remember, the alpha carbon, amino group, and carboxyl group are the same for all amino acids; together they make up the backbone of the protein. The R groups are the side chains that branch off this backbone, and it is their interactions with each other that drive folding.

The yellow-brown strand is the backbone, made up of the alpha carbons, amino groups, and carboxyl groups of the amino acids.

Proteins have four levels of structure:

(Source)
  1. Primary: This is the order that amino acids are bonded together in the polypeptide.
  2. Secondary: These are alpha helices and beta pleated sheets, two types of structure that the polypeptide can form. Hydrogen bonds hold adjacent sections of the polypeptide together.
  3. Tertiary: This is the stage where R groups interact to form the protein. Hydrophobic R groups tend to fold inward, while hydrophilic ones face outward to interact with the water in the cytoplasm of the cell.
  4. Quaternary: Multiple folded polypeptides, called subunits, are able to join together into a larger protein complex.

A protein’s function is dependent on its structure. A 3D structure lets a protein connect with reactive sites on other proteins and molecules, allowing it to do its job. When you denature a protein (by heating it or changing its acidity, for example), it unfolds and ceases to function, even though all the amino acids are still there. The way a protein folds is determined by its specific sequence of amino acids, but just because you know the genetic recipe of a protein does not mean you know its shape. If you denature a protein and then restore its natural cellular conditions, the polypeptide will often refold to its native state.

What shape do they fold into? There is an astronomical number of possible configurations. The native conformation (shape) of a protein occurs because that particular shape is thermodynamically the most stable. In other words, proteins tend toward the lowest energy state possible when folding. This is why R groups fold and pack themselves very tightly: by doing this, they minimize the overall energy of the protein molecule.

Here are a few examples showing how a protein’s structure provides its function.

  • Antibody proteins are Y-shaped, providing them with unique hooks to tag pathogens for disposal.
  • Collagen proteins are shaped like cords so they can transmit tension between tissues like cartilage, ligaments, bones, and skin.
  • Cas9 acts like a pair of scissors to cut and paste sections of DNA.
  • Antifreeze proteins’ 3D structure allows them to bind to ice crystals and prevent organisms from freezing.
  • Ribosomal proteins act as assembly lines to build other proteins.

How do you design proteins? By writing the DNA sequences that code for the necessary amino acids! But before you do that, there’s a slight problem …

The Protein Folding Problem

As discussed earlier, protein folding originates from the interactions between amino acids. If we could determine the strength of those interactions, we might be able to calculate how any amino acid sequence would take its final shape, allowing us to design novel proteins that can catalyze specific chemical reactions or act as medicines or materials.

However, the average protein is a chain of about 300 amino acids, each of which could be any of the 20 amino acids. There are also many low-energy configurations for a single protein sequence. Altogether, there are a LOT of different conformations a polypeptide can take when folding into a protein (around 3¹⁹⁸ for a chain of just 100 amino acids), yet proteins are able to find and fold into their native state in a matter of milliseconds.

The question of how proteins search this huge conformational space so quickly is known as Levinthal’s paradox. The proposed resolution is that:

Proteins fold rapidly because their neighboring amino acids interact locally, limiting the conformational space the protein must explore and forcing it to follow a funnel-like energy landscape that makes it fold into the most stable configurations possible.

If a protein were to try to find its native conformation by sequentially testing every possible conformation, it would need longer than the age of the known universe to reach the correct shape. That’s roughly 14,000,000,000 years. If it would take a protein that long, how on earth can humans calculate the correct folding configuration for a protein?

This is known as the protein folding problem.

There are two main ways to determine the structure of a protein experimentally: X-ray crystallography and nuclear magnetic resonance spectroscopy. However, both methods are slow, resource-intensive, and expensive, and therefore contribute only a limited number of protein structures to data banks.

(Source)

Lately, homology models (a type of computer model) have become popular. These compare the amino acid sequence of a desired protein with that of a template, an already-known protein with a similar sequence and a known folding conformation, and adjust their prediction of the protein’s shape based on the differences between the two sequences. However, there are currently not enough proteins with known structures to provide the required templates for this process.

Now what? Turns out there’s another pattern we can utilize. When comparing the sequences of similar proteins from different organisms, it is observed that certain amino acids evolve together, suggesting that they are neighbors in the folded configuration of a protein. These relationships can be used as constraints for computer models, narrowing down the space of possible protein configurations.

(Source)
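One simple way to spot amino acids that "evolve together" is to measure how much knowing one column of an alignment tells you about another. The sketch below uses mutual information on a tiny made-up alignment; the sequences and the function name `mutual_information` are assumptions for illustration, and real pipelines use far more sophisticated statistics on thousands of sequences.

```python
from collections import Counter
from math import log2

# Made-up alignment of the "same" protein from six organisms.
# Positions 1 and 3 always change together (K with E, R with D).
ALIGNMENT = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "AKVE",
    "ARVD",
]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    total = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a, p_b = col_i[a] / n, col_j[b] / n
        total += p_ab * log2(p_ab / (p_a * p_b))
    return total


for i, j in [(1, 3), (1, 2), (0, 3)]:
    print(f"columns {i} and {j}: {mutual_information(ALIGNMENT, i, j):.2f} bits")
```

A high score flags a pair of positions as candidate neighbors in the folded protein, which is exactly the kind of constraint a structure-prediction program can use.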

David Baker used this to his advantage. With Kim Simons, Baker created a folding program called Rosetta, which scans a target protein for short amino acid sequences that typically fold in known patterns and uses that information to predict a molecule’s 3D configuration. To supply the large amount of computing power the program needs, they created an extension called Rosetta@home and a video game called Foldit. These let users contribute their computer power and protein-folding skills to guide Rosetta’s research.

By solving the protein folding problem, we might

  • Develop a protein that binds hemagglutinin, a protein of the flu virus, and prevents the virus from invading cells.
  • Design a protein that chops up gluten, aiding people with Celiac disease and gluten sensitivity.
  • Design proteins shaped as cages that could transport drugs, therapeutic snippets of DNA/RNA, or function as nano-lanterns that aid in research.
  • Design protein sensors inside of cells to improve CRISPR, switch on the expression of specific genes, alert immune cells to invaders and cancer, and more!

Misfolding

Usually, a protein will fold correctly. Usually.

As mentioned, there are a whole bunch of different configurations a protein could take in its search for the lowest energy level possible. Because of this, some proteins misfold into shapes that are energetically almost as favorable as the native one.

Molecular chaperones are specialized proteins that supervise proteins as they fold, preventing inappropriate reactions. They also help complicated or unstable proteins fold into the correct configurations. There are also quality control mechanisms in cells that can tag toxic proteins and send them to the cell’s cytoplasm, where they are degraded. The existence of these chaperones implies that some proteins are inherently unstable. These can easily flip from a functional minimal energy state to one that is nonfunctional and even toxic. The genome codes for inherently unstable proteins, and random events can also cause proteins to misfold during production.

Misfolded proteins are typically insoluble, leading them to form long linear aggregates known as amyloid deposits. When a protein becomes toxic, a conformational change occurs where an alpha helix transforms into a beta sheet. This is characteristic of amyloid deposits, and exposes hydrophobic amino acids, promoting protein aggregation.

An amyloid deposit (Source)

Toxic proteins can also interact with native copies of their same protein, literally infecting them. The newly made toxic proteins repeat this cycle, amplifying their toxicity to the point where they impair the function of or kill the cell. These proteins are appropriately named infective conformations, or prions.

The accumulation of misfolded proteins as we age can cause amyloid diseases such as Alzheimer’s (which affects about 10% of people aged 65 and older), Parkinson’s, Huntington’s, and more.

Environmental factors, like exposure to substances that affect the mitochondria, are known to increase the risk of degenerative diseases. Misfolding can also result in type 2 diabetes, inherited cataracts, hemodialysis-related disorders, and light chain amyloidosis. The genes and protein products involved in these diseases, which appear in peripheral tissues, are called amyloidogenic. Amyloidogenic proteins are expressed outside of their normal context, leading them to fold into ‘sticky’ conformations with lots of beta sheets. This encourages protein aggregation.

AlphaFold

Back to the protein folding problem. Because of the reduction in the cost of gene sequencing, we now have access to a. Lot. Of. Data. This makes the problem well suited to deep neural networks like Google DeepMind’s AlphaFold, which models a target protein’s shape from scratch without using previously solved proteins as templates (the shortage of which hinders homology models). AlphaFold predicts the distances between pairs of amino acids and the angles between the chemical bonds that connect them, then uses those predictions to score candidate structures as it searches the protein’s large conformational space for ones that match.

(Source)

Besides repeatedly replacing pieces of a protein structure with new fragments, AlphaFold uses a generative neural network to invent new fragments. By assigning scores to candidate structures, it uses a mathematical technique called gradient descent to make incremental improvements towards a highly accurate protein structure. This is applied to entire protein chains to simplify the process. In this way, AlphaFold is becoming a viable solution to the protein folding problem.
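Gradient descent itself is simple enough to show on a toy problem. The sketch below is not AlphaFold: it places three points in 2D and nudges them, step by step, until their pairwise distances match a set of "predicted" distances. The target distances, starting coordinates, learning rate, and function names are all assumptions chosen for illustration.

```python
from math import hypot


def loss(points, targets):
    """Sum of squared differences between actual and target distances."""
    total = 0.0
    for (i, j), target in targets.items():
        d = hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
        total += (d - target) ** 2
    return total


def gradient_descent(points, targets, learning_rate=0.05, steps=1000):
    """Repeatedly move every point a small step downhill on the loss."""
    pts = [list(p) for p in points]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in pts]
        for (i, j), target in targets.items():
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            d = hypot(dx, dy) or 1e-9          # avoid dividing by zero
            g = 2 * (d - target) / d           # derivative of (d - target)^2
            grads[i][0] += g * dx
            grads[i][1] += g * dy
            grads[j][0] -= g * dx
            grads[j][1] -= g * dy
        for p, grad in zip(pts, grads):
            p[0] -= learning_rate * grad[0]
            p[1] -= learning_rate * grad[1]
    return pts


# "Predicted" distances: every pair of points should end up 1.0 apart.
targets = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
start = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4)]
final = gradient_descent(start, targets)
print(f"loss before: {loss(start, targets):.4f}, after: {loss(final, targets):.2e}")
```

The same idea scales up: replace the three points with a protein chain and the hand-picked targets with a network’s predicted distances and angles, and each step moves the structure a little closer to agreeing with the predictions.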

Biosecurity and Ethical Concerns

Before wrapping up, a quick note on biosecurity and ethics. Especially after Craig Venter’s breakthrough with synthetic life, creating life has sometimes been seen as a potential danger, bringing up concerns about bioweapons and terrorists. Designed diseases and mutant organisms often come to mind. Human modification?

Often, even basic research doesn’t lead to the expected outcome. At the same time, regulation, the intervention of government in scientific process, is hotly debated. How much weight should scientists, government, companies, and the public have in decisions regarding synbio and ethics? And how much regulation is too much?

Is it responsible to “play God”?

The Haldane principle argues that researchers should make decisions on what to spend research funds on rather than politicians. But what about commercial partnerships and interests? As the funders of the research, their involvement in deciding what experiments are conducted is almost inevitable. In the midst of all these competing interests, somebody needs to take control to ensure the product is produced safely. Working in partnerships is essential, as delivering good for the whole world as an individual is not realistic.

Put very simply, there are three general levels of lab safety.

  1. Active Injury. This includes direct, in-the-moment safety procedures like using proper equipment with toxic substances.
  2. Lab Escape. When it comes to artificial organisms, containment becomes an issue. Keeping experiments isolated from the outside natural world, where they may wreak havoc, is vital.
  3. Long Term Consequences. Even before designing an experiment, it’s important to think ahead. Will what you make be helpful, important, and impactful? Who or what might be harmed, and where? Will the research empower others and contribute to living without suffering? Here, it is especially important to gather stakeholder feedback early from as many groups as possible, since each one will have a different view.

We’ve always been messing with genetics: for hundreds of thousands of years, humans have domesticated and artificially selected other organisms to produce characteristics we value. But now, we can directly interfere on the cellular level. This is no longer just a molecular concern — this is nervousness at the organism level.

So!

There you have it: synthetic biology in a nutshell (a portion of it at least …). Engineering nature is one of the most exciting disciplines out there, with implications in practically every industry and scientific field. It unlocks superpowers that allow us to directly improve life’s genetic programming, optimizing life instead of creating gadgets to aid it. Especially as we face enormous environmental issues like plastic pollution and climate change, synthetic biology innovations will be integral to lessen the scale of our sacrifice. While the investment up front may be large, beneficial applications of synbio will yield high returns. We’ve already seen this with the plummeting cost of gene sequencing.

Imagine what the next ten years will bring!

Key Highlights

  • Synthetic biology (synbio) is the design and fabrication of biological components and systems that do not exist in the natural world, as well as the redesign of existing biological components and systems.
  • Synbio is a combination of engineering and genomics.
  • Gene sequencing is the process of determining the nucleotide sequence in a single strand of DNA.
  • Maxam-Gilbert Sequencing sorts labeled DNA into reaction tubes. These reaction tubes are subject to electrophoresis and visualized using their radioactive tags. This method is no longer used.
  • Sanger Sequencing is highly accurate at analyzing a small number of genes. It adds primers to single strands of DNA, which initiate DNA synthesis. Dyed termination nucleotides are incorporated. These fragments are subject to gel electrophoresis. From their size, sequences are determined.
  • For Next Generation Sequencing, DNA must be processed into a library. Polymerase Chain Reaction is used to clonally amplify the DNA to increase its signal. Synthesized bases with colors are bound to complementary bases, and the sequence is read according to these colors.
  • Genotyping is like taking a million snapshots of your genome.
  • Gene editing is when you make a tiny, controlled change in the DNA of an organism.
  • CRISPR-Cas9 uses a guide RNA to find and cut DNA at specific locations, after which the cell repairs the break.
  • Gene engineering is concerned with making new types of cells by directly manipulating and making an organism’s DNA.
  • Recombinant DNA is a fragment of DNA inserted into a vector. Restriction endonucleases cleave the insert and vector at overlapping points, creating overhangs that allow the two to be matched.
  • DNA’s quaternary code can be compared to binary computer code. However, it is important to note that it is much more difficult to correct flaws in biological code.
  • To create DNA, 1) design a circuit, 2) divide your DNA into overlapping pieces, 3) synthesize your DNA.
  • Phosphoramidite chemistry involves adding single bases onto a growing oligonucleotide chain.
  • Enzymatic DNA synthesis uses TdT enzymes to randomly combine dNTPs and reversible terminator dNTPs, creating DNA strands.
  • J. Craig Venter and his team created the first functioning and self-replicating cell using a synthesized genome. They recreated M. mycoides using E. coli and Saccharomyces cerevisiae.
  • DNA isn’t enough to create life — you still need cell organelles.
  • Genetic watermarks are DNA sequences used to spell out words and phrases that prove the genome is synthetic.
  • Craig Venter also developed the minimal genome, which includes only the 473 genes necessary for life.
  • BioBricks are standardized DNA sequences with defined structures and functions that can be used to design synthetic biology circuits in different organisms.
  • Proteins are large macromolecules made of long chains of amino acids. They are made up of the alpha carbon, amino group, carboxyl group, and R group. Through dehydration synthesis, amino acids are bonded together to create polypeptides, which fold into proteins.
  • Proteins are able to find and fold into their native state in a matter of milliseconds. However, finding the correct shape for a protein through trial and error would take longer than the age of the universe. If we could determine the strength of interactions between amino acids, we might be able to calculate a protein’s final shape. This is known as the protein folding problem.
  • Folding programs like Rosetta scan target proteins for short amino acid sequences that typically fold in known patterns to predict 3D configurations.
  • Protein misfolding can create toxic proteins that can kill cells and cause disease.
  • AlphaFold is a deep neural network that models a protein’s shape from scratch using gradient descent.
  • How much regulation should be imposed upon synthetic biology? There are three general levels of lab safety: active injury, lab escape, and long-term consequences.


Hey! I’m Selin, a 15 y/o looking to accelerate sustainability with synthetic biology and the arts. Enjoyed the article? Your support is appreciated!
Find me on LinkedIn.
