How are the genes for transgenic organisms made? (Artificial Gene Synthesis, Part 1)

For far too long, I thought that scientists could just type up a gene sequence like it was some computer programming language…

Alyson Lang

Published in The Quantastic Journal

11 min read · Jun 28, 2024

…I guess I was a bit disappointed — but also intrigued — when I discovered it was much, much more complex than that.

When people talk about gene editing, they’re usually talking about modifying animals or plants. But surprisingly, this process often begins with editing a plasmid (a circular ring of DNA) in bacteria. (Learn more about how genes are inserted into plants in this blog about Agrobacterium!) As a result, many scientists are constantly editing and testing several versions of many different plasmids, each with its own purpose, such as introducing a herbicide-resistance trait or coloring a plant red. This means that on websites like Addgene, countless binary vectors (plasmids modified for genetic engineering) have been published by different scientists for different uses. Anyone can purchase one such binary vector and use it in their own experiments.

But what if you wanted to make something new?

If no scientist can write a gene from scratch — the language of life is far too complex — how are these plasmids made?

Well, it’s less about making them and more about finding and isolating them. All genes are sourced from nature. A new trait that is added to any organism must have already been found in a different plant or animal, then isolated and transferred into the target organism. So, let’s look at how these gene sequences are acquired by exploring the following ideas:

Using Polymerase Chain Reaction (PCR)
Locating and defining primers
Acquiring primers (phosphoramidite oligonucleotide synthesis)

Polymerase Chain Reaction

PCR is considered one of the most important tools in biology, if not the most important. It’s cherished for two main reasons: first, it enables you to “pluck out” a specific DNA sequence from a large set of genes. Second, PCR lets you create hundreds of millions of copies of that sequence in a very short amount of time, typically one to two hours.

Altogether, PCR can be used to efficiently and effectively find and pick out any gene from an organism’s genome, which can then be inserted into another organism to give it the chosen transgenic trait.

To find out how, exactly, PCR works its magic, let’s walk through its three stages: denaturation, annealing, and extension (and repeat). The diagram below was really helpful to me when I was learning about PCR, so we’ll be referring back to it often.

Figure 1. Steps of PCR from mavink.com. Unwound dark blue DNA is the original strand that is being amplified; red DNA segments are primers; grey shape “Taq” is the enzyme Taq polymerase; light blue DNA are newly assembled sequences.

Step 1. Denaturation

As a whole, the process of PCR can be characterized by its cycle of heating and cooling, which enables different components to perform their tasks. So, the first step of PCR is to melt the DNA. The heat would be turned up to 98 °C and the DNA would sit there, soaking, for 15 seconds. During this time, the double-stranded DNA, which is floating around in a small tube of mix buffer, is forced apart by the heat and separates into two single-stranded DNA molecules. Just like how heat splits the hydrogen bonds in ice to melt it into water, heat also separates the hydrogen bonds between each base pair that bind the strands together.

Denaturation is absolutely necessary for the process because, as shown by Figure 1, PCR works by separating a double-stranded DNA molecule, building up a complementary strand for each of the resulting single-stranded pieces, then repeating the process, doubling the number of DNA strands every cycle.

Step 2. Annealing

After the DNA is separated through denaturation, the next step allows primers to situate themselves at the part of the DNA sequence you want to “pluck out” and amplify. The temperature is cooled to around 60°C, but this number can change based on the annealing temperature of the primers. These two different primers, called the forward and reverse primers, define the beginning and end of the desired sequence.

Oh, and a quick clarification: everything that I have introduced and will introduce during the PCR process (primers, nucleotides, Taq polymerase) has been floating around in the same mix buffer that our big original piece of DNA was dropped into. Some of them just aren’t relevant…yet.

Anyway, back to primers: usually 15–30 base pairs (bp) long, primers are short single-stranded segments of DNA that complement a portion of DNA at the two ends of the desired sequence (Figure 2). (They’re also depicted in red in Fig 1.) Because the primers are designed to match a specific portion of the DNA sequence, they are able to locate and bind to that spot after the large DNA strand is separated. The length and base content of a primer determine its annealing temperature, which is the temperature that produces the best results in terms of the specificity and efficiency of the primer’s attachment to the larger strand. Since both primers have to anneal in the same step, it’s important that the forward and reverse primers share the same (or nearly the same) annealing temperature.
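To get a feel for how length and base content set a primer’s temperature, here is a rough Python sketch using the Wallace rule, a common back-of-the-envelope estimate of melting temperature for short primers (2 °C for every A or T, 4 °C for every G or C). The two primer sequences and the 5 °C tolerance are made up for illustration, and design software generally uses more refined models than this. The annealing temperature is then usually set a few degrees below the melting temperature.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in °C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def temperatures_match(forward: str, reverse: str, tolerance_c: int = 5) -> bool:
    """True if the two estimates are within tolerance_c degrees (assumed cutoff)."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= tolerance_c


# Hypothetical 18-base primers, not taken from any real gene.
forward_primer = "ATGGCTAGCAAGGAGCTG"
reverse_primer = "TTACTTGTACAGCTCGTC"

print(wallace_tm(forward_primer), wallace_tm(reverse_primer))
print(temperatures_match(forward_primer, reverse_primer))
```

Notice that G and C count double: a GC-rich primer holds on more tightly, so it needs more heat to come off.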

Here’s a closer look at how a pair of primers would work:

Fig 2. A closer look at the structure of primers from blogspot.com

Primers don’t actually make up the desired sequence — they just set the stage for our star player, Taq polymerase. Despite that, they are the answer to how scientists are able to isolate individual traits from a huge genome. Because primers are so short, they can be directly edited and typed up — just make sure that the primers’ sequences match the ones at the ends of the desired DNA segment, and that that sequence isn’t repeated anywhere else.
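The “isn’t repeated anywhere else” check can be sketched in a few lines of Python. This is a minimal illustration that only counts perfect matches, and the template sequence is invented for the example. A primer can bind either strand, so the sketch looks for the primer itself and for its reverse complement along the top strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of site in sequence, including overlapping ones."""
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


def binding_sites(template_top: str, primer: str) -> int:
    """Number of perfect-match binding sites on either strand of the template."""
    top = template_top.upper()
    primer = primer.upper()
    # The primer binds the bottom strand wherever the top strand reads like the primer,
    # and binds the top strand wherever the top strand contains its reverse complement.
    return count_sites(top, primer) + count_sites(top, reverse_complement(primer))


# Hypothetical template, written as the top strand 5' to 3'.
template = "ATGCCGTTAGGATCCAAATGA"

print(binding_sites(template, "ATGCCG"))  # a usable primer should give exactly 1
print(binding_sites(template, "ATG"))     # a too-short primer matches in several places
```

This is also why primers aren’t made much shorter than 15 bases: the shorter the sequence, the more likely it shows up somewhere else by chance.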

How exactly these primers are made, though, is a bit of a rabbit hole, so we’ll get into that later.

Step 3. Extension

Now that our primers have attached to and defined the two ends of the target sequence, the PCR machine, also called a thermocycler, will heat to 72°C so an enzyme called Taq polymerase can do its magic and rebuild the second strand of DNA by filling in the space between the two primers.

Before we go on, allow me to rant about Taq polymerase for a bit, because it’s a fascinating enzyme. Almost no other DNA polymerase is able to survive the near-boiling temperatures that PCR requires. In fact, in the early days of PCR before Taq polymerase was put to use, scientists had to constantly replenish the enzymes to make up for the ones that denatured, a wasteful and inconvenient process. But then, in the hot springs of Yellowstone National Park where no life was believed to be possible, a bacterium named Thermus aquaticus was discovered, and it was thriving! This extremophile, called Taq for short, grows best at around 70°C, and its DNA polymerase, naturally adapted to scalding temperatures, can withstand the roughly 95°C heat of denaturation without falling apart. That enzyme would go on to revolutionize PCR and molecular biology forever.

Taq polymerase is an indispensable element of PCR because it’s in charge of building duplicate DNA sequences. During the stage of extension, it would first find and latch onto a primer (forward or reverse, it doesn’t matter). Each primer, like every other DNA molecule, has a 3' end and a 5' end (these numbers originate from some biochemistry shenanigans, but can be seen denoted in Fig 2). Taq polymerase finds the 3' end of the primer — which is facing our desired sequence — and starts “reading” the base DNA sequence beyond that, moving along it letter by letter. While doing so, it collects single nucleotides (As, Gs, Cs, or Ts) from the area around it (these are some of the components that were initially added to the PCR mix) and attaches them to the base DNA strand, effectively “extending” the primer in the 3' direction.
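If it helps to see extension as a procedure, here is a toy Python sketch of what the paragraph above describes. It assumes the primer matches its binding site perfectly, and the template and primer sequences are invented. Strands are written 5' to 3', so the polymerase walks toward the start of the template string while it lengthens the primer at its 3' end.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def extend_primer(template: str, primer: str) -> str:
    """Extend a bound primer along a single-stranded template (both written 5' to 3')."""
    template = template.upper()
    primer = primer.upper()
    # The primer sits where the template contains the primer's reverse complement.
    site_start = template.find(reverse_complement(primer))
    if site_start == -1:
        raise ValueError("primer does not anneal to this template")
    new_strand = primer
    # Read the template one letter at a time, moving toward its 5' end,
    # and attach the matching nucleotide to the 3' end of the growing strand.
    for position in range(site_start - 1, -1, -1):
        new_strand += COMPLEMENT[template[position]]
    return new_strand  # the loop ends when the polymerase "falls off" the template


# Hypothetical template strand and a primer that binds its 3' end.
template_strand = "GGATTCACGTAGCCA"
primer = "TGGCTA"

print(extend_primer(template_strand, primer))
```

The finished strand begins with the primer itself, which is why primers end up built into every copy that PCR produces.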

The Taq polymerase enzyme continues to do this at the shockingly rapid pace of 2 kilobases per minute until it falls off the end of the base DNA sequence. Remember — it will never meet the primer on the other side because that primer is attached to the complementary DNA strand, which was separated during denaturation. But this is no cause to worry as the other end of the desired DNA sequence will be defined after the second (of many) PCR cycles, where that opposing primer will attach to the uncropped side, causing Taq polymerase to ignore the unneeded bits facing the primer’s 5' end.

Now, we’ve come full circle and ended back up with double-stranded DNA molecules — although there are two of them now instead of one. Next, we just have to repeat this cycle many, many times (Fig 3).
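Putting the three steps together, one PCR cycle can be written down as a small program for the thermocycler. The sketch below uses the temperatures mentioned above (98 °C for 15 seconds, about 60 °C, and 72 °C) and the 2 kilobases per minute extension speed. The 20-second annealing time and the 1,000 bp target length are assumptions for illustration, and the time the machine spends changing temperature is ignored, so real runs take longer.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: float


def build_cycle(
    target_length_bp: int,
    annealing_temp_c: float = 60.0,
    annealing_seconds: float = 20.0,            # assumed; not given in the text
    extension_rate_bp_per_min: float = 2000.0,  # the 2 kb/min figure from the text
) -> list[Step]:
    """Return the three steps of one PCR cycle for a target of the given length."""
    extension_seconds = target_length_bp / extension_rate_bp_per_min * 60
    return [
        Step("denaturation", 98.0, 15.0),
        Step("annealing", annealing_temp_c, annealing_seconds),
        Step("extension", 72.0, extension_seconds),
    ]


def total_minutes(cycle: list[Step], cycles: int = 30) -> float:
    """Total hold time in minutes, ignoring heating and cooling between steps."""
    return cycles * sum(step.seconds for step in cycle) / 60


cycle = build_cycle(target_length_bp=1000)  # hypothetical 1,000 bp target
for step in cycle:
    print(f"{step.name}: {step.temp_c} °C for {step.seconds:.0f} s")
print(f"30 cycles: about {total_minutes(cycle):.1f} minutes of hold time")
```

A longer target only changes the extension step, which is why extension time is usually the number that gets adjusted from one experiment to the next.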

Fig 3. Visualized results of PCR cycles from ResearchGate. Blue segments are the target DNA from the template; grey represents undesired DNA; green segments are created during PCR; shorter green segments, separated by a diagonal break, represent primers.

But before we move on, let’s examine this image a bit more. I’d like to point out that yes, the unneeded parts (shown in grey) on the long DNA strand from the first cycle never get removed, so they will continue to produce DNA molecules that aren’t cut off at both ends of the desired sequence. But this is alright because, after 20–30 PCR cycles (each of which doubles the number of DNA segments), these uncropped molecules will only make up a tiny portion of the total group. Plus, methods (like gel electrophoresis) are usually used after PCR to remove DNA molecules of the wrong size.
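To see just how tiny that portion gets, here is a small Python tally of the strands in the tube after each cycle. It is an idealized sketch: it starts from a single double-stranded template and assumes every strand is copied in every cycle, which real reactions don’t quite manage. The names for the three kinds of strand are my own labels.

```python
def pcr_strand_counts(cycles: int) -> dict[str, int]:
    """Count single strands after the given number of ideal PCR cycles."""
    original = 2    # the two strands of the starting template
    one_end = 0     # copies cut off at only one end (still carry grey DNA)
    both_ends = 0   # copies cut off at both ends (exactly the target)
    for _ in range(cycles):
        # Copying an original strand gives a strand defined at one end only.
        new_one_end = original
        # Copying any primer-started strand gives a strand defined at both ends.
        new_both_ends = one_end + both_ends
        one_end += new_one_end
        both_ends += new_both_ends
    return {"original": original, "one_end": one_end, "both_ends": both_ends}


for n in (1, 2, 3, 30):
    counts = pcr_strand_counts(n)
    total = sum(counts.values())
    uncropped = counts["original"] + counts["one_end"]
    print(n, counts, f"uncropped fraction: {uncropped / total:.2e}")
```

The uncropped strands only grow by two per cycle while the fully cropped ones roughly double, so the counts match the closed forms 2n and 2^(n+1) - 2n - 2 after n cycles.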

In summary, PCR begins with a long DNA molecule that has our desired gene hidden somewhere and ends with billions of copies of just our desired sequence!
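The whole idea fits in a short “PCR on paper” sketch in Python: given the top strand of a template and a pair of primers, return the stretch that would be amplified. This assumes perfect primer matches and a single binding site for each primer, and the template and primers are made up for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def in_silico_pcr(template_top: str, forward: str, reverse: str) -> str:
    """Return the top strand of the product defined by the two primers."""
    top = template_top.upper()
    forward = forward.upper()
    # The forward primer reads the same as the top strand at one end of the target.
    start = top.find(forward)
    if start == -1:
        raise ValueError("forward primer not found on the top strand")
    # The reverse primer matches the bottom strand, so on the top strand
    # its position shows up as the primer's reverse complement.
    reverse_site = reverse_complement(reverse)
    end = top.find(reverse_site, start + len(forward))
    if end == -1:
        raise ValueError("reverse primer site not found downstream of the forward primer")
    return top[start:end + len(reverse_site)]


# Hypothetical template: six flanking bases, the target, six more flanking bases.
template = "TTGACA" + "ATGCCGAAACCCGGGTAGC" + "TCTAGT"
product = in_silico_pcr(template, forward="ATGCCG", reverse="GCTACC")
print(product)
```

Everything outside the two primer positions is simply left behind, which is the “plucking out” part of PCR.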

Phew! That was a lot, and now we know exactly how genes are isolated. But…the answer to our question of how genes are found is still unclear. Yes, primers locate the ends of our desired gene, but how do we find the genes in the first place, and how are primers made?

Defining Primers

Defining primers and determining their location is pretty simple — if you already know the exact sequence of the targeted gene (a whole other topic!). With sequenced DNA, apps like SnapGene can be used to easily determine the beginning and end of a protein-coding gene, which are the places where primers must be created. In SnapGene, for example, you may see something like this:

Fig 4. View of gene sequences from a plasmid in SnapGene. Image by author.

You can see that the screen is full of Gs, As, Cs, and Ts stacked on top of each other in the format of a double-stranded DNA sequence, but this is only a tiny portion of the entire binary vector. The section underlined in magenta is labeled as the pVS1 StaA gene, which we’ll pretend is the gene we’re trying to isolate through PCR.

Now, to be honest — pVS1 StaA is a random gene I picked from a random plasmid, and I don’t even know what it does. What’s important here is that there are countless plasmids that have their gene names already nicely labeled. So, if we want to isolate and replicate pVS1 StaA in PCR, all we have to do now is make primers that correspond to the first and last 15–30bp of this gene:

Fig 5. pVS1 StaA gene sequence with a forward primer on the top strand. Image by author.

And there we go! Here, I’ve made a forward primer that is outlined in purple, “Primer 1”, that matches the top strand of one end of this gene. Next, I’d make the reverse primer that matches the bottom strand of the other end of this gene. Before we call it good, though, we must check that the sequences of these primers don’t match any other area of the genome and that the two primers share the same annealing temperature. Both of these tasks can be easily done in an app like SnapGene.
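Outside of an app, the same two primers can be read straight off the gene’s sequence. The Python sketch below takes the first bases of the top strand as the forward primer and the reverse complement of the last bases as the reverse primer, both written 5' to 3'. The gene sequence and the 6-base length are placeholders to keep the example readable (real primers would be 15–30 bases), and this is not the actual pVS1 StaA sequence.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_primers(gene_top: str, length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers for the two ends of a gene's top strand."""
    gene = gene_top.upper()
    if len(gene) < 2 * length:
        raise ValueError("gene is too short for two non-overlapping primers")
    forward = gene[:length]                       # matches the top strand at the start
    reverse = reverse_complement(gene[-length:])  # matches the bottom strand at the end
    return forward, reverse


# Placeholder gene sequence, invented for illustration.
gene = "ATGCCGAAACCCGGGTAGCTTGACA"
forward_primer, reverse_primer = design_primers(gene, length=6)
print(forward_primer, reverse_primer)
```

The uniqueness and annealing temperature checks described above would still need to be run on whatever this produces.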

After that, the exact sequences of both primers would be defined, and all that’s left to do is get your hands on these primers!

Acquiring Primers

Because primers are so short, they can be manually created. But even this process takes a lot of specialized equipment and comes with other considerations that make it inconvenient for regular scientists to do in their labs. And why would they, when they could have custom-made PCR primers delivered to their doorstep in under 24 hours for as little as $5?

Biotech companies like Twist Bioscience and IDT (Integrated DNA Technologies) have cost-efficient, high-quality, and quick systems for producing custom short DNA sequences. They can make sequences of up to about 5,000 bases, but that costs hundreds of dollars and we don’t need anything like that for PCR primers, which are only 15–30 bp long. To make sequences under 200 bp like primers, most companies use a method called phosphoramidite oligonucleotide synthesis. What a mouthful, right? Let’s unpack it.

Phosphoramidite Oligonucleotide Synthesis

Oligonucleotides are short sequences of single-stranded DNA or RNA molecules, for example, PCR primers. Phosphoramidites are nucleotides that are modified for artificial DNA synthesis. An important modification they come with is the addition of protecting groups, molecules that “cap” the binding spots of a nucleotide and prevent it from connecting to other nucleotides prematurely. At the command of a chemical signal, these protecting groups are removed and the next phosphoramidite is attached. Then, the new linkage is oxidized to strengthen and secure the bond.

The first of these phosphoramidites is attached to a solid support material like a glass or resin bead. Through the method described earlier, phosphoramidites are added one by one until the desired sequence is completed.
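The one-by-one cycle can be written out as a loop. This Python sketch is only a bookkeeping model of the steps described above (remove the protecting group, attach the next phosphoramidite, oxidize), with no real chemistry in it. One detail not mentioned above: chemical synthesis builds the chain from its 3' end toward its 5' end, so the sketch starts from the last letter of the sequence. The example sequence is invented.

```python
def synthesize(sequence: str) -> tuple[str, list[str]]:
    """Build an oligo base by base and return it with a log of the steps taken."""
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be a non-empty string of A, C, G, and T")
    # The chain grows 3' to 5', so bases are added in reverse reading order.
    build_order = seq[::-1]
    chain = build_order[0]
    log = [f"attach {chain} to the solid support"]
    for base in build_order[1:]:
        log.append("deprotect the end of the growing chain")
        log.append(f"couple {base} phosphoramidite")
        log.append("oxidize the new linkage")
        chain = base + chain  # each new base lands on the 5' end
    return chain, log


oligo, steps = synthesize("ATGCCG")  # hypothetical 6-base oligo
print(oligo)
for line in steps:
    print(" ", line)
```

Every added base costs a full round of steps, which gives some intuition for why this method is used for short sequences like primers.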

Fig 6. Frame of phosphoramidite synthesis animation from GenScript USA Inc. White spheres are the solid support material; red, green, blue, and orange structures each represent one type of nitrogenous base (A, C, T, G); large orange connecting pieces are the sugar/phosphate backbones of each nucleotide.

With the correct primers made and delivered, we can now use PCR to amplify the genes we want to insert into the target organism!

But…there’s still one big problem. As we discussed in my blog about gene insertion, bacteria are usually needed to deliver genes into plant tissue. They do this by injecting a binary vector, a circular ring of DNA, from within themselves into the plant tissue. These binary vectors include all the necessary genes for this insertion to occur, so a protein-coding gene sequence isolated by PCR simply cannot be inserted into an organism all by itself. It needs to be a part of a binary vector.

So, how can we assemble DNA sequences together so that we can insert a gene into a binary vector? Let’s explore this question and learn about molecular cloning in my next blog!

As always, thanks for reading, and I hope you’ve learned something new today! If you’d like to chat, you can always reach me through my X account. See you later!