Gene Editing: A Utopia in the Making 🧬

Want to know more about CRISPR/Cas-9? This is the article for you!

Krish Mendapara
17 min read · Oct 27, 2022
Image of a DNA molecule → Photo by Sangharsh Lohakare on Unsplash

Gene editing is one of the most revolutionary discoveries in human history, with the potential to save many lives and change the entire trajectory of how humans fight disease. Let’s dive into the tools that make recent advances in gene editing possible, and the ways scientists are using them to fight cancer.

You’ve probably heard gene-editing buzzwords like CRISPR and Cas-9 thrown around, and there are plenty of misconceptions about what gene editing actually is. This article will not only go in-depth on the technology’s roots, discovery, and potential, but will also offer possible use cases for CRISPR technology in society.

Let’s First Dive into Sequencing!

Gene sequencing is the process of determining the order of the nitrogenous bases in our strands of DNA.

But wait, what’s DNA? DNA, or deoxyribonucleic acid, is the hereditary material in humans and almost all other organisms. Nearly every cell in a person’s body has the same DNA. Essentially, change a person’s DNA and you change them. DNA is encoded using 4 nitrogenous bases: Adenine, Thymine, Guanine and Cytosine. Adenine pairs only with Thymine, while Guanine pairs only with Cytosine. DNA is a double helix made of two strands held together by these base pairs, so each strand is complementary to the other. The order of bases in a gene determines which protein gets synthesized, and proteins essentially “do everything” in the human body, from digesting our food to moving our muscles.

  • For example, a single-base change in a gene (say, ATGCG becoming ATGCC; these sequences are purely illustrative) can change the protein it encodes and, with it, a trait such as eye color. Even subtle changes in the DNA sequence can have profound effects on the phenotype (the observable traits of an organism).
See the molecular structure of DNA and RNA, as well as the four nitrogenous bases in DNA and RNA. Note how in RNA, a base called uracil (U) replaces thymine (T) as the complementary nucleotide to adenine
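The pairing rule above is simple enough to express in a few lines of code. The Python sketch below builds the complementary strand of a DNA sequence and checks whether two strands pair up. The function names are my own, and for simplicity the sketch ignores the direction in which each strand is read.

```python
# Base-pairing rules for DNA: A pairs with T, G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (direction ignored)."""
    return "".join(DNA_PAIRS[base] for base in strand.upper())


def pairs_with(strand_a: str, strand_b: str) -> bool:
    """True if every base in strand_a pairs with the base at the same position in strand_b."""
    return len(strand_a) == len(strand_b) and complement(strand_a) == strand_b.upper()


print(complement("ATGCG"))           # TACGC
print(pairs_with("ATGCG", "TACGC"))  # True
print(pairs_with("ATGCG", "TACGG"))  # False
```

Complementing a strand twice gives back the original, which is why either strand of the helix carries enough information to rebuild the other.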

It is also essential to understand what RNA is. Ribonucleic acid (RNA), a cousin of DNA, is another type of nucleic acid. It is mostly involved in protein synthesis (remember, proteins are the driving force behind nearly everything that occurs in the cell). The DNA in our cells never leaves the nucleus; instead, it communicates with the rest of the cell through an intermediary known as messenger RNA (mRNA). There are also other types of RNA, such as rRNA and tRNA, which do much of the actual work of building proteins. Below, I go a little in-depth on how these forms of RNA are used for protein expression and the processes that allow this to occur. I also talk about genome sequencing, and how it was a pivotal point in the birth of the gene-editing revolution. Feel free to skip these sections, as they are not necessary for understanding gene editing and CRISPR as a whole. However, for the STEM geeks out there, I have included them to quench your curiosity.

Protein synthesis and transcription

It is important to understand what proteins are and their role throughout the body. Proteins are large, complex molecules that play many critical roles in the body. They do most of the work in cells and are required for the structure, function, and regulation of the body’s tissues and organs.

Proteins are made up of hundreds or thousands of smaller units called amino acids, which are attached to one another in long chains. There are 20 different types of amino acids that can be combined to make a protein. The sequence of amino acids determines each protein’s unique 3-dimensional structure and its specific function.

The ribosome has two parts: a large subunit and a small subunit. The mRNA sits between the two subunits. A tRNA molecule recognizes a codon on the mRNA, binds to it by complementary base pairing, and adds the correct amino acid to the growing peptide chain.

Now that you have a better understanding of what proteins are, let’s learn how to create these proteins:

During protein synthesis (also called translation), mRNA is read in sets of three bases known as codons. Each codon codes for a single amino acid, except for three “stop” codons that signal the end of the protein. In this way, the mRNA is read and the protein product is made. This is where the other two types of RNA mentioned above come into play.

rRNA and tRNA are the actual drivers behind the protein synthesis that occurs between mRNA and ribosomes (think of ribosomes as factories that create proteins). Ribosomal RNA (rRNA) is a major constituent of ribosomes, to which the mRNA binds. The rRNA holds the mRNA in proper alignment and helps join the amino acids together; essentially, rRNA forms the structural and catalytic core of the protein synthesis machinery. Next, we have transfer RNA (tRNA), which is usually 70–90 nucleotides long. It carries the correct amino acid to the protein synthesis site. Base pairing between the tRNA and mRNA is what allows the correct amino acid to be inserted into the polypeptide chain: each codon on the mRNA pairs with a tRNA carrying the complementary three-base anticodon. Then, true to its name, the tRNA “transfers” its amino acid onto the growing chain, and this process repeats as the chain grows and becomes a protein.
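To make the codon-by-codon reading concrete, here is a minimal Python sketch of translation. It assumes a tiny, partial codon table (only a handful of the 64 real codons), starts at the first AUG, and stops at the first stop codon; the tRNA step is reduced to working out which anticodon pairs with each codon, ignoring the “wobble” pairing real tRNAs allow. The function names are illustrative, not from any library.

```python
from typing import List

# A deliberately partial codon table; real cells use all 64 codons.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UUC": "Phe", "GGC": "Gly",
    "GCU": "Ala", "AAA": "Lys", "UGG": "Trp", "GAA": "Glu",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Bases of the tRNA anticodon that pair with `codon`, listed in the same left-to-right order."""
    return "".join(RNA_PAIRS[base] for base in codon)


def translate(mrna: str) -> List[str]:
    """Read mRNA one codon (3 bases) at a time, from the first AUG until a stop codon."""
    start = mrna.find("AUG")  # AUG is the usual start codon
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "???")  # "???" = not in this toy table
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


print(translate("GCAUGUUUGGCAAAUGAGC"))  # ['Met', 'Phe', 'Gly', 'Lys']
print(anticodon("UUU"))                  # AAA
```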

Overview of transcription

Transcription is the first step in gene expression, in which information from a gene is used to construct a functional product such as a protein. The goal of transcription is to make an RNA copy of a gene’s DNA sequence. For a protein-coding gene, the RNA copy, or transcript, carries the information needed to build a polypeptide (protein or protein subunit). In eukaryotes such as humans, transcripts go through some processing steps before they are translated into proteins.

One of the crucial tools in transcription is the enzyme RNA polymerase, which uses a single-stranded DNA template to synthesize a complementary strand of RNA. To do this, it essentially matches every base on the template strand with its complement, using uracil (U) in place of thymine. Suppose we have a template strand of ATGCATC. RNA polymerase would build the complementary RNA strand UACGUAG.

  • What is an enzyme? Enzymes are proteins that act as biological catalysts by accelerating chemical reactions. Effectively, they “make things happen”.

So what does RNA polymerase do? RNA polymerase builds an RNA strand in the 5' to 3' (read “5 prime to 3 prime”) direction, adding each new nucleotide to the 3' end of the strand. What are 5' and 3'? The image on the right shows a nucleotide, the building block of DNA. Each nucleotide contains a deoxyribose sugar, and the 5' carbon is the fifth carbon of that sugar. A series of nucleotides join together to form DNA: the phosphate attached to the 5' carbon of one nucleotide bonds to the 3' carbon of the previous one, so every strand has a 5' end and a 3' end. Moreover, the two strands of DNA run antiparallel, that is, side by side but in opposite directions (one 5' to 3', the other 3' to 5'). This terminology will make understanding the stages of transcription easier.
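The direction rules can be checked with a short Python sketch. It assumes both DNA strands are written 5' to 3' (a common convention, but my choice here). Because the polymerase reads the template from its 3' end while building RNA 5' to 3', the RNA is the template’s complement in reverse order. The result matches the coding strand with U in place of T, which is exactly why the transcript carries the same information as the coding strand.

```python
# RNA polymerase pairs each template base with its RNA complement (U instead of T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the mRNA, written 5'->3', for a template strand also written 5'->3'.

    The polymerase reads the template 3'->5' while building RNA 5'->3',
    so we walk the template from its 3' end backwards.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5_to_3))


coding = "ATGGCTTGA"    # coding (non-template) strand, 5'->3'
template = "TCAAGCCAT"  # the antiparallel template strand, also written 5'->3'

mrna = transcribe(template)
print(mrna)                              # AUGGCUUGA
print(mrna == coding.replace("T", "U"))  # True
```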

Carbons on the deoxyribose sugar are numbered clockwise, starting from the carbon next to the ring oxygen. DNA bases are “read” in the 5' to 3' direction → Source

Stages of transcription

  1. Initiation. RNA polymerase binds to a sequence of DNA called the promoter, found near the beginning of a gene. Each gene (or group of co-transcribed genes, in bacteria) has its own promoter. Once bound, RNA polymerase separates the DNA strands, providing the single-stranded template needed for transcription.
  2. Elongation. One strand of DNA, the template strand, acts as a template for RNA polymerase. As it “reads” this template one base at a time, the polymerase builds an RNA molecule out of complementary nucleotides, making a chain that grows from 5' to 3'. The RNA transcript carries the same information as the non-template (coding) strand of DNA, but it contains the base uracil (U) instead of thymine (T).
  3. Termination. Sequences called terminators signal that the RNA transcript is complete. Once they are transcribed, they cause the transcript to be released from the RNA polymerase. An example of a termination mechanism involving the formation of a hairpin in the RNA is shown below.
Diagram of a DNA strand undergoing the Initiation phase → Source
Diagram of a DNA strand undergoing the Elongation phase → Source
Diagram of a DNA strand undergoing the Termination phase → Source
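The three stages can be mimicked in a toy Python model. The promoter and terminator strings below are hypothetical stand-ins: real promoters vary in sequence and are recognized by proteins, transcription usually begins a short distance downstream of the promoter, and real terminators (such as the hairpin mentioned above) work through RNA structure rather than an exact text match.

```python
from typing import Optional

PROMOTER = "TATAAT"        # hypothetical stand-in for a promoter sequence
TERMINATOR_RNA = "UUUUUU"  # hypothetical stand-in for a terminator signal


def transcribe_gene(coding_strand: str) -> Optional[str]:
    """Toy model of the three stages of transcription.

    Works on the coding strand (written 5'->3') because the RNA has the same
    sequence as that strand, with U in place of T.
    """
    # 1. Initiation: the polymerase only binds where it finds a promoter.
    site = coding_strand.find(PROMOTER)
    if site == -1:
        return None
    rna = ""
    # 2. Elongation: add one nucleotide at a time to the 3' end of the RNA.
    for base in coding_strand[site + len(PROMOTER):]:
        rna += "U" if base == "T" else base
        # 3. Termination: release the transcript once the terminator has been made.
        if rna.endswith(TERMINATOR_RNA):
            break
    return rna


gene = "CC" + PROMOTER + "GAGCA" + "TTTTTT" + "GCGC"
print(transcribe_gene(gene))         # GAGCAUUUUUU
print(transcribe_gene("GGGCCCAAA"))  # None
```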

Genome Sequencing

Modern gene sequencing owes a great deal to one of the greatest scientific feats in history: sequencing the entire human genome. The project was a race of biological discovery, led by an international group of researchers aiming to comprehensively study all of the DNA in humans (known as the genome). The entire point of sequencing is to determine the order of the bases that make up our DNA.

The two parties in this race were Celera (a private company) and the Human Genome Project (a publicly funded collaboration between research facilities across the globe). At the time, mapping the human genome seemed like a monumental task, considering both parties had to map out roughly 3 billion base pairs across the 23 pairs of chromosomes.

Each competitor took a very different approach to sequencing the entire genome. The Human Genome Project used the conventional approach, BAC-based (hierarchical) shotgun sequencing. Essentially, this approach creates a physical map of the whole genome before sequencing the DNA, nearly doubling the effort but ensuring the genome map is accurate. In contrast, Celera used whole-genome shotgun sequencing, an unconventional method that saved significant time at the expense of a larger margin of error. Below, I explain step by step how both types of sequencing work:

BAC-Based (Hierarchical) Shotgun Sequencing

Constructing a physical map requires cutting the chromosomes into large pieces and figuring out the order of these big chunks of DNA before taking a closer look and sequencing all the fragments.

  1. First, several copies of the genome are randomly cut into pieces that are ~150,000 base pairs (bp) long.
  2. Each 150,000 bp fragment is inserted into a BAC (bacterial artificial chromosome). In the literature, this may also be referred to as a ‘vector’: a carrier that lets a piece of DNA replicate inside a bacterial cell. BACs are ideal for physically mapping genomes because they are stable in culture and easy to manipulate. Their inserts are large enough to hold a whole gene along with its own promoter, enhancers, and silencers, but for sequencing, what matters is that the bacterium copies the insert every time it divides. The whole collection of BACs containing the entire human genome is called a BAC library, because each BAC is like a book in a library that can be accessed and copied.
  3. These pieces are ‘fingerprinted’ to give each piece a unique identification tag that determines the order of the fragments. Fingerprinting involves cutting each BAC fragment with a restriction enzyme (think of it as molecular scissors) and finding common sequence landmarks in fragments that overlap with another BAC. Overlapping BACs with markers every 100,000 bp then form a map of each chromosome.
  4. Each BAC is then broken randomly into ~1,500 bp pieces, and each piece is placed in another carrier called M13 (a bacteriophage vector) so it can be copied and sequenced.
  5. DNA can be sequenced in several ways. The Human Genome Project relied on fluorescent Sanger (chain-termination) sequencing; today, one of the most common methods is sequencing by synthesis.
  6. In sequencing by synthesis, fluorescently tagged nucleotides compete to bind to the single-stranded DNA being read. Each nucleotide can only pair with its complementary base. With each addition of a complementary nucleotide, the DNA is illuminated by a light source and the newly added nucleotide’s tag emits a fluorescent signal. The emission wavelength, along with the signal intensity, determines the base call. For example, if purple light is emitted, it might correspond to thymine being “attached”. Since thymine can only pair with adenine, we can determine that the base on the template at that position is adenine.
  7. Because the pieces were cut at random, their sequences overlap. These sequences are fed into a computer program called PHRAP, which looks for shared sequences between fragments and merges overlapping fragments into one continuous sequence.
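Here is a minimal Python sketch of the base-calling idea in step 6. The dye colors and the intensity cut-off are hypothetical (real instruments use their own dye chemistry and far more sophisticated signal processing); the point is that each color identifies the nucleotide just added, and the template base is its complement.

```python
from typing import List, Tuple

# Hypothetical dye colors: each nucleotide type carries a different fluorescent tag.
DYE_TO_BASE = {"purple": "T", "green": "A", "blue": "C", "yellow": "G"}
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
MIN_INTENSITY = 0.3  # hypothetical cut-off below which a signal is too weak to call


def call_bases(signals: List[Tuple[str, float]]) -> Tuple[str, str]:
    """Turn one (color, intensity) reading per cycle into base calls.

    Returns the newly synthesized strand and the template bases it paired with.
    Weak signals are reported as 'N' (unknown).
    """
    synthesized = "".join(
        DYE_TO_BASE[color] if intensity >= MIN_INTENSITY else "N"
        for color, intensity in signals
    )
    template = "".join(PAIRS.get(base, "N") for base in synthesized)
    return synthesized, template


cycles = [("purple", 0.9), ("yellow", 0.8), ("green", 0.1), ("blue", 0.7)]
print(call_bases(cycles))  # ('TGNC', 'ACNG')
```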

Whole Genome Shotgun Sequencing

The shotgun sequencing method goes straight to decoding, bypassing the need for a physical map, which makes it much faster. However, it is more prone to errors: the computer must piece the genome together from overlaps alone, so it may miss where two DNA sequences join or misplace repeated sequences, leaving gaps in the genome sequence.

  1. Multiple copies of the genome are randomly shredded into pieces that are 2,000 base pairs (bp) long.
  2. Each 2,000 bp fragment is inserted into a plasmid, a small circular piece of DNA that acts as a vector so the fragment can replicate in bacteria. The collections of plasmids containing 2,000 bp chunks of human DNA are known as plasmid libraries.
  3. The plasmid libraries are sequenced about 500 bp at a time, reading from both ends of each 2,000 bp insert. Sequencing both ends of each insert is critical for assembling the entire chromosome, which will be crucial in the next step.
  4. Computer algorithms assemble the millions of sequenced fragments into a continuous stretch resembling each chromosome. This is where sequencing both ends comes into play: the two reads from one insert are known to sit about 2,000 bp apart, and the computer uses that information to “patch together” and order the smaller sequences.
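The overlap-merging idea behind both PHRAP-style and whole-genome shotgun assembly can be sketched with a simple greedy algorithm: repeatedly merge the two fragments that overlap the most. This is a deliberately simplified stand-in, not how PHRAP or Celera’s assembler actually worked; it ignores sequencing errors, repeats, and the paired-end information described in step 4. The function names and example reads are made up.

```python
import random
from typing import List


def shred(genome: str, read_len: int, n_reads: int, seed: int = 0) -> List[str]:
    """Cut copies of a genome at random positions into fixed-length reads."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(genome[start:start + read_len])
    return reads


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = sorted(set(reads))
    # Drop reads that lie entirely inside another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != other and r in other for other in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGGCGT", "GCGTACC", "TACCTTA", "CTTAGCA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTTAGCA']
```

Feeding `greedy_assemble` the output of `shred` on a short made-up genome shows the trade-off described above: with too few reads the result may come back as several separate contigs (gaps), and a repeated stretch can be stitched together in the wrong place.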

In the end, both parties published draft sequences of the human genome at the same time, in 2001, so the race was effectively declared a tie! However, the real winner was humanity, as sequencing proved to be an essential component of how gene editing was created and how it continues to grow. Sequencing helped scientists understand not only how diseases develop but also how to treat them, including genetic conditions once thought to be incurable.

On the left, we have the method used by the Human Genome Project, and on the right is the method used by Celera Genomics → Source

Gene sequencing led to many advancements in modern medicine and in how we understand human biology. This is a direct result of our growing understanding of the human genome, which served as a catalyst for other projects. For example, by understanding the human genome, we became better at identifying cancer-causing factors, something we will dive into later in this article.

Now that we’ve covered some of the groundwork behind gene editing, it’s time to look to the future and at the tools currently being used in the field. CRISPR is an emerging technology that is making waves throughout the gene-editing field.

So what is CRISPR?

Before we dive into what exactly CRISPR is, we have to look at where it comes from. CRISPR was originally discovered in bacteria, where it serves as a defense mechanism against viruses that infect bacteria, known as bacteriophages (or phages). Phages inject their DNA into bacteria and can be highly lethal to them.

If a bacterium successfully fends off a phage, it stores a short piece of the phage’s DNA in its CRISPR array, allowing it to “remember” the virus (or closely related ones). If the virus attacks again, the bacterium can produce RNA segments from the CRISPR array that recognize and attach to specific regions of the virus’s DNA.

Back to what CRISPR actually is:

CRISPR is the acronym for Clustered Regularly Interspaced Short Palindromic Repeats. Let’s break this name down step by step. Firstly, CRISPR consists of repeats: short stretches of DNA that appear over and over, clustered together, whose bases form a (partial) palindrome. In essence, a CRISPR array is the same short sequence repeated at regular intervals, with a different sequence sitting between each pair of repeats. The repeats act almost like bookmarks, as if to say: “here is one stored sequence” and “here is the next one”. In DNA, a palindrome is a sequence that reads the same on both strands when each is read in the 5' to 3' direction. DNA-cutting proteins frequently use palindromic sequences as recognition sequences, at which they cut the DNA molecule. These sequences can be four, six, or eight base pairs long, although some cutting proteins require 20 or more base pairs.
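Checking whether a DNA sequence is palindromic, and finding where a cutting protein would act, takes only a few lines. The Python sketch below uses EcoRI, a well-known restriction enzyme that recognizes GAATTC and cuts between the G and the first A; it is my example, not one from the article. The sketch models only one strand, so the staggered “sticky ends” that real cuts leave are not shown, and the function names are illustrative.

```python
from typing import List

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """The partner strand, read 5'->3' (complement each base, then reverse)."""
    return "".join(PAIRS[base] for base in reversed(seq))


def is_dna_palindrome(seq: str) -> bool:
    """A DNA palindrome reads the same 5'->3' on both strands."""
    return seq == reverse_complement(seq)


def find_sites(seq: str, site: str) -> List[int]:
    """Start positions of every occurrence of a recognition site."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def digest(seq: str, site: str, cut_offset: int) -> List[str]:
    """Cut `seq` `cut_offset` bases into every occurrence of `site` and return the pieces."""
    cuts = [i + cut_offset for i in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"  # EcoRI recognizes GAATTC and cuts between G and A
print(is_dna_palindrome(ECORI_SITE))  # True
print(is_dna_palindrome("GATTAC"))    # False
dna = "TTGAATTCAAGGAATTCG"
print(find_sites(dna, ECORI_SITE))    # [2, 11]
print(digest(dna, ECORI_SITE, 1))     # ['TTG', 'AATTCAAGG', 'AATTCG']
```

The resulting fragment lengths (3, 9 and 6 here) are the kind of pattern that restriction-enzyme fingerprinting compares between overlapping clones.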

Palindrome sequence in the DNA of the bacterium Streptococcus agalactiae. Parts of the letter sequence of one strand (green) correspond to those of the other strand (yellow) in the reverse order. However, the palindrome is not perfect. It also contains a non-palindromic sequence (white). The DNA can form hairpin structures using a broken palindrome like this one. → Source

Similarly, CRISPR is regularly interspaced, meaning the repeats are separated by stretches of unique DNA, fittingly called spacers. Each spacer is a snippet of DNA captured from a past invader. Located next to the CRISPR array are the Cas (CRISPR-associated) genes, which encode Cas proteins. Among their jobs, some Cas proteins act as helicases and others as nucleases: helicases unwind the DNA, making it single-stranded, while nucleases cut the strands.

Just like how our bodies build up immunity against diseases we’ve had in the past, bacteria use CRISPR to do the same. When a virus inserts its DNA into the bacterium, the CRISPR system can add a new spacer. How does this work? The Cas1 and Cas2 proteins help capture a short piece of the invading virus’s DNA (the protospacer) and insert it into the CRISPR array as a new spacer. The bacterium then transcribes its spacers into short CRISPR RNAs. When it encounters the same pathogen again, these RNAs guide Cas proteins to the matching sequence in the viral DNA. Much like an antibody marking its target, the RNA-guided Cas complex recognizes the viral DNA and cuts it, destroying the invader.
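The acquire-then-recognize cycle can be modeled as a tiny Python class. This is a hypothetical toy, not a real library: spacers here are only 8 bases long (real ones are a few dozen bases), and the Cas proteins, the RNA step, and the PAM check are all collapsed into a simple substring search.

```python
from typing import List


class CrisprArray:
    """Toy model of CRISPR 'memory' (a hypothetical class, not a real library)."""

    def __init__(self, spacer_len: int = 8) -> None:
        self.spacer_len = spacer_len
        self.spacers: List[str] = []  # newest spacer first, like the leader end of a real array

    def acquire(self, phage_dna: str, start: int) -> str:
        """Adaptation: store a snippet (the protospacer) of the invader's DNA as a new spacer."""
        protospacer = phage_dna[start:start + self.spacer_len]
        self.spacers.insert(0, protospacer)
        return protospacer

    def recognizes(self, dna: str) -> bool:
        """Interference (simplified): does any stored spacer appear in this DNA?"""
        return any(spacer in dna for spacer in self.spacers)


bacterium = CrisprArray()
phage_a = "ATGCCGTTAGCAATCG"
phage_b = "GGCATTACCGGATTAC"

print(bacterium.recognizes(phage_a))  # False: no memory yet
print(bacterium.acquire(phage_a, 4))  # CGTTAGCA
print(bacterium.recognizes(phage_a))  # True: the same phage returns
print(bacterium.recognizes(phage_b))  # False: an unrelated phage
```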

This image shows the protospacer with its PAM (ending in the bases GG) and the Cas genes present alongside the CRISPR sequence. → Source

It’s also essential we talk about PAM. Because CRISPR finds the virus’s DNA by matching it to the sequence it “captured” earlier, it needs a way to avoid attacking its own CRISPR array, which contains that same sequence (destroying the array would remove the immunity). This is where the PAM (Protospacer Adjacent Motif) comes in. For Cas9, the target sequence in the invading DNA must be immediately followed by the PAM sequence NGG: any base followed by two guanines. In the bacterium’s own CRISPR array, by contrast, each spacer is followed by the start of a repeat (beginning with GT) rather than by NGG, so the array is not recognized as a target. This protects the bacterium’s own DNA while CRISPR searches for a specific strand of DNA.
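To see how the PAM lets CRISPR tell invader from self, here is a small Python sketch that scans a sequence for Cas9 target sites: a 20-base protospacer immediately followed by NGG, the PAM recognized by the Cas9 from Streptococcus pyogenes. The function name, the example sequences, and the repeat placed after the bacterium’s stored copy are all made up for illustration, and only one strand is scanned.

```python
import re
from typing import List, Tuple


def find_cas9_targets(dna: str, protospacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, pam) for every protospacer immediately followed by an NGG PAM.

    Only the strand as written is scanned; a full search would also scan its reverse complement.
    """
    # The lookahead (?=...) lets matches overlap, so no candidate site is skipped.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


spacer = "ACGTTCAGCTAGCATCGATC"              # 20 bases stored from a past infection
phage_dna = "TT" + spacer + "TGG" + "CA"     # the invader: target followed by a PAM
crispr_array = "TT" + spacer + "GTTTTAGAGC"  # the bacterium's own copy, followed by a repeat

print(find_cas9_targets(phage_dna))     # [(2, 'ACGTTCAGCTAGCATCGATC', 'TGG')]
print(find_cas9_targets(crispr_array))  # []
```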

So this is how the CRISPR system works in bacteria. Next, we’ll talk about how this was discovered and applied to humans.

CRISPR sequences were first noticed in E. coli in 1987, but their potential for gene editing was unlocked in 2012, when Jennifer Doudna and Emmanuelle Charpentier showed that the enzyme Cas-9, guided by RNA, can be programmed to cut a chosen DNA sequence. The discovery changed the way the scientific community looks at this sequence of DNA: when CRISPR is combined with Cas-9, the beauty of gene editing can be seen. This work used the bacterium Streptococcus pyogenes.

Because they were working with this bacterium, the Cas protein they used was Cas-9, the nuclease native to its CRISPR system. It was partly luck that Cas-9 turned out to be so good at its job, hence its popularity in CRISPR systems.

The popularity of the CRISPR-Cas system is largely due to its simplicity. As shown in the figure, the CRISPR-Cas system relies on two main components: a guide RNA (gRNA) and CRISPR-associated (Cas) nuclease.

  • The guide RNA (gRNA) is a specific RNA sequence that recognizes the target DNA region of interest and guides (true to its name) the Cas nuclease there for editing. The gRNA is made up of two parts: CRISPR RNA (crRNA), which contains a 17–20 nucleotide sequence complementary to the target DNA, and a trans-activating crRNA (tracrRNA), which serves as a binding scaffold for the Cas nuclease.
Overview of how CRISPR works → Source
  • The Cas protein is directed to the specific DNA locus by the gRNA, where it makes a double-strand break. There are several versions of Cas nucleases isolated from different bacteria. The most commonly used one is the Cas9 nuclease from Streptococcus pyogenes, but other Cas proteins exist, each with its own uses.
Diagram of CRISPR and Cas9 in action. Cas9 is the molecule that actually cuts the DNA, while the gRNA (often supplied as a single guide RNA, or sgRNA) directs it to the right spot; the cell’s repair machinery then fixes the cut, either disrupting the gene, repairing a mutation, or adding new DNA → Source
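Given a guide, you can predict where Cas9 would cut. The Python sketch below, a hypothetical helper rather than a real tool’s API, looks for sites whose 20-base protospacer matches the guide (the guide has the protospacer’s sequence with U in place of T, and base-pairs with the opposite strand) and that are followed by an NGG PAM. It reports the blunt cut Cas9 makes 3 bases upstream of the PAM. Allowing a mismatch or two hints at why “off-target” cuts are a concern; real off-target tools weigh mismatches by position and scan both strands.

```python
from typing import List, Tuple


def count_mismatches(a: str, b: str) -> int:
    """Number of positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_guide_sites(dna: str, guide_rna: str, max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Return (cut_position, mismatches) for each site the guide could send Cas9 to.

    The guide's sequence matches the protospacer (with U in place of T) and base-pairs
    with the opposite strand. Only the strand as written is scanned.
    """
    guide = guide_rna.replace("U", "T")
    n = len(guide)
    sites = []
    for start in range(len(dna) - n - 2):
        protospacer = dna[start:start + n]
        pam = dna[start + n:start + n + 3]
        if pam[1:] != "GG":  # Cas9 needs an NGG PAM right after the protospacer
            continue
        mm = count_mismatches(guide, protospacer)
        if mm <= max_mismatches:
            # Cas9 cuts both strands 3 bases upstream (5') of the PAM.
            sites.append((start + n - 3, mm))
    return sites


target = "ACGTTCAGCTAGCATCGATC"
guide_rna = target.replace("T", "U")  # ACGUUCAGCUAGCAUCGAUC
on_target = "TT" + target + "TGG" + "CA"
off_target = "TT" + "ACGTTCAGCTAGCATCGATG" + "AGG" + "CA"  # one mismatch at the last base

print(find_guide_sites(on_target, guide_rna))      # [(19, 0)]
print(find_guide_sites(off_target, guide_rna))     # []
print(find_guide_sites(off_target, guide_rna, 1))  # [(19, 1)]
```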

What about other Cas proteins?

Cas9 is a protein that can be used to cut specific strands of DNA. There are two other Cas proteins we will quickly look at. These provide a basis for finding more Cas proteins that could help CRISPR treat a wider variety of diseases:

  • Cas12: Like Cas9, Cas12 (best known in its Cas12a form) cuts double-stranded DNA, but it recognizes a different, T-rich PAM and leaves staggered ends. One of its most unique aspects is that, once activated, it also cuts nearby single-stranded DNA indiscriminately, a property researchers have used to build diagnostic tests.
  • Cas13: What makes Cas13 very different from other Cas proteins is that instead of targeting DNA, it looks for and cuts RNA. This makes it useful for manipulating mRNA, which, as mentioned earlier, carries the instructions for making proteins, including those involved in specific diseases.
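One practical difference between Cas9 and Cas12a is where the PAM sits. Cas12a looks for a T-rich PAM, TTTV (V meaning A, C or G), on the 5' side of its target, the opposite side from Cas9’s NGG. The minimal Python sketch below scans for such sites; the function name and example sequence are illustrative, the protospacer length is assumed to be 20 bases, and only one strand is scanned.

```python
import re
from typing import List, Tuple


def find_cas12a_targets(dna: str, protospacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, pam, protospacer) where a TTTV PAM sits immediately 5' of the protospacer.

    V means A, C or G. Only the strand as written is scanned.
    """
    # Lookahead so overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=(TTT[ACG])([ACGT]{%d}))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


spacer = "ACGTTCAGCTAGCATCGATC"
dna = "GG" + "TTTA" + spacer + "CC"  # PAM first, then the target
print(find_cas12a_targets(dna))  # [(2, 'TTTA', 'ACGTTCAGCTAGCATCGATC')]
```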

So once Cas cuts the DNA, how can we edit it? What happens next depends mainly on how the cell repairs the break and whether we supply a DNA template. Mutated genes carry changes in their sequence of bases, for example a single base swapped for the wrong one. The simplest approach is to cut the DNA at the mutated gene and let the cell repair the break using Non-homologous End Joining (NHEJ). In NHEJ, proteins bind the two broken ends and join them back together. This process is error-prone and often inserts or deletes a few bases, which usually disrupts the gene; that is useful for switching off a faulty gene.
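Why does a tiny insertion or deletion matter so much? Because codons are read three bases at a time, losing or gaining a single base shifts the reading frame. The Python sketch below uses a made-up gene and a hypothetical `nhej_repair` helper that simply rejoins the ends with a chosen number of bases lost or added; real NHEJ outcomes are random and vary from cell to cell.

```python
from typing import List


def codons(seq: str) -> List[str]:
    """Split a coding sequence into consecutive 3-base codons (leftover bases dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def nhej_repair(seq: str, cut: int, deleted: int = 0, inserted: str = "") -> str:
    """Toy NHEJ: rejoin the ends at `cut`, losing `deleted` bases and adding `inserted` ones."""
    return seq[:cut] + inserted + seq[cut + deleted:]


gene = "ATGGCTGAACGTTGGTAA"  # made-up gene: start codon ... stop codon (TAA)
edited = nhej_repair(gene, cut=7, deleted=1)

print(codons(gene))    # ['ATG', 'GCT', 'GAA', 'CGT', 'TGG', 'TAA']
print(codons(edited))  # ['ATG', 'GCT', 'GAC', 'GTT', 'GGT']
```

After the one-base deletion every codon downstream of the cut changes and the original stop codon is no longer read in frame, which is why NHEJ edits so often knock a gene out.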

Another use of CRISPR is making precise changes, from fixing a single base to adding an entire gene sequence (removing a whole stretch of DNA can be done by cutting on both sides of it). Precise changes rely on Homology Directed Repair (HDR). For HDR, a DNA template is delivered along with the gRNA and Cas-9 (or another Cas protein). The template must contain the change you want, flanked by “homology arms” that match the DNA on either side of the cut. The cell’s own repair machinery then uses the template as a blueprint, copying the new sequence into the gap and reconnecting both ends of the cut, completing the process.
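Here is a toy Python model of HDR, using a hypothetical `hdr_repair` helper on made-up sequences. The donor is written as left arm + new sequence + right arm; if both arms match the DNA on either side of the cut, the new sequence is copied in. Real homology arms are far longer than the 4 bases used here, and in real cells HDR competes with NHEJ and is often the less frequent outcome.

```python
from typing import Optional


def hdr_repair(seq: str, cut: int, donor: str, arm_len: int) -> Optional[str]:
    """Toy HDR: if the donor's homology arms match both sides of the cut, copy its middle in.

    donor = left arm + new sequence + right arm (assumes arm_len > 0 and cut >= arm_len).
    Returns None when the arms don't match.
    """
    left_arm, new_part, right_arm = donor[:arm_len], donor[arm_len:-arm_len], donor[-arm_len:]
    if seq[cut - arm_len:cut] != left_arm or seq[cut:cut + arm_len] != right_arm:
        return None  # no matching template: this toy model leaves the break unrepaired
    return seq[:cut] + new_part + seq[cut:]


chromosome = "AAACCCGGGTTT"     # made-up sequence, cut between CCC and GGG
donor = "ACCC" + "TAG" + "GGGT"  # 4-base arms around a 3-base insertion

print(hdr_repair(chromosome, 6, donor, 4))                    # AAACCCTAGGGGTTT
print(hdr_repair(chromosome, 6, "TTTT" + "TAG" + "GGGT", 4))  # None
```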

The Capabilities of Gene editing

From designer babies to curing cancer, let’s look at some of the things this emerging technology could accomplish.

Genetic Diseases

One of the most obvious problems many think CRISPR can solve is genetic disease. CRISPR can remove and replace the specific stretches of DNA that cause genetic diseases. This could lessen a disease’s effects or, in some cases, cure it completely.

Designer Babies

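Designer babies are one of the most controversial topics in the world right now. In 2018, in China, He Jiankui announced the birth of twin girls whose embryos he had edited to disable the CCR5 gene, in an attempt to make them resistant to HIV. The experiment was widely condemned, and he was later sentenced to prison.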

Unfortunately, not everyone wants the same outcome from using gene editing on embryos. Many look at using this technology for cosmetic purposes. Millions of people use plastic surgery to enhance their looks, so why not use gene editing to enhance the looks of their babies from birth, avoiding the hefty price tag of cosmetic surgery?

Curing Cancer

Cancer has plagued humanity since the dawn of time. However, it is possible that with the discovery of CRISPR, it has finally met its match. Cancer occurs when cells divide uncontrollably, which is caused by mutations in genes such as Ras and p53. CRISPR-Cas9 could change this. Using this gene-editing tool, scientists could restore a working copy of p53 (a tumor-suppressor gene) or disable an overactive mutant Ras gene (a cancer-promoting oncogene), pushing the cancerous cells toward apoptosis (cell suicide).

The ethics:

We know that adding extra copies of a gene like p53 could potentially help prevent cancer. But what if having too much of that gene’s product gives rise to another disease?

Seems counterintuitive — you’re causing harm when you were trying to take out something harmful in the first place!

Because scientists do not yet know what every sequence in the human genome does, it’s hard to be sure what changing one will do (it’s the butterfly effect!). That’s why countless experiments must take place before these edits can safely be used on humans.

There are countless potential positives to CRISPR and editing humans. On one hand, future generations could become superhumans: stronger, smarter, and faster than the average human. We could reduce the number of diseases we can get, and diseases such as cancer or dementia might no longer exist.

On the other hand, certain socio-economic gaps would widen as well. People in developing countries wouldn’t have much access to gene-editing technology, so their children could end up drastically different from the “superhumans” of wealthier countries. Given enough time, some speculate, the two lines of humans could even evolve into different species!

And then there are people who would misuse the wonders of gene-editing technology to gain an advantage over everyone else. What if a mad scientist is already working on a private army of super-soldiers?

It all sounds scary, I know. But the more people are educated on the far-reaching implications of gene editing, the more responsibly it can be used. People will understand not only what CRISPR/Cas-9 holds for the future, but also how it’s the first step in the biological revolution.

TL;DR:

  • Gene editing is the process of altering DNA, which contains the instructions for life.
  • CRISPR-Cas9 is one of several gene-editing tools. It cuts DNA at a chosen site, after which the cell’s repair machinery can disable a gene, repair it, or insert a new sequence.
  • The system is based on the way bacteria defend themselves against viruses.
  • Cas9 is the protein that does the cutting, and the guide RNA makes sure Cas9 cuts the right gene sequence.
  • Gene editing tech can be used for good and for harm (from editing the genes responsible for cancer to creating an army of super soldiers)

My name is Krish, a high school student passionate about using gene-editing to create a better future. If you have any suggestions, or questions, or just want to talk, you can message me on LinkedIn or Twitter. Thank you for reading and I hope you learnt something new!
