How does CRISPR work? All you need to know before designing your first experiment

Ada Choudhry
19 min read · Apr 2, 2023


Have you read about how CRISPR has revolutionized modern biology? If you’ve followed the technical and ethical discussions around it, you probably know that it is a tool for gene editing. But how does it actually work? And what types of experiments can we do with it?

If you have these questions, I hope this article will satisfy your curiosity. In it, we’ll learn in depth about:

  1. How does CRISPR work?
  2. Knock-out Experiments (gene deletion)
  3. Knock-in Experiments (gene insertion)
  4. DNA repair pathways involved (NHEJ and HDR) in the above experiments and their types
  5. CRISPR methods for regulating gene expression (CRISPRa and CRISPRi)
  6. Newer CRISPR editing methods: Base and Prime Editing

This article leans heavily into the technical details. If you would like a primer, I recommend reading this article first and then coming back to review the mechanisms in depth.

CRISPR and its associated protein, Cas9, together form one of the most effective, efficient, and accurate genome editing tools available, and the system works in a wide range of living cells. Genome editing is a type of genetic engineering in which DNA is deliberately inserted, removed, or modified in living cells. It has applications in many areas, including medicine, agriculture, and biotechnology. In agriculture, it is being used to develop more resilient crops and plants with higher nutritional value. In medicine, it is used in gene therapy to knock out mutated genes or to regulate specific genes, and it is being applied to diseases such as cancer, HIV, and sickle cell disease. However, challenges with delivery, off-target effects, and ethical issues remain barriers to clinical application.

But before we debate how to overcome these barriers, it is crucial to understand the fundamentals of the technology, so that we can contribute fruitfully to any intellectual discussion.

Let’s dive in!

Its Origin

Source: The Scientist

CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats and is an adaptive immune system in bacteria and archaea. When a bacterium is infected with a virus, Cas nucleases (a nuclease is an enzyme that cleaves the phosphodiester bonds of nucleic acids) cut out a piece of viral DNA, known as a protospacer. This piece of foreign DNA is stored in the bacterial genome as an immune memory, where it is called a spacer. The spacers are stored between short, repeated palindromic sequences, and this arrangement of spacers and palindromic repeats is what gives CRISPR its name.

But how does this cutting and pasting help the bacteria be resistant to infections?

If a bacterium carrying the spacer is infected with the same virus again, it transcribes the spacer into a guide RNA that recognizes the matching viral sequence. The guide directs the Cas9 nuclease to cut the viral DNA, defeating the enemy by destroying its genome.

Scientists later discovered that this mechanism could be repurposed to recognize, cut, and insert genes in virtually any genome, gifting us a tool for precise gene editing!

Components

The CRISPR system consists of two main components:

  1. Guide RNA (gRNA)
  2. CRISPR-associated protein (Cas9)

The mechanism of CRISPR/Cas9 genome editing involves three steps:

  1. Recognition
  2. Cleavage
  3. Repair

Recognition is done through the gRNA, which has two parts: CRISPR RNA (crRNA) and trans-activating CRISPR RNA (tracrRNA). Together, these two RNAs form the guide RNA (gRNA) complex. Think of Cas9 as the scissors, and the gRNA as the hand that directs where they cut.

The crRNA carries a 17–20 nucleotide sequence that is complementary to the target DNA, while the tracrRNA serves as a binding scaffold for the Cas nuclease.

Source: BIONEER
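To make “complementary to the target DNA” concrete, here is a minimal Python sketch. The 20-nt protospacer is a made-up sequence and the helper names (`reverse_complement`, `spacer_rna`, `pairs_with`) are hypothetical. It shows that the guide’s targeting sequence reads the same as the protospacer on the non-target strand (with U in place of T), and therefore base-pairs with the opposite, target strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def spacer_rna(protospacer_dna: str) -> str:
    """The guide's spacer has the protospacer's sequence, written as RNA (T -> U)."""
    return protospacer_dna.replace("T", "U")

def pairs_with(rna: str, target_3to5: str) -> bool:
    """Check Watson-Crick pairing position by position. The DNA target strand
    is written 3'->5' so that it lines up under the 5'->3' RNA."""
    rna_to_dna_partner = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return len(rna) == len(target_3to5) and all(
        rna_to_dna_partner[r] == d for r, d in zip(rna, target_3to5)
    )

protospacer = "GACGTTACCGATTGCAAGTC"            # hypothetical, non-target strand 5'->3'
target_strand_5to3 = reverse_complement(protospacer)
guide = spacer_rna(protospacer)

print(guide)                                    # GACGUUACCGAUUGCAAGUC
print(pairs_with(guide, target_strand_5to3[::-1]))  # True
```

The guide does not pair with the protospacer strand itself (`pairs_with(guide, protospacer)` is `False`); it pairs with the strand opposite it.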

Wait, but what is sgRNA? And why is it so popular for experiments?

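sgRNA stands for single guide RNA: the crRNA and tracrRNA fused into a single RNA molecule. It can be chemically synthesized, or transcribed in vitro or in vivo from a DNA template. Because one molecule does the job of two, it is simpler to design and deliver than the two-part guide, which is a big part of its popularity.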

Cleavage is done by the Cas nuclease. On its own, the CRISPR-associated (Cas for short) protein has no built-in target: it is an endonuclease (an enzyme that cuts within a DNA strand) whose sequence specificity comes from the guide RNA it carries, plus the short PAM sequence described below.

To make the edits specific, it is directed to the specific DNA locus by a gRNA, where it makes a double-strand break. There are several versions of Cas nucleases isolated from different bacteria. The most commonly used one is the Cas9 nuclease from Streptococcus pyogenes.

To cut the DNA, each Cas protein looks for a specific short sequence next to the target DNA; for Cas9, it lies just downstream of the target. This is known as the protospacer adjacent motif (PAM). Think of it like a bookmark: once Cas9 finds a PAM, it checks the region upstream, and if that region matches the target encoded by the gRNA, it creates a double-strand break (DSB).

The PAM matters because it lets the CRISPR-Cas9 complex in bacteria tell the invading viral DNA (which it is going to cut) apart from the spacer stored in the bacterium’s own CRISPR array during previous attacks: the PAM sits next to the target sequence in the viral genome but not next to the stored spacer, which helps the complex differentiate self from non-self.

Cas9 from Streptococcus pyogenes recognizes the PAM sequence NGG, where N can be any nucleotide.

The PAM also speeds up the search for the target sequence. Across a large, coiled genome, Cas9 bounces around until it finds a PAM, and only then does it check the adjacent upstream sequence for a match to the guide.

Source: Nature

But how do scientists design a gRNA with the help of PAM?

If scientists are using Cas9 from Streptococcus pyogenes, they scan the target gene for PAM sites (NGG in this case) and build a gRNA matching the sequence immediately upstream of one of them. Because NGG occurs many times within a typical gene, there are many possible gRNAs; we’ll learn how to pick an efficient one in the article on designing an experiment.
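The PAM-scanning step can be sketched in a few lines of Python. This is an illustration, not a design tool: it assumes SpCas9 with an NGG PAM and a 20-nt protospacer, searches both strands, and ignores the efficiency and off-target scoring that real gRNA design software performs. The function name `find_spcas9_guides` is hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]

def find_spcas9_guides(seq: str, spacer_len: int = 20):
    """Return (strand, start, protospacer, pam) for each NGG PAM that has
    `spacer_len` bases immediately upstream of it.

    `start` is a 0-based index into the scanned strand: the input sequence
    for '+', and its reverse complement for '-'.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # The lookahead (?=...) also reports overlapping PAMs, e.g. both in 'AGGG'.
        for m in re.finditer(r"(?=([ACGT]GG))", s):
            pam_start = m.start()
            if pam_start >= spacer_len:
                start = pam_start - spacer_len
                hits.append((strand, start, s[start:pam_start], m.group(1)))
    return hits

# Made-up toy sequences: the first has a PAM on the given strand,
# the second only on the reverse strand.
print(find_spcas9_guides("ACGT" * 5 + "AGG"))
# [('+', 0, 'ACGTACGTACGTACGTACGT', 'AGG')]
print(find_spcas9_guides("CCA" + "ACGT" * 5))
# [('-', 0, 'ACGTACGTACGTACGTACGT', 'TGG')]
```

Scanning the reverse complement matters because the gRNA can target either strand; a `CCN` on the given strand is an `NGG` PAM on the other one.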

But what if your target site does not have an NGG downstream? Then you can choose a Cas9 from a different bacterial species, or a different Cas protein, that has evolved to recognize a PAM other than NGG.
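Different Cas proteins recognize different PAMs, which are usually written with IUPAC degenerate codes (N = any base, R = A or G, and so on). Below is a minimal sketch, using Python’s `re` module and hypothetical helper names, that turns such a PAM string into a pattern and lists where it occurs on one strand. The NNGRRT motif in the example is the PAM commonly reported for Staphylococcus aureus Cas9 (SaCas9); the toy sequence is made up.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def pam_regex(pam: str) -> str:
    """Translate a PAM written in IUPAC codes into a regular expression."""
    return "".join(IUPAC[code] for code in pam.upper())

def pam_sites(seq: str, pam: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) PAM on this strand."""
    pattern = re.compile("(?=" + pam_regex(pam) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]

toy = "ACGGAATCTGG"  # made-up sequence
print(pam_sites(toy, "NGG"))     # [1, 8]  (SpCas9)
print(pam_sites(toy, "NNGRRT"))  # [1]     (SaCas9)
```

The position of the PAM relative to the target also differs between enzymes: for Cas9 it lies 3' of the protospacer, while Cas12a (Cpf1) uses a T-rich PAM (TTTV) on the 5' side, so a guide-finding routine for Cas12a would look downstream of the PAM instead of upstream.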

Repair follows once the Cas nuclease has made a double-strand break at the target site. The break can be repaired in two ways:

  1. Non-Homologous End Joining (NHEJ)
  2. Homology Directed Repair (HDR)

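NHEJ is active throughout the cell cycle and can repair a break within tens of minutes, roughly an order of magnitude faster than HDR, which is largely restricted to the S and G2 phases of the cell cycle, when a sister chromatid is available as a template. As a result, NHEJ is the dominant pathway for repairing CRISPR-Cas9-induced breaks in most cells.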

NHEJ is used in gene knockout experiments and HDR in knock-in experiments; we’ll look at each process in detail when we cover these experiments in the next sections.

Source: Synthego

CRISPR Knock-out experiments

Knock-out experiments are used to delete or disrupt segments of a gene to make it non-functional, thus knocking the gene out. In NHEJ, the two ends left by the cut are joined directly, even if they are not perfectly compatible. This involves direct ligation of sticky or blunt ends and often results in small insertions or deletions (indels) of nucleotides. These indels typically result from imperfect alignment of the DNA ends and from nucleotides being added or removed at the site of the break. Indels generated during NHEJ repair are typically small (1–10 bp) but extremely heterogeneous in size. Because the process is error-prone, indels in the coding part of a gene can cause a frameshift mutation, rendering the gene non-functional. A frameshift mutation is an insertion or deletion of a number of base pairs that is not a multiple of three, which disrupts the triplet reading frame of the DNA sequence. Since only one in every three indel lengths is a multiple of three, there is about a two-thirds chance that a given indel causes a frameshift.
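The frameshift argument can be checked with a toy example. This minimal sketch uses a made-up 15-bp coding sequence and hypothetical helper names; the last lines assume, purely for illustration, that every indel size from 1 to 10 bp is equally likely.

```python
def codons(seq: str) -> list[str]:
    """Split a coding sequence into triplets, reading from the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def apply_indel(seq: str, pos: int, deleted: int = 0, inserted: str = "") -> str:
    """Delete `deleted` bases starting at `pos`, then insert `inserted` there."""
    return seq[:pos] + inserted + seq[pos + deleted:]

def is_frameshift(net_length_change: int) -> bool:
    """An indel shifts the reading frame unless its net size is a multiple of 3."""
    return net_length_change % 3 != 0

cds = "ATGGCCAAATTTGGG"                       # hypothetical toy coding sequence
mutant = apply_indel(cds, pos=3, deleted=1)   # 1-bp deletion right after ATG
print(codons(cds))     # ['ATG', 'GCC', 'AAA', 'TTT', 'GGG']
print(codons(mutant))  # ['ATG', 'CCA', 'AAT', 'TTG', 'GG']

# Illustrative assumption: every indel size from 1 to 10 bp is equally likely.
sizes = range(1, 11)
print(sum(is_frameshift(n) for n in sizes) / len(sizes))  # 0.7
```

Every codon after the deletion changes, which is why a frameshift early in a gene typically disrupts the protein. Under the uniform-size assumption, 7 of the 10 sizes shift the frame, close to the two-thirds estimate.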

There are several ways in which indels can be introduced during end joining:

  1. Microhomology-mediated end joining (MMEJ): In some cases, the broken DNA ends have short regions of identical sequence, called microhomologies (1–25 base pairs), near the break. (MMEJ is often classified as an alternative end-joining pathway, distinct from classical NHEJ.) These microhomologies can anneal to each other, bringing the broken ends together and leaving flaps of single-stranded DNA. The flaps are removed and any gaps are filled in by DNA polymerase, which typically results in deletion of the sequence between the microhomologies.
  2. Polymerase insertion and end trimming: DNA polymerases can add extra nucleotides while filling in the ends at the break, and nucleases can trim nucleotides away, resulting in indels.
  3. Template switching: In some cases, the broken DNA ends may anneal to other regions of the genome or to other DNA fragments present in the cell, leading to insertions or deletions of nucleotides.
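The MMEJ outcome in point 1 can be sketched as a simple search: find a short sequence that appears on both sides of the cut, keep one copy, and delete everything in between. This is a simplified illustration with a hypothetical function name and default parameters (microhomology length 2–6, search window of 15 bases); real MMEJ prediction tools also weigh microhomology length, GC content, and distance from the cut.

```python
def mmej_products(seq: str, cut: int, min_len: int = 2, max_len: int = 6, window: int = 15):
    """Candidate MMEJ repair products for a blunt cut between seq[cut-1] and seq[cut].

    A microhomology is a k-mer present both in the left fragment (within
    `window` bases of the cut) and in the right fragment (starting within
    `window` bases of the cut). Annealing the two copies and removing the
    flaps keeps one copy and deletes everything between them.
    Returns (microhomology, deletion_length, product), shortest deletions first.
    """
    left, right = seq[:cut], seq[cut:]
    best = {}  # product -> longest microhomology that explains it
    for k in range(min_len, max_len + 1):
        for i in range(max(0, cut - window), cut - k + 1):
            mh = left[i:i + k]
            for j in range(min(window, len(right) - k + 1)):
                if right[j:j + k] == mh:
                    product = left[:i] + mh + right[j + k:]
                    if len(mh) > len(best.get(product, "")):
                        best[product] = mh
    return sorted(
        ((mh, len(seq) - len(p), p) for p, mh in best.items()),
        key=lambda r: (r[1], r[0]),
    )

# Made-up example: 'ACG' ends right at the cut and appears again 4 bases into the right fragment.
print(mmej_products("TTTTACGAAAAACGCCCC", cut=7))
# [('ACG', 7, 'TTTTACGCCCC')]
```

The dictionary keyed by product merges shorter sub-matches (here `AC` and `CG`) that describe the same deletion as the full `ACG` microhomology.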

In NHEJ, these are the main steps that are involved regardless of end structure:

  1. Recognition of the DSB (double-strand break): Various proteins are recruited to the site of a DSB to facilitate repair. This process is called loading, and in NHEJ it involves the recruitment of the Ku70/Ku80 heterodimer, which binds to the ends of the broken DNA strands and recruits other proteins to the site of the DSB, forming a stable complex.
  2. Addition of more proteins: Ku then acts as a scaffold for recruitment of a kinase (DNA-PKcs) and a two-subunit DNA ligase (XRCC4-ligase IV); together with some accessory factors (PAXX, XLF), this complex holds a pair of DNA ends together, forming a paired-end complex.
  3. End processing: The broken ends of the DNA are processed to prepare them for ligation. This involves the removal of any damaged or mismatched nucleotides and the addition of new nucleotides, if necessary, to create compatible overhangs for ligation.
  4. Alignment of the DNA ends: The processed DNA ends are then aligned and brought into close proximity, allowing them to be ligated together.
  5. Ligation: DNA ligase IV catalyzes the formation of a phosphodiester bond between the 3' OH and 5' phosphate groups of the aligned DNA ends, rejoining the broken DNA strands.

Although the details may vary depending on the structure of the DNA ends being repaired, the core steps remain the same. One factor that may affect repair of CRISPR-induced breaks is that Cas9 doesn’t immediately release from the broken ends after cleavage, which may interfere with Ku loading and normal NHEJ activity.

Because a gene spans many base pairs, a popular technique is to use multiple guide RNAs targeting different regions of the same gene, which can increase knockout efficiency. Scientists create knockout organisms to study the impact of removing a gene, which often teaches them something about that gene’s function. Gene knockouts are used in a range of research areas, including functional genomics, pathway analysis, drug discovery and screening, and disease modeling.
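A back-of-the-envelope calculation shows why several guides can help. Assuming, hypothetically, that each guide independently disrupts the gene with the same probability p (here p = 0.6, an invented value), the chance that at least one of n guides succeeds is 1 - (1 - p)^n. The function name is hypothetical.

```python
def p_any_disruption(p_per_guide: float, n_guides: int) -> float:
    """Probability that at least one of n guides disrupts the gene, assuming
    (unrealistically) that guides act independently with equal probability."""
    return 1 - (1 - p_per_guide) ** n_guides

for n in (1, 2, 3):
    print(n, round(p_any_disruption(0.6, n), 3))
# 1 0.6
# 2 0.84
# 3 0.936
```

Real guides are neither independent nor equally efficient, and two simultaneous cuts can also delete the whole stretch between them, so treat this only as intuition.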

CRISPR Knock-in Experiments

Instead of removing a gene, knock-in experiments introduce new sequence, such as a foreign gene, into a genome. In the HDR repair pathway, a donor template is provided, consisting of the sequence to be inserted flanked by homology arms: regions that match the sequences on either side of the cut. The donor template is delivered along with the sgRNA and Cas9.
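Assembling such a donor is mostly string concatenation. Here is a minimal sketch with a hypothetical function name and toy sequences: it copies equal-length homology arms from the reference on either side of the cut and places the insert between them. Real arms are much longer, and their length depends on the donor type.

```python
def build_hdr_donor(reference: str, cut: int, insert: str, arm_len: int) -> str:
    """Return left homology arm + insert + right homology arm.

    `cut` is the 0-based index of the first base to the right of the break,
    so the left arm ends at reference[cut - 1] and the right arm starts at
    reference[cut].
    """
    if cut - arm_len < 0 or cut + arm_len > len(reference):
        raise ValueError("reference is too short for the requested arm length")
    left_arm = reference[cut - arm_len:cut]
    right_arm = reference[cut:cut + arm_len]
    return left_arm + insert + right_arm

reference = "AAAACCCCGGGGTTTT"  # toy reference (made up)
donor = build_hdr_donor(reference, cut=8, insert="CATCAT", arm_len=4)
print(donor)  # CCCCCATCATGGGG
```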

There are three steps common to all HDR pathways:

  1. The 5’ ends of the DNA at the break are degraded (resected) by nucleases, leaving a single-stranded DNA (ssDNA) 3’ overhang, that is, a single-stranded tail that ends in a 3’ hydroxyl group. This overhang serves both as a substrate for the proteins required for strand invasion and as a primer for DNA repair synthesis.
Source: Research Gate

  2. The ssDNA overhang then invades the homologous donor DNA, displacing one of its strands and pairing with the other, which serves as the template for repair. This forms a DNA structure called the displacement loop (D-loop).

Source: Research Gate

  3. DNA polymerase extends the invading 3’ end using the donor as a template. The newly synthesized DNA then serves as the template for the complementary strand, completing the repair process.

HDR pathways fall into two groups: conservative pathways, which use a homologous donor template, and non-conservative pathways. The main non-conservative pathway is single-strand annealing (SSA), an error-prone mechanism that notably does not require a donor template: it anneals repeated sequences on either side of the break and deletes the DNA between them.

There are three main types of conservative HDR:

  1. Classical double-strand break repair (DSBR)
  2. Synthesis-dependent strand-annealing (SDSA) pathway
  3. Break-induced repair (BIR) pathway

In the following paragraphs, I have detailed how these mechanisms take place.

Source: Research Gate

Classical double-strand break repair (DSBR)

Classical DSBR is a highly accurate DNA repair mechanism, as it relies on a homologous template sequence to guide the repair process. However, it is also a complex and time-consuming process, and defects in this pathway can lead to genomic instability and cancer.

  1. Recognition and processing of the DSB: The DSB is first recognized by the MRN complex (Mre11-Rad50-Nbs1), which binds to the ends of the broken DNA strands and recruits other proteins involved in the repair process. The ends of the broken strands are then processed by nucleases and associated factors, such as CtIP and Exo1, to create single-stranded DNA (ssDNA) overhangs.
  2. Invasion of the homologous DNA sequence: The processed DNA ends then search for a homologous DNA sequence to use as a template for repair. The donor template’s homology arms match the base pairs on either side of the break. One broken strand, coated with the recombinase Rad51, invades the homologous DNA duplex, forming a displacement loop (D-loop) structure.
  3. DNA synthesis and second-end capture: The invading strand serves as a primer for DNA synthesis by DNA polymerase, which copies the template and extends the D-loop. The displaced template strand then pairs with the other end of the break (second-end capture), which is also extended by fill-in synthesis. After ligation, the two crossing points between the DNA duplexes form a double Holliday junction, a DNA intermediate in which two DNA duplexes are connected by crossing DNA strands. This is shown through the ‘Second-end capture, fill-in synthesis, ligation’ part of the diagram. To see an animation of formation of Holliday junctions, click here.
  4. Resolution: The double Holliday junction is resolved by endonucleases that cut each junction in one of two orientations: horizontally (in the image, at positions 1 and 2) or vertically (in the image, at positions 5 and 6). Depending on the combination of cuts, this yields crossover or non-crossover products. Crossover products occur when the Holliday junctions are resolved in a way that exchanges the flanking DNA between the two molecules. Non-crossover products occur when they are resolved in a way that preserves the original arrangement of flanking sequences in both molecules. The frequency of crossover versus non-crossover products can vary depending on the location and nature of the DSB.

Synthesis-dependent strand-annealing (SDSA) pathway

The SDSA pathway is a relatively simple and efficient mechanism for repairing DSBs that preserves genetic information and results in minimal genetic changes. However, it produces only non-crossover products, so it does not generate the crossovers that are important for genetic diversity during meiosis. To see an animation of this pathway, click here.

It can be resolved into the following steps:

  1. Recognition and processing of the double-strand break (DSB): The DSB is recognized and processed by the MRN complex and other proteins involved in the repair process. The broken DNA ends are resected by nucleases, creating a 3' single-stranded DNA (ssDNA) overhang.
  2. Coating of the ssDNA overhang: The ssDNA overhang is coated with RPA (Replication Protein A), which protects the ssDNA from degradation and facilitates the binding of Rad51, a recombinase that catalyzes the invasion of the homologous DNA duplex.
  3. Strand invasion and DNA synthesis: The invading strand, bound by Rad51, searches for a homologous DNA sequence and invades the complementary strand of the homologous template DNA duplex. DNA synthesis is initiated and extends only a short distance from the 3' end of the broken DNA strand, using the complementary strand of the homologous template DNA duplex as a template.
  4. Dissociation of the newly synthesized strand: Once DNA synthesis is complete, the newly synthesized strand, still attached to the broken DNA, is displaced from the homologous template.
  5. Annealing and heteroduplex formation: The newly synthesized strand anneals to the ssDNA overhang on the other side of the break, creating a short stretch of heteroduplex DNA in which the two strands come from different molecules and may not be perfectly complementary.
  6. Gap filling and ligation: DNA polymerase uses the annealed new strand as a template to fill in the remaining gap on the other broken strand, and DNA ligase seals the ends. Any mismatches in the heteroduplex can be corrected by mismatch repair. The result is two molecules: the unchanged template, and the repaired molecule, which now carries sequence copied from the template.

Break-induced repair (BIR) pathway

BIR is characterized by the presence of only one invading end at a DSB that can be used for repair (a one-ended DSB). This single invading strand primes leading-strand synthesis from a template, followed by a round of lagging-strand synthesis to fill in the resulting ssDNA. A single Holliday junction (HJ) forms during this process and is resolved by cleavage of the crossed strand.

Source: EMBO Press

Here are the general steps involved:

  1. DNA double-strand break: The BIR pathway is initiated by a DNA double-strand break.
  2. Resection: The 5’-ends of the broken DNA strands are resected to generate long 3’-overhangs.
  3. Strand invasion: One of the 3’-overhangs invades the homologous template DNA to form a displacement loop (D-loop). This D-loop structure is similar to the one formed during homologous recombination repair.
  4. DNA synthesis: DNA synthesis proceeds using the invading strand as a primer, and the homologous template DNA as a template. This process is called “template-assisted repair” and can result in the extension of a long DNA tract.
  5. Migration of the replication fork: The replication fork migrates along the template DNA and synthesizes the missing DNA sequence. This process can continue until the end of the template is reached.
  6. Dissolution: When the BIR repair is complete, the DNA strands are dissociated, and the newly synthesized DNA is annealed to the appropriate partner strands.
  7. Resolution: The repair junction is resolved by endonucleases and ligases to generate a stable DNA structure.

Overall, the BIR pathway is a complex process that requires several DNA repair proteins to ensure successful repair of the DNA damage.

We have now covered the mechanisms behind HDR-based insertion, but what about the donor template we have to provide?

Donor Templates

Donor templates are mainly of four kinds:

  1. Plasmids
  2. Double Stranded DNA (dsDNA)
  3. Single Stranded DNA (ssDNA)
  4. Bacterial artificial chromosomes (BACs)

The size of the intended edit is the biggest determinant when selecting a type of donor.

Plasmids: Plasmids are circular DNA molecules that can be introduced into cells to serve as a template for homologous recombination. They can be designed to contain the desired DNA sequence changes and are often used in gene knock-in experiments. However, HDR efficiency with circular plasmid templates is generally low; to increase the frequency of edits, researchers have designed self-cleaving plasmids that liberate the targeting region from the vector.

Single-stranded oligodeoxynucleotides (ssODNs): ssODNs are short, synthetic single-stranded DNA fragments that can be designed to contain specific changes to the DNA sequence. They are commonly used for smaller modifications (~1–50 base pairs). Small edits may need as little as 30–50 bases for each homology arm, but keep in mind these numbers may vary based on your locus of interest and experimental system.

Double-stranded DNA (dsDNA) donors: dsDNA donors, either linear fragments or plasmids, are longer DNA molecules that can be used as templates for HDR. Because synthesizing ssODNs longer than about 200 bases can be difficult, dsDNA donors are generally preferred for large insertions such as fluorescent proteins or selection cassettes. This class of template should have homology arms of roughly 500–1000 bp.

Bacterial artificial chromosomes (BACs): A BAC is an engineered DNA molecule used to clone DNA sequences in bacterial cells (for example, E. coli). BACs are often used in DNA sequencing projects, and they can also serve as templates for gene editing experiments. Segments of an organism’s DNA ranging from about 100,000 to 300,000 base pairs can be inserted into BACs, so they are often used for more complex genome engineering projects, such as introducing multiple gene modifications at once.
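The size guidelines above can be condensed into a rough rule of thumb. This sketch is an illustration only: the function name, the 40-nt arm length, and the cutoffs (edits of at most 50 bp and about 200 nt of total length for an ssODN, and 100,000 bp before reaching for a BAC) are assumptions distilled from the ranges in this section, not fixed rules.

```python
def suggest_donor(edit_len_bp: int, ssodn_arm_nt: int = 40, max_ssodn_nt: int = 200) -> str:
    """Very rough donor-type suggestion based only on the size of the intended edit."""
    ssodn_total = edit_len_bp + 2 * ssodn_arm_nt  # edit plus two homology arms
    if edit_len_bp <= 50 and ssodn_total <= max_ssodn_nt:
        return "ssODN (about 30-50 nt homology arms)"
    if edit_len_bp < 100_000:
        return "dsDNA fragment or plasmid (about 500-1000 bp homology arms)"
    return "BAC"

for size in (1, 20, 720, 150_000):
    print(size, "->", suggest_donor(size))
```

In practice the locus, cell type, and delivery method matter as much as size, so a rule like this is only a starting point.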

Apart from editing (inserting or deleting) genes, CRISPR-Cas9 can also be used for regulating the expression of genes.

Let’s step into a thought bubble to revise some basic concepts:

The central dogma of molecular biology states that biological information flows from DNA → RNA → proteins, which are the building blocks of the body. Information is copied from DNA to RNA through transcription, producing messenger RNA (mRNA), and the mRNA is then translated into protein.

Source: Wikipedia

Gene expression is the process by which the information encoded in a gene is turned into a function. This mostly occurs via the transcription of RNA molecules that code for proteins or non-coding RNA molecules that serve other functions.
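The flow from DNA to protein can be sketched with two tiny functions. To keep it short, the codon table below (an assumption for this example) contains only the eight codons the toy sequence can need, including the three stop codons; a real table has 64 entries. The input is treated as the coding strand, and the sequence is made up.

```python
# Only the codons used in this toy example; '*' marks a stop codon.
CODON_TABLE = {
    "AUG": "M", "GCC": "A", "AAA": "K", "UUU": "F", "GGG": "G",
    "UAA": "*", "UAG": "*", "UGA": "*",
}

def transcribe(coding_strand_dna: str) -> str:
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand_dna.replace("T", "U")

def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ATGGCCAAATTTGGGTAA")
print(mrna)             # AUGGCCAAAUUUGGGUAA
print(translate(mrna))  # MAKFG
```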

With slight modifications, CRISPR can also be used to regulate the expression of genes, through CRISPR activation (CRISPRa) and CRISPR interference (CRISPRi). CRISPRa increases (upregulates) the expression of a gene, while CRISPRi reduces (downregulates) it. In both methods, the guide RNA is paired with a catalytically inactive Cas nuclease that can no longer cut DNA but can still be guided to its target. This “dead” Cas9, known as dCas9, is fused with transcriptional effectors to modulate target gene expression.

CRISPR Activation (CRISPRa) Experiments

Source: Synthego

The dCas9-sgRNA complex is introduced into the target cell, where it binds to DNA at the desired location. dCas9 fused to transcriptional activators such as VP64 and p65 can be targeted to promoter and enhancer regions, where the activators recruit additional transcriptional machinery and promote transcription of the target gene, resulting in higher-than-usual expression.

Promoter regions are specific DNA sequences located upstream of (before) the transcription start site of a gene, and they play a crucial role in regulating gene expression.

The promoter region contains the binding sites for transcription factors, which are proteins that bind to DNA and control the transcription of genes by recruiting RNA polymerase, an enzyme that synthesizes RNA from the DNA template.

When a transcription factor binds to a promoter region, it recruits RNA polymerase to the gene, which initiates the process of transcription. The transcription factors work in combination with other proteins and co-factors to determine whether a gene is turned on or off, and the strength of the transcriptional activity.

CRISPR Interference (CRISPRi) Experiments

Source: Synthego

Instead of promoting transcription of the target gene, the dCas9-sgRNA complex interferes with it by physically blocking the binding or progress of RNA polymerase. dCas9 is often also fused to a repressor domain such as KRAB to strengthen silencing.

This interference results in the inhibition of gene expression, allowing researchers to selectively silence specific genes of interest. By using different sgRNAs, the CRISPRi system can be targeted to different genes in the genome, allowing for precise control of gene expression.

Recently developed CRISPR Methods

Base editing

Base editing is a genetic engineering technology that allows for precise changes to be made to the DNA sequence of a target gene without the need for introducing double-stranded breaks in the DNA.

Base editing uses either a catalytically dead Cas9 (dCas9) or a nickase Cas9 (nCas9). dCas9 is incapable of cutting DNA, while nCas9 produces ‘nicks’, or single-stranded breaks (SSBs) in the DNA.

The Cas9 enzyme is fused to an enzyme capable of chemically modifying DNA bases, such as a cytidine deaminase, which converts cytosine (C) into uracil (U) that the cell then reads as thymine (T). This allows a specific base in the DNA sequence to be altered without introducing double-strand breaks or relying on a donor template.

Source: AddGene

Once at the target site, the enzyme component of the base editor modifies the target base, converting it to a different base that is specified by the base editor design.
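The outcome of a cytosine base editor can be sketched as a string operation. This illustration assumes a 20-nt protospacer numbered 1–20 from the PAM-distal end and an editing window at positions 4–8, the window commonly reported for early cytosine base editors such as BE3; actual windows and efficiencies vary by editor, and the function name and example sequence are hypothetical. It also assumes every C in the window is converted, which is an idealized outcome.

```python
def cytosine_base_edit(protospacer: str, window: tuple[int, int] = (4, 8)) -> str:
    """Convert every C to T inside the editing window of a protospacer.

    Positions are 1-based and counted from the PAM-distal (5') end, so for a
    20-nt protospacer the PAM would follow position 20.
    """
    lo, hi = window
    return "".join(
        "T" if base == "C" and lo <= pos <= hi else base
        for pos, base in enumerate(protospacer, start=1)
    )

before = "ACCCCTGCAACCGTAGCTGA"  # hypothetical protospacer
after = cytosine_base_edit(before)
print(before)
print(after)  # ACCTTTGTAACCGTAGCTGA
```

All three Cs inside the window change, not just one; these “bystander” edits are one practical reason the editing window matters when choosing a guide. On the opposite strand, the paired G becomes A after repair or replication, so the net change is C•G to T•A.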

This technology has several advantages over traditional gene editing methods: because it does not create double-strand breaks, it produces far fewer unwanted insertions and deletions, and it can correct single-point mutations associated with genetic diseases.

However, base editing can only make specific types of DNA changes: conversions between particular base pairs, such as C•G to T•A with cytosine base editors. It cannot insert or delete DNA, so while it is well suited to correcting some single-point mutations, it cannot correct insertions, deletions, or every kind of point mutation.

Prime Editing

Prime editing involves fusing nCas9 to an engineered reverse transcriptase and pairing it with a prime editing guide RNA (pegRNA). The pegRNA contains two sections: one that guides the editor to the region of interest, and another that encodes the desired edit, which is written into the DNA after the single-stranded cut (nick) has been made.

Once the prime editor is bound to the target site, it nicks one DNA strand, and the reverse transcriptase then synthesizes new DNA directly onto the nicked strand, copying the template sequence carried in the pegRNA.

This process enables a variety of DNA modifications, including insertions, deletions, and substitutions of specific DNA sequences, which makes it possible to correct many mutations associated with genetic diseases.

After one strand has been altered by the prime editor, the complementary strand can also be corrected: an additional gRNA directs nCas9 to nick the unedited strand, which is then repaired using the previously edited strand as a template.

Source: AddGene
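To make the pegRNA’s second section more concrete: in the standard design, that 3' extension is itself made of two parts, a primer binding site (PBS) that pairs with the nicked DNA strand, and a reverse transcription template (RTT) that encodes the edit. The sketch below assumes SpCas9, which nicks the PAM-containing strand 3 nucleotides upstream of the PAM; the 13-nt PBS default, the function names, and the example sequences are illustrative assumptions, and real pegRNA design involves further rules not modeled here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]

def peg_extension(protospacer: str, edited_after_nick: str, pbs_len: int = 13) -> str:
    """Return the pegRNA 3' extension (RTT followed by PBS) as RNA.

    protospacer: 20-nt protospacer on the PAM-containing strand, 5'->3'.
    edited_after_nick: desired sequence of that same strand starting right
        after the nick (after protospacer position 17), with the edit included.
    """
    nick = len(protospacer) - 3                  # nick 3 nt upstream of the PAM
    flap_end = protospacer[nick - pbs_len:nick]  # 3' end of the nicked strand
    pbs = revcomp(flap_end)                      # PBS base-pairs with that 3' end
    rtt = revcomp(edited_after_nick)             # reverse transcriptase copies this
    return (rtt + pbs).replace("T", "U")         # RTT lies 5' of the PBS

protospacer = "GACGTTACCGATTGCAAGTC"  # hypothetical
ext = peg_extension(protospacer, "GTCAGGT")
print(ext)  # ACCUGACUUGCAAUCGGUAA
```

Reading it back, the reverse complement of this extension (in DNA letters) is the 13 nt just before the nick followed by the edited sequence: the primer that the PBS holds, then the new DNA the reverse transcriptase writes onto it.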

The main advantage of prime editing over other gene editing methods is its ability to make precise modifications to DNA sequences without the need for double-strand DNA breaks. This can reduce the risk of unwanted off-target effects and increase the specificity and accuracy of the editing process.

While prime editing is a promising technology, it is still in the early stages of development and further research is needed to optimize its efficiency, accuracy, and safety.

Armed with the knowledge of these core concepts and methods surrounding CRISPR, I’m sure you are ready to think about designing your very own experiment!

If your curiosity is getting the better of you, stay around for the next article where I’ll design a Knock-in experiment using Benchling.

But until then, keep reading, keep learning!


