CRISPR-Cas9 in gene editing (Part 4)

Roohi Bansal
Biotechnology by TSB
13 min read · May 30, 2022

Welcome to the 4th part of the 12-part series on CRISPR-Cas system based diagnosis of COVID-19.

Although the general mechanism of the CRISPR-Cas system remains the same, CRISPR-Cas systems are classified into six major types: Type I, II, III, IV, V, and VI. This classification is based on differences in the processing of pre-crRNA, the interference step, and the Cas proteins required.

Type I CRISPR-Cas system: In the Type I CRISPR-Cas system, pre-crRNA cleavage is carried out by the Cas endoribonuclease Cas6 to form mature crRNA (Fig 1). The mature crRNA then interacts with a Cas protein complex made up of five subunits: Cse1, Cse2, Cas7, Cas5e, and Cas6e. This Cas proteins-crRNA complex is called Cascade, and it locates foreign genetic material. When Cascade binds to foreign DNA, its conformation changes, which recruits the Cas nuclease Cas3 during the interference step. Cas3 then degrades the target DNA.

Fig 1: Type I CRISPR-Cas system

Type II CRISPR-Cas9 system: In the Type I CRISPR-Cas system, the Cas endoribonuclease Cas6 carries out pre-crRNA cleavage. In the Type II CRISPR-Cas system, however, this process involves the expression of a trans-activating CRISPR RNA, or tracrRNA. The tracrRNA includes an “anti-repeat” region that hybridizes with the repeat sequences in the pre-crRNA transcript. The resulting duplex is then cleaved within the repeat sequences by RNase III in a Cas9-dependent reaction. The crRNA and tracrRNA then form a complex with the Cas9 protein to give a complete search complex (Fig 2). Cas9 is effectively a programmable protein: its program is the 20-nucleotide sequence in the crRNA, which is derived from a virus. This 20-nucleotide sequence directs Cas9 to recognize a piece of viral DNA that matches it. If the crRNA sequence matches the invading virus’s DNA sequence, Cas9 cuts the viral DNA by introducing a double-stranded break. As in Type I, the target nucleic acid in the Type II system is dsDNA. The Cas9 protein has two lobes: a recognition lobe and a nuclease lobe. The recognition lobe is essential for binding the crRNA and target DNA.

The nuclease lobe, on the other hand, cleaves the target DNA by generating double-stranded breaks (DSBs). It contains the HNH and RuvC nuclease domains, each of which cuts one strand of the target DNA: the HNH domain cleaves the DNA strand complementary to the crRNA guide, while the RuvC domain cleaves the non-complementary (coding) strand.

Fig 2: Type II CRISPR-Cas system

Type III CRISPR-Cas system: As in Type I systems, pre-crRNA cleavage in the Type III CRISPR-Cas system is carried out by the Cas endoribonuclease Cas6 to form mature crRNA. The crRNA generated from a Type III CRISPR locus carries eight nucleotides at its 5′ end, termed the 5′ handle. These eight nucleotides are derived from the CRISPR repeat sequence. Following the 5′ handle is the guide sequence, which is 30–45 nucleotides long and is derived from a CRISPR spacer sequence.
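The layout of a Type III crRNA described above (an 8-nucleotide 5′ handle followed by a 30–45 nucleotide guide) can be sketched in a few lines of Python. This is only an illustration: the function name `split_type3_crrna` and the toy sequence are made up for this example and do not come from any real organism.

```python
HANDLE_LENGTH = 8        # 5' handle, derived from the CRISPR repeat
GUIDE_RANGE = (30, 45)   # guide length in nucleotides, derived from the spacer


def split_type3_crrna(crrna):
    """Split a crRNA (written 5'->3') into its 5' handle and guide sequence."""
    handle = crrna[:HANDLE_LENGTH]
    guide = crrna[HANDLE_LENGTH:]
    low, high = GUIDE_RANGE
    if len(handle) < HANDLE_LENGTH or not low <= len(guide) <= high:
        raise ValueError("expected an 8-nt handle followed by a 30-45 nt guide")
    return handle, guide


# Toy crRNA: a made-up 8-nt handle plus a made-up 35-nt guide.
toy_crrna = "AAGGCCUU" + "GAUCCGUAAG" * 3 + "CUAGC"
handle, guide = split_type3_crrna(toy_crrna)
print(handle, len(guide))
```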

Like Type I systems, in Type III systems, the mature crRNA interacts with the Cas protein complex. The Type III system is further divided into two subtypes, Type III-A and Type III-B systems. The Cas proteins-crRNA complex of the Type III-A CRISPR-Cas system is called the Csm complex, containing a single crRNA and five proteins Csm2, Csm3, Csm4, Csm5, and Cas10 (also called Csm1). On the other hand, the Cas proteins-crRNA complex of Type III-B CRISPR-Cas system is called Cmr complex, containing a single crRNA and six proteins Cmr1, Cmr2, Cmr3, Cmr4, Cmr5, and Cmr6.

The mechanism by which the Type III system targets invading nucleic acids differs from that of the Type I and II systems. In Type III-A, targeting of invading DNA requires directional transcription across the target (protospacer) sequence to produce an RNA transcript that is complementary to the crRNA. During transcription, the target double-stranded DNA is denatured within the transcription bubble. Then, during the interference step, the crRNA binds to the target RNA transcript; this binding activates the Cas10 and Csm3 nucleases of the Csm complex, which cleave the denatured target DNA and the RNA transcript, respectively (Fig 3). Cas10 cleaves the non-template (coding) DNA strand, and Csm3 cuts the RNA within the protospacer region. The Csm complex also recruits the Csm6 protein, which degrades nonspecific transcripts in the vicinity.

Fig 3: Type III-A CRISPR-Cas system

Cas proteins-crRNA complex of Type III-B CRISPR-Cas system, also called Cmr, can cleave an ssRNA target that is complementary to the CRISPR RNA. The Cmr effector complex consists of six proteins Cmr1–6 and a crRNA. Cmr complex can cleave the target RNA at multiple sites within the region of base-pairing.

To summarize, the target nucleic acid is usually dsDNA in Type I, Type II, and Type III-A systems. In contrast, the Type III-B system targets complementary single-stranded RNA or ssDNA of the invading virus.

Type IV CRISPR-Cas system: The Type IV CRISPR-Cas system is a newly discovered system and is not well characterized. Type IV systems occur within plasmids and lack genes such as cas1 and cas2, which encode the proteins for adaptation, the first step of CRISPR-Cas-mediated defense. The Type IV CRISPR-Cas system has Cas proteins such as Cas7 (also called Csf2), Cas5 (also called Csf3), and a smaller version of Cas8 referred to as Csf1. It also encodes a DinG family helicase, Csf4, and a Type IV-specific Cas6-like protein, Csf5. A recent structural and biochemical analysis of a Type IV CRISPR-Cas system demonstrated the role of the Cas6-like protein both in the maturation of crRNAs and in the subsequent formation of a Cascade-like crRNA-guided effector complex composed of Csf1, Csf3, Csf5, and multiple copies of Csf2. As in other CRISPR-Cas systems, the effector complex surveys the cellular environment, searching for matching nucleic acid targets.

Type V CRISPR-Cas12 system: The Type V CRISPR-Cas system uses the Cas12 enzyme. Cas12 differs from Cas9 in several important ways. Cas12 makes a ‘staggered’ cut in double-stranded DNA, producing ends with a single-stranded overhang, instead of the ‘blunt’ cut made by the Cas9 enzyme. Also, Cas12 requires only a CRISPR RNA (crRNA) for successful targeting. By contrast, Cas9 requires both a crRNA and a trans-activating crRNA (tracrRNA) to target the invading virus’s DNA successfully.
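The difference between a blunt and a staggered cut can be made concrete with a small sketch. The function `cut_duplex` and the cut positions below are arbitrary choices made for illustration; they are not the actual cutting offsets of Cas9 or Cas12.

```python
def cut_duplex(top, top_cut, bottom_cut):
    """Cut a double-stranded DNA given by its top strand (written 5'->3').

    top_cut and bottom_cut are the cut positions on the top and bottom
    strand, both counted from the left end of the top strand. Equal
    positions give blunt ends; unequal positions leave a single-stranded
    overhang whose length is the distance between the two cuts.
    """
    pairing = {"A": "T", "T": "A", "G": "C", "C": "G"}
    # Bottom strand written 3'->5' so that it lines up under the top strand.
    bottom = "".join(pairing[base] for base in top)
    left = (top[:top_cut], bottom[:bottom_cut])
    right = (top[top_cut:], bottom[bottom_cut:])
    overhang = abs(top_cut - bottom_cut)
    return left, right, overhang


dna = "ATGCGCATTA"
print(cut_duplex(dna, 5, 5))  # blunt: both strands cut at the same position
print(cut_duplex(dna, 3, 7))  # staggered: cuts are offset by 4 nucleotides
```

In the staggered case the two fragments end in 4-nucleotide single-stranded tails, which is the kind of overhang the text attributes to Cas12.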

Type VI CRISPR-Cas13 system: Type VI CRISPR-Cas system utilizes Cas13 enzyme. Cas13 is an RNA-guided RNA endonuclease, which means that it does not cleave DNA but only single-stranded RNA. Cas13 is guided by its crRNA to the target ssRNA, and once bound, it cleaves the target RNA.
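The six types described above can be collected into a small lookup table. The sketch below only restates this article's descriptions; the field names (`effector`, `target`) and the helper `types_targeting` are illustrative choices, not standard nomenclature.

```python
# Summary of the CRISPR-Cas types as described in this article.
CRISPR_TYPES = {
    "I":     {"effector": "Cascade + Cas3", "target": "dsDNA"},
    "II":    {"effector": "Cas9", "target": "dsDNA"},
    "III-A": {"effector": "Csm complex", "target": "transcribed DNA and its RNA transcript"},
    "III-B": {"effector": "Cmr complex", "target": "ssRNA"},
    "IV":    {"effector": "Csf complex", "target": "not well characterized"},
    "V":     {"effector": "Cas12", "target": "dsDNA"},
    "VI":    {"effector": "Cas13", "target": "ssRNA"},
}


def types_targeting(target):
    """Return the names of the types whose listed target equals `target`."""
    return [name for name, info in CRISPR_TYPES.items() if info["target"] == target]


print(types_targeting("dsDNA"))  # ['I', 'II', 'V']
print(types_targeting("ssRNA"))  # ['III-B', 'VI']
```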

Protospacer Adjacent Motifs (PAMs)

We have already discussed that the CRISPR systems in bacteria evolved to defend bacteria against viruses. After protospacers are integrated into the CRISPR locus, the locus is transcribed to form precursor crRNA. After maturation, the crRNA forms a complex with Cas proteins to create a search complex. If the sequence of the crRNA matches the DNA sequence of the invading virus, the Cas protein cuts the viral DNA. This cleavage destroys the invading viral genome and therefore protects the bacterium from the viral infection. But the viral DNA targeted by the search complex has the same sequence as the spacer stored in the CRISPR array, because spacers are copied from protospacers, the excised segments of the invading viral DNA. So how exactly is the Cas protein able to distinguish between self and enemy?

This is where the PAM comes in. PAM stands for Protospacer Adjacent Motif. A PAM is a specific sequence of nucleotides, around 2–6 bp long, that follows the protospacer sequence in a viral genome. It is important to understand that the PAM is a component of the invading virus or plasmid but not of the bacterial CRISPR locus. In the interference step of CRISPR-mediated defense, specific Cas proteins recognize and bind the PAM sequence. PAM recognition is necessary to unwind the double-stranded DNA target, which triggers base pairing between the crRNA and the DNA target, followed by cleavage by the Cas proteins.

Conversely, Cas nucleases cannot bind to or cleave a target DNA sequence that is not followed by the PAM sequence. For instance, in Type II CRISPR-Cas systems, Cas9 recognizes the PAM sequence 5'-NGG-3', where N is any nucleotide and is followed by two guanine residues. This PAM sequence must be present for the Cas9 protein to know that it’s OK to latch onto and cut the viral DNA (Fig 4). The spacer sequences within the CRISPR array, on the other hand, are not followed by “GG.” This means that Cas9 cannot bind to the CRISPR array and thereby avoids cutting the bacterium’s own genome. The PAM sequence “NGG” is associated with the Cas9 nuclease of Streptococcus pyogenes; the Cas9 proteins of other bacteria recognize different PAM sequences. For instance, Staphylococcus aureus Cas9 recognizes the PAM sequence “NNGRRT,” where R is a purine (A or G), and Campylobacter jejuni Cas9 recognizes the PAM sequence “NNNVRYAC,” where V is A, G, or C and Y is a pyrimidine (T or C).
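The letters N, R, V, and Y in these PAM sequences are standard IUPAC codes that each stand for a set of nucleotides. As a minimal sketch, the three PAMs named above can be turned into regular expressions with Python's standard `re` module. The helper names `pam_to_regex` and `matches_pam` are made up for this example.

```python
import re

# IUPAC nucleotide codes that appear in the PAM sequences above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any nucleotide
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "V": "[ACG]",   # A, C, or G
}

PAMS = {
    "Streptococcus pyogenes Cas9": "NGG",
    "Staphylococcus aureus Cas9": "NNGRRT",
    "Campylobacter jejuni Cas9": "NNNVRYAC",
}


def pam_to_regex(pam):
    """Translate a PAM written with IUPAC codes into a compiled regex."""
    return re.compile("".join(IUPAC[code] for code in pam))


def matches_pam(pam, dna):
    """True if `dna` (read 5'->3') is exactly one instance of the PAM."""
    return pam_to_regex(pam).fullmatch(dna) is not None


print(matches_pam("NGG", "TGG"))          # True
print(matches_pam("NGG", "TGA"))          # False
print(matches_pam("NNGRRT", "ACGAGT"))    # True
print(matches_pam("NNNVRYAC", "ACGTGTAC"))  # False: T is not allowed at V
```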

Fig 4: Protospacer Adjacent Motif (PAM) for Cas9 nuclease of Streptococcus pyogenes

We have already discussed that the Type I CRISPR-Cas system uses the Cascade complex in the interference step and recruits the Cas3 protein to degrade viral DNA. In E. coli, the Cascade complex consists of a mature crRNA and five Cas protein subunits: Cse1, Cse2, Cas7, Cas5e, and Cas6e. During the interference step, the Cse1 subunit of the Cascade complex recognizes the PAM region of the viral DNA. It then positions the Cas3 protein adjacent to the PAM to ensure cleavage of the viral DNA.

Therefore, a conserved protospacer-adjacent motif or PAM present in the target viral DNA, but not the CRISPR locus, allows for distinction between foreign and host DNA.
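This self versus non-self check can be sketched for the S. pyogenes case, where the PAM is NGG. The sequences below are toy examples invented for illustration (including the repeat), and `is_targetable` is a hypothetical helper name. The DNA is written on the strand that carries the PAM, so the protospacer reads the same as the stored spacer.

```python
def is_targetable(dna, protospacer):
    """True if `protospacer` occurs in `dna` (5'->3') directly followed by NGG."""
    start = dna.find(protospacer)
    while start != -1:
        end = start + len(protospacer)
        pam = dna[end:end + 3]
        if len(pam) == 3 and pam[1:] == "GG":
            return True
        start = dna.find(protospacer, start + 1)
    return False


spacer = "GACCTTGATCGATTACGCTA"   # toy 20-nt sequence
toy_repeat = "GTTTTAGAGC"         # toy repeat; it does not start with NGG

viral_dna = "TTA" + spacer + "TGG" + "CAT"        # protospacer followed by a PAM
crispr_array = toy_repeat + spacer + toy_repeat   # spacer followed by a repeat

print(is_targetable(viral_dna, spacer))     # True: the virus is cut
print(is_targetable(crispr_array, spacer))  # False: the host array is spared
```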

On the other hand, no PAMs have been detected for Type III CRISPR-Cas systems. The crRNA formed in the Type III system contains a 5' handle. This handle originates from the CRISPR repeat sequence, while the 3' portion of the crRNA contains the target sequence. Discrimination between self and non-self is therefore achieved through the 5' handle, which results in “self-inactivation,” a process fundamentally different from the PAM recognition used by Type I and II systems.

PAMs also serve an additional role in the protospacer acquisition step. While studying the mechanism of the CRISPR-Cas system, we have already discussed that the Cas1-Cas2 complex is required to capture new protospacers from the invading viral DNA. During this acquisition process, Cas9 works with Cas1 and Cas2 to find a PAM sequence, and then Cas1 and Cas2 remove the protospacer next to it. Picking protospacers that sit next to PAMs guarantees that when the same virus infects again and Cas9 is armed with a matching crRNA, nothing will stop it from destroying the enemy DNA.

Why is the CRISPR-Cas9 system suitable for gene editing?

Over the past decade, the CRISPR-Cas9 system has become a very popular genome-editing method because it is fast, cheap, precise, and relatively easy to use. We have already discussed that CRISPR-Cas systems are categorized into six types (I, II, III, IV, V, and VI) based on differences in the processing of pre-crRNA, the interference step, and the Cas proteins required. Of these six types, the Type II CRISPR-Cas system, also referred to as the CRISPR-Cas9 system, has generated the most excitement in the scientific community for genome editing. There are three reasons for this choice:

1. Type I, Type III, and Type IV CRISPR-Cas systems require several Cas proteins to cleave the target viral genome. For instance, the Cascade complex of Type I systems consists of five Cas proteins, while the Csm and Cmr effector complexes of Type III systems consist of five and six Cas proteins, respectively. By contrast, in Type II CRISPR-Cas systems, a single protein, Cas9, is sufficient to target and cleave the viral genome.

2. The second reason that makes Cas9 a desirable candidate for gene editing is that the Type V Cas12 and Type VI Cas13 enzymes show trans, or collateral, cutting activity. This means that once Cas12 or Cas13 finds its target, its cleavage activity is not restricted to the target DNA or RNA; it can also cut any non-targeted single-stranded nucleic acid molecules in the vicinity. This collateral DNase and RNase activity is a disadvantage for gene editing. By contrast, the Cas9 protein does not show any collateral activity and cleaves only the target DNA, which makes it suitable for gene editing.

3. Additionally, the Cas9 protein, which is part of the search effector complex, also participates in processing pre-crRNAs into mature crRNAs. In the Type I and Type III systems, on the other hand, a separate protein, Cas6, which is not part of the search effector complex, is required to process pre-crRNAs into mature crRNAs.

Thus, due to the simplicity and the requirement of only Cas9 enzyme for the processing of crRNA and forming search effector complex, the Type II CRISPR-Cas9 system has been widely adopted as a gene-editing tool.

Components of CRISPR-Cas9 system

Now let’s discuss the components of the CRISPR-Cas9 system that make it an excellent gene-editing tool. We have already discussed that in Type II CRISPR systems, the generation of mature crRNA requires two RNAs: the pre-crRNA itself and the trans-activating CRISPR RNA, or tracrRNA. The tracrRNA includes an “anti-repeat” region that hybridizes with the repeat sequences in the pre-crRNA transcript. The resulting duplex is then cleaved within the repeat sequences by RNase III in a Cas9-dependent reaction. The crRNA and tracrRNA then form a complex with the Cas9 protein to create a complete search complex. The crRNA-tracrRNA duplex guides the Cas9 protein to target and cleave the invading viral DNA based on sequence complementarity. In other words, if the sequence of the crRNA matches the DNA sequence of the invading virus, Cas9 cuts the viral DNA.

Thus, the simplicity of the Type II CRISPR-Cas system, with only three required components (Cas9, crRNA, and tracrRNA), makes this system easy to use for genome editing. This potential was realized by the Doudna and Charpentier labs. The researchers found that the system could be simplified further relative to the natural one, in which two separate RNA molecules, crRNA and tracrRNA, provide the program for the Cas9 protein. Cas9 acts as a pair of ‘molecular scissors’ that can cut the two strands of DNA at a specific location in the genome so that bits of DNA can then be added or removed.

The simplified, engineered CRISPR-Cas9 system was developed by linking the two RNAs, crRNA and tracrRNA, into a single guide RNA, abbreviated as sgRNA or simply called guide RNA (Fig 5). The guide RNA consists of two regions: a program, or 20-letter nucleotide sequence, and a handle-shaped RNA scaffold. The 20-letter sequence in the guide RNA is complementary to a short sequence of the host’s target DNA that is to be edited. In other words, if the target DNA has the sequence ATGCGC when read in the 5' to 3' direction, then the complementary RNA guide sequence reads UACGCG in the 3' to 5' direction. Because the guide sequence is made of RNA, it has the base U (uracil) wherever DNA would have T (thymine).
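The base-pairing rule behind the guide sequence (A pairs with U, T with A, G with C, and C with G) is easy to check with a few lines of Python. The function name `guide_rna_for` is made up for this sketch, and the six-letter input is the short example from the text rather than a real 20-letter guide.

```python
def guide_rna_for(target_dna):
    """Return the RNA that base-pairs with a DNA strand, base by base.

    The input is read 5'->3'. The output is written in the same
    left-to-right order, so it reads 3'->5'.
    """
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in target_dna)


print(guide_rna_for("ATGCGC"))  # UACGCG, read 3'->5'
```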

The other region in the guide RNA is the handle shaped RNA scaffold, which interacts with Cas9 protein. The guide RNA-Cas9 complex then binds to the target DNA sequence in the host genome. After the guide RNA binds at the desired position in the genome, the nuclease domains of the Cas9 enzyme introduce double-stranded breaks in the desired DNA. The HNH nuclease domain cleaves the DNA strand that is complementary and paired with the 20-letter nucleotide sequence in single guide RNA. In contrast, the RuvC nuclease domain cleaves the non-complementary or coding strand, thus cleaving the target DNA by producing double-stranded breaks in it. Therefore, we can say that the 20-letter guide sequence in the guide RNA ensures that the Cas9 enzyme cuts at the right point in the genome.

Fig 5: CRISPR-Cas9 system for genome editing

Once the DNA is cut, researchers can use the cell’s own DNA repair machinery to add or delete pieces of genetic material and so make changes to the target DNA.

The beauty of the CRISPR-Cas9 gene-editing tool is the easy design of its guide RNA. The handle that binds the Cas9 protein remains the same in every guide RNA, but one can change its program, the 20-letter guide sequence, to direct Cas9 to a different target DNA for modification. Cas9 is therefore also referred to as a “programmable endonuclease,” because one can program it to cut a specific DNA by providing a predesigned 20-letter target sequence in the guide RNA.

Let’s understand it more simply. Suppose a genome contains three genes: a red gene, a green gene, and a yellow gene, and we want to target the yellow gene specifically. We program our guide RNA, which acts as a GPS molecule, by entering the yellow gene’s GPS coordinates. The CRISPR-Cas9 complex then moves along the DNA until it finds the right spot, i.e., the yellow gene. After finding the yellow gene, the complex binds there, and Cas9 cuts the DNA by introducing a double-stranded break.
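The three-gene analogy can be played out on a toy genome. Everything in the sketch below is invented for illustration: the gene sequences, the helper names `find_cut_site` and `gene_at`, and the layout of each gene. One detail goes beyond this article: the cut position is taken to be 3 bp upstream of the NGG PAM, which is the commonly reported behaviour of S. pyogenes Cas9. The targets are written as DNA on the PAM-containing strand.

```python
# Toy 20-letter target sequences, one per gene.
TARGETS = {
    "red":    "GCTAGCTAGGATCCGATTCA",
    "green":  "TTGACCATGACTGACTAGCA",
    "yellow": "CATGCATTAGCCGATACGTT",
}

# Each toy gene: start codon + target + an NGG PAM (AGG) + stop codon.
GENES = {name: "ATG" + target + "AGG" + "TAA" for name, target in TARGETS.items()}
GENOME = "".join(GENES.values())


def find_cut_site(genome, target):
    """Find `target` followed by an NGG PAM and return the cut position.

    The returned value is the index of the first base to the right of the
    cut, assumed here to lie 3 bp upstream of the PAM. Returns None if no
    PAM-flanked match exists.
    """
    n = len(target)
    pos = genome.find(target)
    while pos != -1:
        pam = genome[pos + n:pos + n + 3]
        if len(pam) == 3 and pam.endswith("GG"):
            return pos + n - 3
        pos = genome.find(target, pos + 1)
    return None


def gene_at(index, genes):
    """Return the name of the gene that contains genome position `index`."""
    offset = 0
    for name, seq in genes.items():
        if offset <= index < offset + len(seq):
            return name
        offset += len(seq)
    return None


cut = find_cut_site(GENOME, TARGETS["yellow"])
print(cut, gene_at(cut, GENES))  # 78 yellow
```

Swapping in `TARGETS["green"]` or `TARGETS["red"]` sends the same search to a different gene, which is the sense in which only the 20-letter program needs to change.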

For developing the CRISPR-Cas9 method of genome editing, Jennifer Doudna and Emmanuelle Charpentier were awarded the 2020 Nobel Prize in Chemistry.

The next part of the series covers the importance of the CRISPR-Cas12 and CRISPR-Cas13 systems in the diagnosis of infectious diseases.

If you liked this article and want to know more about SARS-CoV-2 and how it can be detected with the CRISPR-Cas system, follow the links below:

For book lovers

For video lovers

https://www.udemy.com/course/2019-novel-coronavirus-2019-ncov-covid-19-disease/?referralCode=CADEC0CD4782C0BAEA2D

Happy learning!
