CRISPR: An Immunity System Discovered in Streptococcus thermophilus

By Izabela Ninu

Insights of Nature
11 min read · Dec 14, 2023

Bacteria are not immune to viruses, and that is one of the main problems of the big agro-industrial companies. Many food industries rely on beneficial bacteria: you need them to ferment dairy products such as cheese and yogurt.

This is where, in 2007, scientists discovered a way to protect bacteria (typically Streptococcus thermophilus, a “good” bacterium found in milk products) from the viruses that infect them. Danisco was among the first companies to work with bacteria at industrial scale, and therefore to research new ways of protecting them.

The viruses that infect bacteria are called bacteriophages. If, for example, a strain that makes cheese gets infected with a bacteriophage, you need to retire the strain, because industrially you can’t sell it at all. As such, you need to develop cultures that are resistant to bacteriophages.

Take a bacterial population and expose it to a new virus: the virus comes in and kills the vast majority, but a small subset of the population survives. Why? The survivors are a new variant: they possess characteristics that let them withstand the bacteriophage.

Look at this image: a culture of S. thermophilus was infected with a virus. The virus killed almost all of the population, but a few small subsets (white dots) survived the attack because they were resistant to that virus.

This is where genetic sequencing of the variant comes in: when scientists compared the genome of the variant to that of the normal bacteria, the two were 99.9999% identical across the 2 million DNA letters. The only difference observed was right at the CRISPR array.

What is CRISPR?

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is the bacterial adaptive immune system. When a bacterium survives infection by a virus, it keeps a piece of the virus’s DNA and stores it.

A CRISPR array is made of alternating CRISPR repeats and spacers: the repeats are identical copies of the same short DNA sequence, whereas every spacer is unique.

The striking difference was that, at one end of the array in the surviving variant, there is one more repeat and one more spacer.

Most importantly, the spacer that gets added each time the bacteria survive a phage infection corresponds exactly to a DNA sequence of the phage that was used.

As such, if you synthetically add a spacer, the bacteria become resistant to the phage whose sequence you used. Conversely, if you remove the spacer, the bacteria lose their resistance.
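
To make the repeat-spacer idea concrete, here is a minimal Python sketch of a CRISPR array as a toy data structure. It is only an illustration: the `REPEAT` sequence, the class name `CrisprArray` and the phage sequence are made up for the example, and "resistance" is reduced to an exact text match between a stored spacer and the phage DNA.

```python
from dataclasses import dataclass, field

REPEAT = "GTTTTAGAGC"  # hypothetical repeat, shortened for illustration


@dataclass
class CrisprArray:
    """Toy CRISPR array: identical repeats separating unique spacers."""
    spacers: list[str] = field(default_factory=list)

    def add_spacer(self, phage_fragment: str) -> None:
        """Store a phage fragment as a new spacer at the front of the array."""
        self.spacers.insert(0, phage_fragment)

    def remove_spacer(self, phage_fragment: str) -> None:
        """Delete a stored spacer (resistance to that phage is lost)."""
        self.spacers.remove(phage_fragment)

    def is_resistant_to(self, phage_genome: str) -> bool:
        """Resistant if any stored spacer occurs exactly in the phage genome."""
        return any(spacer in phage_genome for spacer in self.spacers)

    def sequence(self) -> str:
        """Lay the array out as repeat-spacer-repeat-...-repeat."""
        return REPEAT + "".join(spacer + REPEAT for spacer in self.spacers)


phage = "ATGCCGTACGATTACGGCTAAGCTTGACCATG"  # made-up phage genome
array = CrisprArray()
print(array.is_resistant_to(phage))   # False: no spacer yet
array.add_spacer(phage[5:25])         # "survive" the infection, keep a piece
print(array.is_resistant_to(phage))   # True: the spacer matches the phage
```

Adding the spacer flips the answer to resistant and removing it flips it back, which is exactly the add/remove experiment described above.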

But the CRISPR array is not alone: just before it, we find CRISPR-associated genes, called cas genes. If we modify those cas genes, does it affect how CRISPR works?

There are four cas genes before the CRISPR array: cas1, cas2, csn2 and, most importantly, cas9. If you inactivate cas9, resistance is lost even if the spacer is still there. The spacer is therefore not enough: you also need the cas genes. If you knock out csn2, you don’t lose resistance, but you lose the ability to acquire new spacers. So the cas genes are needed not only for immunity itself, but also for immunization.

To better understand the next part of this article, feel free to check this video:

The triplet code and protein synthesis

How is the genetic information from the phage acquired and put into the spacers? Each spacer is copied from a protospacer, the matching sequence in the original phage DNA. This is where a complex of the Cas1 and Cas2 proteins comes in: it identifies a protospacer next to a PAM and inserts a copy of it into the CRISPR array of the bacterium. The PAM (Protospacer Adjacent Motif) is a short sequence, 5ʹ-NGG-3ʹ (any base followed by two guanines). The proteins move upstream of it (i.e., towards the 5ʹ end of the strand) and cut out a stretch of roughly 20 to 26 bases, which becomes the spacer. They then insert the newly acquired spacer at the 5ʹ end of the CRISPR array and build a new repeat next to it, also at the 5ʹ end.
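
The search for candidate protospacers can be sketched in a few lines of Python. This is a simplified illustration, not a model of what Cas1 and Cas2 really do: it scans a single strand only, the function name `find_protospacers` and the example sequence are made up, and the spacer length of 20 is an assumption taken from the lower end of the range above.

```python
import re

SPACER_LENGTH = 20  # assumption: lower end of the 20 to 26 base range


def find_protospacers(phage_dna: str, length: int = SPACER_LENGTH):
    """Return (start, protospacer, pam) for every NGG with `length` bases upstream."""
    hits = []
    # The lookahead lets overlapping PAMs (e.g. in "AGGG") all be reported.
    for match in re.finditer(r"(?=([ACGT]GG))", phage_dna):
        pam_start = match.start()
        if pam_start >= length:  # enough bases on the 5' side of the PAM
            protospacer = phage_dna[pam_start - length:pam_start]
            hits.append((pam_start - length, protospacer, match.group(1)))
    return hits


# Made-up phage fragment: 20 bases, then the PAM "AGG", then two more bases.
for start, protospacer, pam in find_protospacers("ATGCATGCATGCATGCATGCAGGTT"):
    print(start, protospacer, pam)  # 0 ATGCATGCATGCATGCATGC AGG
```

Each hit is a stretch of phage DNA that sits directly upstream of a PAM and could therefore be stored as a spacer.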

What this means is that the bacterial CRISPR array can acquire and store any section of bacteriophage DNA that it needs to fight the virus in a later attack. Some bacteria have been found to carry hundreds of spacers, while other species may have only a couple of them: the CRISPR array is very flexible, and it depends only on the individual bacterium and on what it has been exposed to.

But how are the spacers transcribed, and how are they used to neutralize incoming viruses? In other words, how does this work as the immune system of a bacterium?

From time to time, RNA polymerase transcribes the CRISPR array into an RNA molecule. We don’t call it messenger RNA (mRNA), because it is not going to a ribosome to be translated into a polypeptide; we call it pre-crRNA (precursor CRISPR RNA). It is a single RNA molecule containing all the repeats and spacers. That molecule is then processed into short crRNAs, each carrying one spacer and only part of a repeat rather than the whole repeat-spacer string.
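
As a rough illustration of this processing step, the Python sketch below transcribes a toy array and splits the long transcript at each repeat. The repeat and spacer sequences are hypothetical, and dropping the repeats entirely is a simplification made for the example: the sketch keeps only the spacer part of each crRNA.

```python
REPEAT = "GTTTTAGAGC"  # hypothetical repeat, shortened for illustration


def transcribe(dna: str) -> str:
    """Write the RNA copy of a DNA strand: same letters, U instead of T."""
    return dna.replace("T", "U")


def process_pre_crrna(pre_crrna: str, repeat_rna: str) -> list[str]:
    """Split the long transcript at each repeat and keep the spacer pieces."""
    return [piece for piece in pre_crrna.split(repeat_rna) if piece]


# Toy array with two made-up spacers: repeat-spacer-repeat-spacer-repeat.
array_dna = REPEAT + "ACCTACCTACCTACCA" + REPEAT + "CATCCATCCATCCATC" + REPEAT
pre_crrna = transcribe(array_dna)
print(process_pre_crrna(pre_crrna, transcribe(REPEAT)))
# ['ACCUACCUACCUACCA', 'CAUCCAUCCAUCCAUC']
```

One long transcript goes in, and one short guide per stored spacer comes out.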

Then comes another kind of RNA, the trans-activating CRISPR RNA (tracrRNA), which is transcribed from another gene elsewhere in the cell. Part of its sequence is complementary to the crRNA, so it can bind to it, as you can see below:

So now we have a structure in the cell made of two pieces of RNA, the crRNA and the tracrRNA, paired together. For gene editing, the two are usually fused into a single polymer of RNA nucleotides that we call sgRNA, single guide RNA.

This guide RNA is loaded into the enzyme that we call Cas9, which can cut DNA at a specific sequence. In a bacterium, it cuts the sequence that it recognizes in the bacteriophage. The complex carries a copy of the virus’s specific sequence, and when the virus infects the bacterium, Cas9 checks whether the sequences match and, if they do, cuts the viral DNA, destroying the virus and eliminating the threat of infection.

But… if the DNA stored in the CRISPR array is the same as the viral DNA that Cas9 cuts, why doesn’t Cas9 also cut the DNA of the bacterium, and not only that of the virus?
Why doesn’t it destroy itself?

The answer lies again in the PAM, the Protospacer Adjacent Motif. The PAM not only tells Cas1 and Cas2 where to operate but also indicates, by the same principle, where the Cas9:sgRNA complex should cut the DNA of the bacteriophage.

The key is that the spacers within the CRISPR array are not followed by two guanines (GG): instead, they are followed by the sequence GTT. As such, the “scissors” are unable to cut the bacterium’s own DNA.

Now, the bacteriophage and the cell both contain two guanines at many points in their DNA. The Cas9 enzyme does not cut at every one of them: once it finds a PAM, it unwinds the adjacent DNA and compares it to the spacer RNA that it carries. If the DNA matches, it belongs to a bacteriophage, and Cas9 cuts it.
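
The two checks described here (first a PAM, then a match with the stored spacer) can be sketched in Python. This is a toy under stated assumptions: the guide is written in DNA letters rather than RNA, only exact matches on one strand count, and the function name `cas9_cut_sites`, the repeat and all sequences are made up for the example.

```python
def cas9_cut_sites(dna: str, guide: str) -> list[int]:
    """Return start positions where `guide` matches and an NGG PAM follows."""
    sites = []
    n = len(guide)
    for start in range(len(dna) - n - 2):
        pam = dna[start + n:start + n + 3]
        if pam[1:] != "GG":
            continue  # no PAM here, so the DNA is never unwound and compared
        if dna[start:start + n] == guide:
            sites.append(start)
    return sites


spacer = "ATGCATGCATGCATGCATGC"                   # made-up 20-base spacer
phage = "TT" + spacer + "AGG" + "CA"              # protospacer followed by a PAM
own_array = "GTTTTAGAGC" + spacer + "GTTTTAGAGC"  # spacer followed by GTT

print(cas9_cut_sites(phage, spacer))      # [2]: the phage is cut
print(cas9_cut_sites(own_array, spacer))  # []: the bacterium's own array is spared
```

The same 20 bases are present in both sequences; only the two bases after them decide whether a cut happens.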

Building on this discovery, scientists are engineering a tool that can cut any DNA sequence of their choosing. How? With the famous CRISPR-Cas9 editing system.

To do so, they need to find a PAM sequence right next to the DNA that they want to cut, and then design an RNA that matches the sequence to be cut.

But what if there is no suitable PAM sequence near the target DNA? Fortunately, CRISPR systems from other organisms recognise different PAM patterns, and there are even other CRISPR proteins that can be used to cut genes.

Here are some examples of PAM sequences in different bacteria, and we can leverage the system of any of them to genetically modify whatever is needed.
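
Since PAMs are usually written with ambiguity letters such as N, a small Python sketch can show how one might search a sequence for several PAM patterns. Only NGG comes from this article; NNGRRT is included as an assumption (it is the motif commonly reported for the Cas9 of Staphylococcus aureus), and the helper names and the test sequence are made up.

```python
import re

# Ambiguity letters used by the PAMs below (N = any base, R = A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

# NGG is the PAM from this article; NNGRRT is an assumed second example.
PAMS = ["NGG", "NNGRRT"]


def pam_to_regex(pam: str) -> str:
    """Translate a PAM written with ambiguity letters into a regex."""
    return "".join(IUPAC[letter] for letter in pam)


def find_pam_sites(dna: str, pam: str) -> list[int]:
    """Return every position where the PAM pattern starts (overlaps allowed)."""
    pattern = re.compile(f"(?=({pam_to_regex(pam)}))")
    return [match.start() for match in pattern.finditer(dna)]


dna = "TTAGGCCGAGTAA"  # made-up target region
for pam in PAMS:
    print(pam, find_pam_sites(dna, pam))
# NGG [2]
# NNGRRT [5]
```

If the first pattern has no hit near the target, a system with a different pattern may still offer a usable site.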

How is CRISPR-Cas9 used in the genetic editing of organisms other than bacteria?

As you may have guessed, this discovery opened up many possibilities in biology: scientists are now able to use CRISPR-Cas9 to precisely edit the genome of a species.

There are two approaches to doing this, which rely on two different DNA repair pathways: we call them non-homologous end joining and homology-directed repair.

Non-Homologous End Joining (or NHEJ)

DNA can be repaired inside a cell: if it has been broken, the two broken ends can simply be joined back together, and this occurs naturally. All cells are able to repair breaks in DNA because breaks happen surprisingly often: in a typical cell, there are somewhere between 10 and 50 double-strand breaks in the DNA per day. That is a lot of breaks, and the cell needs to be able to fix them on its own. It does so by non-homologous end joining.

Let’s take a look at this illustration: the DNA double strand has been broken, either accidentally or by a Cas9 enzyme. Either way, the repair mechanism is the same. A protein that exists naturally in the cell, called Ku80/70, is very strongly attracted to the ends of a piece of DNA, so it sticks to those ends. This protein then attracts another protein, the DNA-dependent protein kinase catalytic subunit (DNA-PKcs). Together, bound to the DNA, they attract other proteins such as XRCC4 and Lig4, which bind to Ku80/70 and DNA-PKcs and pull the two pieces of DNA together. They encircle the DNA and bring the two ends together, where DNA ligase 4 (Lig4) joins them by creating a new sugar-phosphate bond between the sugar of one nucleotide and the phosphate of the next.

While this system sounds great, the fix is not flawless: often a few extra nucleotides are added, or some are lost, in the process. As such, mutations are often introduced, and sometimes they knock out genes. For example, take a codon that reads UGC. If a break is induced after the G and non-homologous end joining mistakenly adds an adenine, the sequence now reads UGA, which is a stop codon. You can see why this poses an important problem for scientists: such mutations can prevent the gene from being fully translated into a protein, in the case of a stop codon, or change an amino acid and therefore produce a different protein from the one that the DNA was supposed to encode.
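
The UGC to UGA example can be checked with a short Python sketch. The surrounding sequence (a start codon in front, two more codons behind) is made up for illustration, and the helper names are hypothetical; only the three standard stop codons are hard-coded.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # the three standard stop codons


def insert_base(rna: str, position: int, base: str) -> str:
    """Insert one base at `position`, as an error-prone repair might."""
    return rna[:position] + base + rna[position:]


def first_stop_codon(rna: str):
    """Return the index of the first in-frame stop codon, or None."""
    for i in range(0, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return i
    return None


original = "AUGUGCGGAUCC"                # AUG UGC GGA UCC (made-up message)
repaired = insert_base(original, 5, "A") # extra A right after the G of UGC

print(first_stop_codon(original))  # None: the message is read to the end
print(first_stop_codon(repaired))  # 3: the second codon is now UGA
```

A single extra letter is enough to make the ribosome stop after the very first codon of this toy message.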

However, happy accidents sometimes happen, and the mutations end up being beneficial and interesting, both for the scientists and for the organism being modified.

Homology-Directed Repair (HDR)

The second way to repair a break in DNA is homology-directed repair: compared to non-homologous end joining, this is a much less error-prone approach, so it would be ideal for cells to use it. But cells can’t always do homology-directed repair, because to do it they need a piece of DNA that is homologous to the one that has been broken.

Of course, in human cells that’s usually possible because we’ve got two of each chromosome: we always have a homologous pair of chromosomes, and as such we always have a backup copy of every stretch of our DNA. However, in bacteria, it’s often not the case because a bacterium only has one circular chromosome. Sometimes, though, the DNA has been replicated in preparation for binary fission, so at certain points of the prokaryotic life cycle the cell is able to perform homology-directed repair.

So, how does this system work? By Synthesis Dependent Strand Annealing.

It’s a relatively natural process:

The first step is to “trim” the broken DNA so that it can pair with its homologous DNA. We now have two homologous pieces of DNA, and the intact one will serve as the replication template.

Another name for this kind of synthesis-dependent strand annealing is “resection to a chi site”: first, resection happens. Resection is the act of cutting away DNA. An enzyme called RecBCD (composed of RecB, RecC and RecD) travels along the DNA from the broken ends and cuts away the 5’ ends. But it doesn’t cut indefinitely: it only cuts until it reaches a chi site.

What is a chi site? A chi site is commonly the sequence 5’-GCTGGTGG-3’, which signals the enzyme to pause. There, the enzyme RecA is attracted to the 3’ end, sticks to it, drags that end into the homologous DNA and runs along it to find the homologous section. Then a DNA polymerase does what polymerase enzymes do and extends the 3’ end until it reaches a chi site, which of course corresponds to the chi site at which the other end was cut back. The extended strand is then released and rejoins the original piece of DNA. So now we have one strand that has been repaired using the homologous strand as a template. Finally, DNA ligase ligates the joints: it makes sugar-phosphate bonds between them and gives us two complete pieces of DNA, one of them repaired using the information in the other.
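
As a very rough illustration of "cut until you reach a chi site", the Python sketch below treats resection as a plain text search. It ignores strand orientation and everything else RecBCD really does; the function names and the example sequence are made up, and only the chi sequence itself comes from the text above.

```python
from typing import Optional

CHI_SITE = "GCTGGTGG"  # chi sequence quoted above


def bases_resected(dna_from_break: str) -> Optional[int]:
    """Count the bases removed before the first chi site, or None if there is none."""
    index = dna_from_break.find(CHI_SITE)
    return None if index == -1 else index


def resect(dna_from_break: str) -> str:
    """Trim the strand from the broken end up to (not including) the chi site."""
    removed = bases_resected(dna_from_break)
    return dna_from_break if removed is None else dna_from_break[removed:]


strand = "ATTACGGCTGGTGGCATT"  # made-up strand, read starting at the break
print(bases_resected(strand))  # 6
print(resect(strand))          # GCTGGTGGCATT
```

In this toy, the six bases before the chi site are removed and trimming stops as soon as the chi site is reached.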

Izabela Ninu
TKS Innovator -🧬gene editing and 🌿plant genetics