Genome engineering 2.0

How a repurposed CRISPR-Integrase fusion protein allows for large genome insertions without DNA cleavage

When it comes to biology, nature is often the inspiration for brilliant engineering. Our cells are filled with biological nanomachines (proteins or protein complexes) performing complicated and nuanced tasks: adding amino acids to growing protein chains, unwinding and replicating DNA, transporting molecules through lipid membranes. Every time we humans have managed to understand, harness or repurpose these functions, our capabilities to investigate, manipulate and change our human biology have dramatically increased, often beyond what we initially thought possible.


Since the 2020 Nobel Prize at the latest, the CRISPR bacterial immune system and its applications for programmable genome targeting have been widely recognized. We’ve touched multiple times on the elegance and potential of the CRISPR-Cas system in recent years, for example its application to gene editing and gene drives, and even its natural counterpart, proteobacterial anti-CRISPR.

Given the breathtaking pace of development in the 9 years since its discovery, the engineering of the CRISPR toolbox has been a marvel to follow.

Major application areas of CRISPR-Cas-based technologies. [Mazhar Adli, Nature Communications, 2018]

Yet as with any shiny new technology, the limitations get a lot less attention in the media than the potential applications. And there were many: specific sequence requirements (PAM), off-target effects, double-strand breaks, DNA damage and immune toxicity, lack of editing precision, low or variable editing efficacy, limits on the length of sequence that can be inserted, the inability to edit non-dividing cells, or even just delivering the CRISPR tools to different cell types effectively. All of these are potential hindrances for gene therapy applications.

Scientists have been chipping away at these limitations ever since, by introducing techniques such as prime editing, which avoids introducing double-strand breaks, or HITI, which can be used to facilitate large genomic insertions. And while a steady drip of improvement is expected in science, sometimes the combination of diverse efforts can yield truly noticeable jumps in progress.

CRISPR toolbox 2.0

A few weeks ago, an exciting pre-print was released by the AbuGoot lab at the McGovern Institute for Brain Research at MIT, and its brilliant engineering seems to eliminate many current limitations to genome editing applications in one sweep.

Building on the prime-editing system, including the reverse transcriptase and CRISPR-nickase, they reasoned that they could first insert a small recombination site (delivered by an attB-containing ‘atgRNA’) at the prime-edited position, which can subsequently be recognized by a serine integrase. In a second step, the serine integrase then inserts the larger genetic cargo at this attachment site. People familiar with the Cre/loxP recombination system will have an intuitive understanding of how this works. In fact, the authors tried Cre/loxP first but were unsatisfied with the efficiency of the Cre recombinase, so they decided to go with the attB sites that bacterial serine recombinases use as their targeting element.
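To make the two steps concrete, here is a toy string model. It is only a sketch under loud assumptions: the sequences, the bracketed site labels ([attB], [attL], [attR]) and the function names are placeholders I made up for illustration, not real Bxb1 attachment sequences and not code from the pre-print.

```python
# Toy model of the two PASTE steps on plain strings.
# All sequences, labels and function names are hypothetical placeholders.

ATT_B = "[attB]"  # landing site written by the prime editor (step 1)
ATT_L = "[attL]"  # hybrid sites left behind after integration (step 2)
ATT_R = "[attR]"


def insert_landing_site(genome: str, target: str) -> str:
    """Step 1: place an attB landing site directly after the targeted sequence."""
    position = genome.find(target)
    if position == -1:
        raise ValueError("target sequence not found in genome")
    cut = position + len(target)
    return genome[:cut] + ATT_B + genome[cut:]


def integrate_cargo(genome: str, cargo: str) -> str:
    """Step 2: swap the landing site for attL + cargo + attR."""
    if ATT_B not in genome:
        raise ValueError("no landing site present; integration cannot happen")
    return genome.replace(ATT_B, ATT_L + cargo + ATT_R, 1)


genome = "AAACCCGGGTTT"
with_site = insert_landing_site(genome, target="CCC")
edited = integrate_cargo(with_site, cargo="<GFP>")
print(with_site)  # AAACCC[attB]GGGTTT
print(edited)     # AAACCC[attL]<GFP>[attR]GGGTTT
```

Note that the second function refuses to run if the first step never happened. In this toy model, as in the real system, cargo integration depends entirely on the landing site being written first.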

To test the complete system, we combined all components and delivered them in a single transfection: the prime editing vector, the atgRNA, a nicking guide for stimulating repair of the other strand, a mammalian expression vector for the corresponding integrase or recombinase and a 969 bp minicircle DNA cargo encoding green fluorescent protein (GFP) - Ioannidi E. et al., 2021, bioRxiv

A tour de force. They termed their approach to genome editing PASTE (Programmable Addition via Site-specific Targeting Elements).

If this sounds abstract to you, we can use a very rough analogy:

Imagine DNA as a power cable. If you want to introduce an ‘insert’ at a specific position, usually you would have to cut the cable and re-attach the respective ends, which is pretty messy, error-prone and difficult. Now prime editing allows you to basically put another cable on top of the power cable, while nicking the bottom, getting a small patch of extra ‘string’ to become part of the whole cable. With the PASTE system, you don’t just use a small patch of cable but instead attach what amounts to a plug-and-socket module (attB site) at the specific position. You still nick the bottom cable, making the plug-and-socket module an intrinsic part of the cable. The plug and socket can then be detached easily and an arbitrary-length extension cable (with its own plug and socket, named AttP) can be put in. This is what the serine integrase of PASTE does.

With that in mind, you can check out figure 1 below; top left is the large fusion protein consisting of SpCas9, M-MLV reverse transcriptase, and the Bxb1 integrase. This nanomachine performs all the catalytic work required. In the middle, the attB integration site is the ‘plug-socket’ addition; the gene to be inserted (top right) is the ‘extension cable’; integration leaves a ‘plug/socket’ pair both left and right of the gene (red triangles).

Figure 1. PASTE editing allows for programmable gene insertion independent of DNA repair pathways. a) Schematic of programmable gene insertion with PASTE. The PASTE system involves insertion of landing sites via Cas9-directed reverse transcriptases, followed by landing site recognition and integration of cargo via Cas9-directed integrases. b) Schematic of PASTE insertion at the ACTB locus, showing guide and target sequences. [Ioannidi et al., 2021, bioRxiv]

Take a moment to appreciate how remarkable this is: several proteins from different biological backgrounds are merged to insert a large genetic chunk at predefined genomic loci without cutting the DNA.

Next, they set out to optimize various parameters of their system (Figure 2). A big and often unintuitive part of optimizing gene editing with CRISPR is finding suitable gRNAs or even rational design rules. Their approach was to perform a screening experiment where they looked at the editing efficiency of more than ten thousand atgRNAs on a small set of target loci. They then used the results to train a classifier model (a k-mer based multilayer perceptron) to identify optimal combinations of attB, PBS and RT sequence length.

They also made their trained model available online to help the scientific community with future atgRNA design, which is great and has become something of an unofficial tradition in the CRISPR field.
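For readers wondering what ‘k-mer based’ means in practice: a sequence is chopped into all overlapping words of length k, and the word counts become the numeric input a classifier such as a multilayer perceptron can learn from. The snippet below is a minimal sketch of that encoding step only. The choice of k, the function name and the example sequence are my assumptions, and the authors’ actual features and model may well look different.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_features(sequence: str, k: int = 3) -> List[int]:
    """Count every overlapping k-mer and return the counts in a fixed order."""
    # Fixed vocabulary of all 4**k possible k-mers, so every sequence
    # maps to a vector of the same length.
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [counts[kmer] for kmer in vocabulary]


# Hypothetical sequence fragment, not taken from the screen.
features = kmer_features("ACGTACGT", k=2)
print(len(features))  # 16 possible 2-mers
print(sum(features))  # 7 overlapping 2-mers in an 8-nt sequence
```

A vector like this, one per atgRNA design, together with a label such as ‘efficient’ or ‘not efficient’, is the kind of input a classifier would be trained on.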

Figure 2. Evaluating design rules for efficient PASTE insertion at endogenous genomic loci. a) Schematic of pooled oligo library design for high-throughput screening of atgRNA designs at endogenous gene targets. b) Box plots depicting the editing rates of AttB addition at the different endogenous targets across 10,580 different atgRNA designs. c) Scatter plot depicting AttB site editing versus significance of the editing (–log(p-value)) as measured by a Student’s T-test against a no Cas9-RT control. d) Heatmaps depicting percent AttB site editing for LMNB1 guide 1 across different RT, PBS, and AttB lengths. e) Top atgRNA hits from the screen are compared for AttB site addition against manually designed atgRNAs (grey bars). f) PASTEv3 efficiency for insertion of an EGFP cargo at different endogenous targets is compared between screen validated atgRNAs and manually designed atgRNAs. g) Accuracy results by 5-fold cross validation of a MLP classifier trained on data from the 10,580 atgRNAs. h) PASTE integration rates of previously evaluated atgRNAs predicted by the MLP classifier to be efficient (pos. guides) or not efficient (neg. guides). i) PASTE integration rates of top atgRNAs predicted to be efficient (dark pink) or not efficient (light pink) by the MLP classifier. Solid line indicates median, dotted lines indicate 25th and 75th percentiles. j) PASTE integration rates and indel formation for integration of eight therapeutically relevant payloads at the ACTB locus.

They proceeded to show experimental data for the efficiency of the PASTE system. What really stood out, however, were their results on the specificity and purity of integration compared with other currently available insertion methods.

Off-target effects are a real danger to any genetic engineering workflow where one cannot perform cellular quality control or selection of molecular clones, which means virtually all somatic gene therapy approaches. Because it is based on prime-editing and nicking, rather than cutting the DNA, the PASTE system has a comparatively low chance of acting on off-target sites, especially when compared to other integration methods like HITI (Figure 3).

Figure 3. Characterization of genome-wide PASTE specificity and purity of integration compared to other integration approaches. e) Schematic of next-generation sequencing method to assay genome-wide off-target integration sites by PASTE and HITI. f) Alignment of reads at the on-target ACTB site using our unbiased genome-wide integration assay, showing expected on-target PASTE integration outcomes. g, h) Manhattan plot of averaged integration events for multiple single-cell clones with PASTE or HITI editing.

Now one issue with using integrase motifs for gene insertions is the limitation on multiplexing, meaning genetic editing of several different positions throughout the genome with different cargos. While the genomic site of integration is chosen by the gRNA complementarity, the requirement to introduce AttB-AttP sites for integrase-mediated integration limits how many inserts can be chosen. The authors decided to investigate whether tinkering with the nucleotides constituting the recombining AttB-AttP pairs would allow them to establish some orthogonal channels, as has previously been shown by others.

To their surprise, they found several dinucleotide combinations which preserved an editing efficiency comparable to that of wild-type AttB-AttP. As shown in Figure 4, they were able to integrate three fluorescent proteins simultaneously at three distinct genomic loci. This single promising experiment alone will probably make the mouths of some immunofluorescence microscopists and cell biologists reading this water. Fluorescent tagging of proteins through large genomic insertions is certainly something for all biologists to be excited about.

Figure 4. Multiplexed and orthogonal gene insertion with PASTE. c) Schematic of multiplexed integration of different cargo sets at specific genomic loci. Three fluorescent cargos (GFP, mCherry, and YFP) are inserted orthogonally at three different loci (ACTB, LMNB1, NOLC1) for in-frame gene tagging. d) Orthogonality of top 4 AttB/AttP dinucleotide pairs evaluated for GFP integration with PASTE at the ACTB locus. e) Efficiency of multiplexed PASTE insertion of combinations of fluorophores at ACTB, LMNB1, and NOLC1 loci.
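The idea of orthogonal channels can be sketched in a few lines. The rule coded here is an idealised assumption of mine: an AttB site and an AttP site only recombine if their variable dinucleotides are identical. Real pairs can show some crosstalk, which is what panel d evaluates. The loci and cargos are those named in the caption, but the dinucleotides assigned to them are made up for this example, and leaving out palindromic dinucleotides is a further simplifying assumption.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the sequence as read on the opposite DNA strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def can_recombine(attb_dinucleotide: str, attp_dinucleotide: str) -> bool:
    """Idealised rule (assumption): only identical dinucleotides pair up."""
    return attb_dinucleotide == attp_dinucleotide


all_dinucleotides = ["".join(p) for p in product("ACGT", repeat=2)]
# Palindromic dinucleotides read the same on both strands; they are left
# out here on the assumption that they could not encode insert orientation.
usable = [d for d in all_dinucleotides if d != reverse_complement(d)]
print(len(usable))  # 12 of the 16 dinucleotides remain

# Hypothetical channel assignment: locus -> (dinucleotide, cargo).
channels = {
    "ACTB": ("GT", "GFP"),
    "LMNB1": ("CT", "mCherry"),
    "NOLC1": ("GA", "YFP"),
}

matches = {}
for locus, (site_dn, _) in channels.items():
    # Which cargos carry an AttP dinucleotide matching this locus' AttB?
    matches[locus] = [
        cargo for dn, cargo in channels.values() if can_recombine(site_dn, dn)
    ]
    print(locus, "->", matches[locus])
```

Under this idealised rule each locus accepts exactly one cargo, which is the whole point of an orthogonal channel.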

Furthermore, the authors could also show that their PASTE system is compatible with current viral gene delivery methods like adeno-associated viruses (AAVs) and can also be applied in non-dividing cell types similar to prime-editing.

Lastly, the authors decided to probe the vastness of nature to find integrases potentially even better suited for genome editing by browsing metagenome databases. This has also been an important tactic in the CRISPR genome editing field: since so many different organisms exist which need to defend against viruses, the diversity of CRISPR systems, and their unique proteins and catalytic mechanisms, is enormous. Nature has iterated over similar functional principles so many times that for almost any specific application we seek, a protein with approximately the desired biological function exists.

Exploring over 10 TB worth of data from NCBI, JGI, and other sources, we found 27,399 novel integrases (Fig. 6b-c) and annotated their associated attachment sites using a novel repeat finding algorithm that could predict potential 50 bp attachment sites with high confidence near phage boundaries. Analysis of the integrases sequences revealed that they fell into four distinct clusters: INTa, INTb, INTc, and INTd with diverse domain architectures. - Ioannidi E. et al., 2021, bioRxiv

Figure 6. Discovery of novel phage-derived integrases for programmable gene integration with PASTE. a) Schematic of integrase discovery pipeline from bacterial and metagenomic sequences. b) Phylogenetic tree of discovered integrases showing distinct subfamilies. c) Domain architecture of the four integrase subfamilies. d) Screening novel integrase integration activity using reporters in HEK293FT cells compared to BxbINT and phiC31. e) PASTE integration activity with most active integrases compared to BxbINT. f) Characterization of integrase integration activity with truncated attachment sites using reporters in HEK293FT cells.

Indeed, they discovered an integrase from B. cereus that was more adept at using very short AttB recombination sites, once again improving the overall gene-editing performance of their PASTE system. I fully expect other genome engineers to follow their lead.

Whether this is the limit of the PASTE system and where to go from here are just some of the questions I posed directly to the authors. You can find the full Q&A below.

Concluding remarks

Programmable insertion of large genetic cargos is a foundational tool for genome editing and genetics. Most of the genetic breakthroughs in therapeutic applications of the last three decades were at least in part based on the Cre/loxP recombinase system in mice. The CRISPR-PASTE system will make the formerly difficult procedure of creating large genome insertions easy, fast and ubiquitous.

Similar to how the first generation of CRISPR tools expanded the possibilities for basic biology research, this second generation of highly engineered and optimized biological nanomachines will expand the possibilities of cell and gene therapy for patients.

More importantly, this work has shown yet again that by combining available protein functions, many of which are potentially still hidden somewhere in the metagenomic vastness of nature, we can create elaborate nanomachines performing biological tasks at a scale and depth hitherto thought impossible.

I don’t know what the future will bring, but the age of widespread genome editing has certainly arrived. Are we ready?


I managed to get a hold of Eleonora Ioannidi, one of the first authors on the pre-print, by reaching out to some shared colleagues. She was very generous with her time and helped me put the results of the research into context.

You can find our interview below (lightly edited for clarity):

A) How much further can the PASTE system be optimized in your opinion?

Eleonora: The AbuGoot lab has managed to achieve an optimized and efficient version of PASTE by engineering each part of the system. However, I believe that there is still room for further optimization, as other labs have shown. For example, using two pegRNAs may increase the efficiency, and more genetic modifications, such as large deletions, may be achieved (Prime-del, Nature 2021). In addition, someone could also use directed evolution to further improve the recombinases used in PASTE. I believe there are still other ways to improve the system in the future.

B) Some of the results in the pre-print, like 25%-50% integration, seem very high to me. Is this a realistic efficacy when used for example in multiplex approaches, pooled screens, and ‘difficult’ to transfect/transduce cell lines?

Eleonora: Each cell line shows different transduction and transfection efficiencies, as is well known. When PASTE was used, we had to adapt to each cell line’s needs. We used the appropriate protocol, for example, electroporation or lipofection, and we defined the appropriate parameters (concentrations, constructs’ ratios, etc.) to achieve satisfying efficiencies.

C) What are the major limitations of this technology?

Eleonora: Based on the literature, it has been shown that prime editing is not efficient at every genomic site. Many parameters need to be determined, such as the PBS and RT length, but also the type of modification. All these factors can affect editing efficiency. In PASTE, we rely on inserting the AttB site first in order to achieve the second step of integration. So, if prime editing is not efficient at a specific genomic site, then the recombination step won’t be feasible.

D) Since you browsed metagenome data for better Integrases, what is your feeling about the ‘natural nanotechnology’ that might still be out there for us to discover?

Eleonora: I believe that there are many proteins yet to be discovered, and bioinformatics already plays a big role in revealing them. There will be more and more proteins [and] complexes in the future that will contribute to genome engineering, and I believe that many labs are already working on this.

E) How would you personally rate the PASTE system in comparison to other genome editing approaches?

Eleonora: In the paper, we compared PASTE with known genome editing approaches such as HITI, and PASTE showed more promising results in most cases. Depending on the modification you want to achieve, though, you should choose the genomic tool accordingly. For example, PASTE is suitable for large insertions, but it is not necessary to use it for substitutions. I believe there is no universal solution yet for all genetic modifications since every method has its own limitations and advantages. In the future, genome editing will resemble a toolbox, where for every problem, there will be a suitable genetic tool to be used.

F) How did you feel while working on this project?

Eleonora: I found PASTE super-interesting from the beginning, and I knew that it can offer many new applications and possibilities, not only in the CRISPR field but in general. I was very busy working in the lab but I had in mind that it is an important and useful tool. Omar Ab. and Jonathan G. had the vision of PASTE from the beginning, and I was lucky to be part of it. At the moment, I am happy and proud that PASTE is working, and it keeps developing in the AbuGoot lab. I am looking forward to seeing PASTE being used by other scientists.

G) What are you personally excited about for this technology?

Eleonora: In general, I am enthusiastic about the application of PASTE in gene therapy since conducting large insertions can contribute to many treatments of diseases. However, I am also excited about its potential not only in genome engineering but in other biological fields too. For example, gene tagging can be useful for scientists to further study cellular mechanisms. Multiplexing opens other horizons too since you can use PASTE in a single cell and integrate specific cargos at two totally different loci with high accuracy and efficiency. I think that the applications are numerous, and it depends on the needs of the researchers in their fields.

H) What is your opinion on somatic gene editing in humans and/or the direction we are heading with somatic or even germline gene editing?

Eleonora: Somatic gene editing in humans for disease treatment is one of the main goals of genome editing research. However, it needs to be regulated in order to be safely and correctly used in the future.

I believe that gene therapy is necessary to overcome specific genetic diseases, which cannot be confronted in other ways.

However, not everyone will consider it to be desirable since this technology is new and it will take some time to be approved.

Regarding germline gene editing, it seems to be far in the future, even though the scandal with the CRISPR babies shows us that it’s not an unrealistic scenario. Genome editing is a powerful tool, and it should be used only by experts, in accordance with legal regulations and with a high degree of responsibility.

So, also in the future, it must be used only for medical purposes.

Thank you very much for your time!

This story is part of Advances in Biological Sciences (AdBioS), a science communication platform that aims to explain ground-breaking science in the field of biology, medicine, biotechnology, neuroscience and genetics to literally everyone. Scientific understanding has too many barriers; let’s break them down!




Philipp Markolin
