How to: add a light-sensitive gene into a mouse genome using Benchling

Ada Choudhry
11 min read · Apr 13, 2023


Optogenetics has far-reaching implications for neuroscience, but any experiment starts with inserting a light-sensitive gene. In this article, we’ll learn to do just that, at home :)

If you would like to refresh some concepts or learn the fundamentals of CRISPR, you can read through this article.
We’re going to conduct a knock-in experiment to insert a light-sensitive gene (Channelrhodopsin-2) from the alga Volvox into a mouse genome at the GtROSA26 gene. This article explains the reasoning behind each step of the experiment. If you would just like a video walkthrough of the steps, click here.

As this experiment is done in silico, we will use cloud-based tools to build the sequences of the sgRNA and the donor template.

This project is divided into two parts:

  1. Making a sgRNA
  2. Making the donor template

Making the sgRNA

sgRNA, or single-guide RNA, is a commonly used tool in genome editing and has many benefits. Here are some of the key benefits of using sgRNA:

  1. Specificity: sgRNA is designed to target a specific DNA sequence, which makes it highly specific for genome editing.
  2. Efficiency: sgRNA is a small molecule that can be easily delivered to cells and has a high efficiency rate in targeting specific DNA sequences.
  3. Customizability: sgRNA can be easily designed and synthesized to target different genes or regions of the genome, making it a versatile tool for genome editing.
  4. Multiplexing: Multiple sgRNAs can be used simultaneously to target different genes or regions of the genome, allowing for more efficient and targeted genome editing.
  5. Low cost: The production of sgRNA is relatively low-cost, making it accessible to researchers with limited resources.

For these reasons, I chose sgRNA over creating crRNA and tracrRNA separately.

Step 1: Decide where you want to insert the gene

You have to be mindful of where you carry out the knock-in experiment, as there are a number of risks involved with it.

  1. Off-target effects: While the goal of a knock-in experiment is to insert the new gene or DNA sequence at a specific location in the genome, there is always a risk of off-target effects. This can occur when Cas9 cuts at a similar sequence elsewhere in the genome, or when the DNA sequence being inserted accidentally integrates into a different location, potentially causing unintended changes in gene expression or cellular function.
  2. Gene disruption: In some cases, a knock-in experiment may inadvertently disrupt the function of an endogenous gene located near the insertion site. This can lead to unintended consequences and may affect the overall health and viability of the organism.
  3. Unintended consequences: Even when a knock-in experiment is successful in inserting the desired gene or DNA sequence at the intended location, there is always a risk of unintended consequences. This can include changes in gene expression, cellular function, or overall health and viability of the organism.

For these reasons, gene insertions are usually done at safe harbor sites.

Safe Harbor sites are genomic locations that are considered safe for targeted gene insertion as they are not associated with any known harmful effects on the organism. These sites are typically characterized by their stability, low frequency of genetic recombination, and minimal impact on endogenous gene expression and cellular function.

ROSA26 is a safe harbor site located on chromosome 6 in mice and is commonly used as a site for targeted gene insertion. The ROSA26 locus is highly conserved across different species, including humans, making it a useful tool for studying gene function and regulation. The locus is also expressed in most tissues, so a gene inserted there can be expressed widely throughout the organism.

Source: Icahn School of Medicine at Mount Sinai

In addition, the ROSA26 locus has been shown to exhibit strong and consistent expression over time, making it a reliable site for long-term studies of gene function and regulation.

ROSA26 is a non-coding gene: it does not encode a protein, although it is transcribed into non-coding RNA. It contains regulatory sequences that support stable and consistent expression of transgenes inserted at this site.

Regulatory sequences are specific DNA sequences within a gene or genomic locus that control the expression of that gene or locus. These sequences can be found in different regions of a gene or locus, such as the promoter region, enhancer region, or silencer region, and they interact with specific proteins, called transcription factors, to regulate the transcription of the gene.

The ROSA26 locus has its own promoter, which is active in many different cell types. Transgene constructs targeted to this locus often carry an additional strong promoter, such as the synthetic CAG promoter, to drive high levels of gene expression, together with a polyadenylation signal that ensures proper mRNA processing and stability. mRNA processing is a complex process that occurs after transcription and involves a series of modifications to the newly synthesized mRNA molecule. The processing of mRNA is critical for the production of functional proteins, as it enables the mRNA to be properly exported from the nucleus, translated by ribosomes, and ultimately degraded when no longer needed.

These regulatory sequences are critical for ensuring the correct expression of transgenes that are inserted at the ROSA26 locus.

In humans, the AAVS1 (Adeno-associated virus integration site 1) locus is another commonly used safe harbor site, located on chromosome 19.

Step 2: Find the sequence of the gene locus

We will be using NCBI and Ensembl to find the correct sequence.

  1. Go to NCBI and select Nucleotide from the drop-down menu. Enter the name of the locus, ‘GtROSA26’, and the scientific name of the house mouse, ‘Mus musculus’. Select the top-most result.

This is the window that opens once you select the locus. It gives the details of the gene and classifies it as ncRNA, meaning it is transcribed into non-coding RNA.

This window also shows the location of the gene and its neighbors.

2. Click on GenBank for the gene. GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.

3. Customize your settings: Select "Selected region" from the right box to only view the region related to the gene.

GenBank Window

4. After reading this research paper, I learned that transgenes are usually inserted at the XbaI site within the first intron. The mouse strain used in the paper matched the strain of my reference genome, ‘C57BL/6’. So, we need to find the sequence of the first intron.

5. In the genome database Ensembl, you can find the transcripts of this gene. A single gene can produce multiple transcripts (RNA molecules), and these can be very different from one another.

On clicking exons in the left tab, you can find the intron sequence as well.

Step 3: Designing sgRNA on Benchling

1. Now that we know the intron coordinates (from 113,052,573 to 113,048,397), I updated my view on NCBI and copied the raw bases. Now we can design our sgRNA on Benchling. Once you’re logged in, open a project folder and select + → CRISPR → CRISPR Guides. Since the platform doesn’t have the sequence of GtROSA26, I uploaded the raw bases copied from NCBI.

Pay attention to which genome you’re selecting and match it with the name on the NCBI website. Here I have chosen GRCm39. The PAM for this experiment is NGG, as we’re using Cas9.
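Before pasting the bases into Benchling, it helps to sanity-check the region you copied. Here is a minimal sketch that turns the intron coordinates quoted above into an expected length and flips a forward-strand slice onto the opposite strand. The helper names are hypothetical, and the idea that coordinates listed from high to low mean the gene is annotated on the minus strand is my assumption.

```python
# Hypothetical helpers for checking a copied region.
# The coordinates come from the article; the toy sequence is made up.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def region_length(start: int, end: int) -> int:
    """Length of an inclusive genomic interval, given in either orientation."""
    return abs(start - end) + 1


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


intron_length = region_length(113_052_573, 113_048_397)
print(intron_length)                 # 4177
print(reverse_complement("ATGCCN"))  # NGGCAT
```

If the number of bases you paste into Benchling differs from the expected length, the selected region on NCBI is probably not the one you intended.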

2. Once you upload the sequence to Benchling, the linear map labels it as the sequence of the Intron 1 BbaI. On expanding the linear map, you will be able to find the XbaI site.
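If you want to double-check the position of the XbaI site outside Benchling, a plain string search is enough, because XbaI recognizes the six-base sequence TCTAGA. This is only a sketch: the function name and the toy sequence are hypothetical, and positions are 0-based.

```python
XBAI_SITE = "TCTAGA"  # XbaI recognition sequence (it reads the same on both strands)


def find_sites(seq: str, site: str = XBAI_SITE) -> list:
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


toy_intron = "ggatccTCTAGAaacctctagagg"  # made-up sequence, not the real intron
print(find_sites(toy_intron))  # [6, 16]
```

Because the site is palindromic, searching one strand is sufficient.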

3. A tab DESIGN CRISPR should be displayed on the right beside the gene. If not, you can select CRISPR from the rightmost strip and choose "Design and Analyze Guides." As an intron doesn’t code for proteins, there aren’t restrictions for deleting a sequence.

To make sure there are no critical functional domains upstream of the intron, you can copy and upload the sequence to MOTIF.

I selected a region upstream of the XbaI site and then pressed the + button to generate possible sgRNA sequences.

This shows many possible sgRNA sequences from the selected section.

4. But the question is: how do you select an efficient sgRNA?

The answer comes down to two numbers: the on-target score and the off-target score.

Off-target score: A measure of specificity. It tells you how unlikely the guide is to bind at unintended sites in the genome. Higher scores are better.
On-target score: Predicted cleavage efficiency at the intended site. Higher scores are better.

The sgRNA highlighted in orange is the one I have chosen, as it has the highest on-target and off-target scores. Sometimes, one gRNA works better than another. Given this, it is recommended to design and test more than two protospacer sequences to increase the success rate of the edit.
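If you export the candidate guides, the same selection logic can be sketched in a few lines. This is only one possible ranking rule, not what Benchling does internally: I rank by the weaker of the two scores so that a guide has to be good on both. The class, function, thresholds and score values are all hypothetical placeholders.

```python
from typing import NamedTuple


class Guide(NamedTuple):
    name: str
    on_target: float   # higher = better predicted cleavage at the intended site
    off_target: float  # higher = fewer predicted unintended sites


def rank_guides(guides, min_on=50.0, min_off=50.0):
    """Drop guides below either threshold, then sort by their weaker score."""
    kept = [g for g in guides if g.on_target >= min_on and g.off_target >= min_off]
    return sorted(kept, key=lambda g: min(g.on_target, g.off_target), reverse=True)


# Placeholder scores for illustration only, not real Benchling output.
candidates = [
    Guide("guide_A", on_target=60, off_target=90),
    Guide("guide_B", on_target=70, off_target=40),
    Guide("guide_C", on_target=65, off_target=80),
]
for guide in rank_guides(candidates):
    print(guide.name)  # guide_C, then guide_A (guide_B is filtered out)
```

Keeping the top few guides rather than only the first one fits the advice above about testing several protospacers.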

Since Cas9 will cut both DNA strands after binding to the target sequence, your protospacer sequence can be on either the plus or minus strand.
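To see what a guide-design tool is doing at its simplest, here is a rough sketch that scans both strands for a 20-base protospacer followed by an NGG PAM. It does no scoring at all, the function names are hypothetical, and positions on the minus strand are reported relative to the reverse-complemented sequence.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guides(seq: str, guide_len: int = 20):
    """List (strand, start, protospacer, pam) for every NGG PAM with room for a guide.

    `start` is the 0-based position of the protospacer on the scanned strand.
    """
    # The lookahead lets overlapping candidates all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    plus = seq.upper()
    guides = []
    for strand, scanned in (("+", plus), ("-", reverse_complement(plus))):
        for match in pattern.finditer(scanned):
            guides.append((strand, match.start(), match.group(1), match.group(2)))
    return guides


toy = "ACGTACGTACGTACGTACGTTGG"  # made-up 20-mer followed by a TGG PAM
print(find_guides(toy))
# [('+', 0, 'ACGTACGTACGTACGTACGT', 'TGG')]
```

The protospacer is what you would order as the guide sequence; the PAM itself is not part of the sgRNA.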

If you would like to build vectors for your sgRNA, you can learn how to do that through this article.

Running a BLAST search

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between protein or nucleotide sequences. The program compares nucleotide or protein sequences to sequences in a database and calculates the statistical significance of the matches.

It is a bioinformatics tool commonly used for sequence analysis in molecular biology. One of the ways BLAST can be used in designing sgRNA is by identifying potential off-target sites for the sgRNA.

When designing a sgRNA for genome editing, it is important to ensure that the sgRNA only targets the intended genomic region and not other regions of the genome. BLAST can be used to search for homologous sequences in the genome that match the sgRNA sequence, and thus could potentially be targeted by the sgRNA. By comparing the sgRNA sequence to the entire genome or a database of sequences, BLAST can identify potential off-target sites for the sgRNA.

Once the potential off-target sites have been identified, they can be further analyzed and filtered based on various criteria such as their location in the genome, the number of mismatches between the sgRNA and the off-target site, and the potential impact of the off-target site on gene function. This information can then be used to refine the sgRNA design and improve its specificity.
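The mismatch criterion can be illustrated with a small sketch that slides the guide along a sequence and counts mismatches in each window. This is a toy version of the idea, far too slow for a whole genome and not how BLAST works internally. The function names, the threshold and the sequences are hypothetical.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def near_matches(guide: str, target: str, max_mismatches: int = 3):
    """Return (position, mismatches) for every window within the mismatch limit."""
    guide, target = guide.upper(), target.upper()
    hits = []
    for i in range(len(target) - len(guide) + 1):
        mismatches = count_mismatches(guide, target[i:i + len(guide)])
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Made-up example: one perfect site and one site with a single mismatch.
print(near_matches("GATTACA", "CCGATTACACCGATAACACC", max_mismatches=1))
# [(2, 0), (11, 1)]
```

A hit with zero mismatches away from the intended site is the most worrying kind; hits with more mismatches are less likely to be cut.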

You can access BLAST through the front page of NCBI.

After selecting nucleotide BLAST, this window should open.

The query sequence(s) to be used for a BLAST search should be pasted in the ‘Search’ text area. BLAST accepts a number of different types of input and automatically determines the format of the input. Accepted formats include FASTA and bare sequences (without the single-line description found in FASTA).

You can simply copy the sgRNA sequence from Benchling and paste it in.

Adding the job title as the sgRNA number is helpful to remember which sgRNA you did the search for.
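If you are testing several guides, it can be easier to paste them as FASTA records so each result carries its own name. A FASTA record is just a header line starting with ‘>’ followed by the sequence. Here is a minimal sketch; the function name and the example guide are hypothetical.

```python
def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record: a '>' header, then wrapped bases."""
    seq = seq.upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Made-up 20-base guide, named after its sgRNA number.
print(to_fasta("sgRNA_1", "acgtacgtacgtacgtacgt"))
```

The header plays the same role as the job title: it reminds you which sgRNA the search was for.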

In the database, choose Genomic + transcript database and then mouse in the organism drop-down menu.

In the program selection section, I have chosen highly similar sequences (megablast), which looks for near-identical matches to the sgRNA sequence.

In the results, a lower E value indicates a more significant match, so we look for the hit with the lowest E value.
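To get a feel for why the E value behaves the way it does, here is a sketch of the standard approximation E = m × n × 2^(−S′), where m is the query length, n is the total database length and S′ is the bit score of the hit. This is a simplification (BLAST applies length corrections before computing it), and the database size and bit scores below are placeholders, not values from my search.

```python
def expect_value(query_len: float, db_len: float, bit_score: float) -> float:
    """Approximate E value: expected number of chance hits with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Placeholder numbers: a 20-base guide searched against a database of 1e9 bases.
for bit_score in (30, 40, 50):
    print(bit_score, expect_value(20, 1e9, bit_score))
```

Each extra bit of score halves the E value. A 20-base query can only reach a modest bit score even with a perfect match, which is one reason short sgRNA queries give less clear-cut E values than long sequences do.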

The hit with the lowest E value is a predicted transcript that is different from the one we selected.

However, if I choose a sgRNA from the original sequence of GtROSA26 imported from NCBI (in our approach, we used an intron sequence), it shows an accurate result in both the transcripts and genomic results.

The E value is lowest for the correct genome and transcript, and it is considerably lower than that of the second match.

Results on alignment with the sgRNA from the genomic sequence

This shows that there are various approaches we can take by targeting various sections of the locus. The sgRNA designed for the intron has a good chance of matching other transcripts, but it does identify the correct chromosome, chromosome 6. There is a lot of trial and error involved, and I will look further into why the experiment did not give good results for the XbaI site.

If this sgRNA is not efficient, we can always choose the sgRNA from the entire sequence of GtROSA26 to minimize off-target effects.

Making the donor template

Here comes the second part of our experiment. Now that Cas9, guided by the sgRNA, has cut both strands, we need to provide a donor template to initiate homologous recombination, which inserts the foreign gene. If you would like to understand the mechanism of recombination in depth, you can read this article.

I used the Horizon website to design my knock-in template. I’ve chosen to go with an ssDNA oligo because we are inserting a small section of DNA.

Enter GtROSA26 as the gene target for the transcript.

On clicking ‘Display target region’, the following sequence is displayed. Slide the orange bar, then click the lock to open up the bar and select the sequence you need to replace.

Now, we’ll insert the coding sequence of the light-sensitive Channelrhodopsin-2 gene. To find the NCBI sequence of the gene, click here.

CDS of ChR2

Select a portion of the CDS to insert at the XbaI site in the GtROSA26 locus.

On clicking ‘Generate DNA donor’, it produces the following sequence.

As the whole sequence is in green, we’re good to go. If you want a physical copy of the oligo, you can add the sequence to cart and they can ship it to your lab!
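Conceptually, the donor is the insert flanked by two homology arms copied from either side of the cut. Here is a minimal sketch of that assembly. It is not the Horizon tool's algorithm, the function name is hypothetical, and the arm length of 40 bases is a placeholder I picked for illustration.

```python
def build_donor(target: str, cut_index: int, insert: str, arm_len: int = 40) -> str:
    """Return left homology arm + insert + right homology arm.

    `cut_index` is the 0-based index of the first base to the right of the cut.
    """
    if not arm_len <= cut_index <= len(target) - arm_len:
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = target[cut_index - arm_len:cut_index]
    right_arm = target[cut_index:cut_index + arm_len]
    return left_arm + insert + right_arm


# Toy example with 4-base arms; the insert is lowercase so it stands out.
print(build_donor("AAAACCCCGGGGTTTT", cut_index=8, insert="nnn", arm_len=4))
# CCCCnnnGGGG
```

The arms are what let the cell's repair machinery line the donor up with the cut site, so they must match the target sequence exactly.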

And that’s it for our experiment! I hope you had fun and learned new things from the procedure.

Please let me know in the comments about how we can make the procedure more efficient.

Until then, keep reading, keep learning!
