Proteins Network Too!

K
7 min read · Aug 3, 2021


I’m a little obsessed with genetics.

Only because I’m gene-uinely amazed by the possibilities!

Please accept my apology for that pun.

The truth is, genomics is an extremely broad field. There are so many applications, such as gene therapy and gene editing, where you can literally alter your genetic code or its expression.

But here’s the thing…

Before we go at our genetic code with a chainsaw and sandpaper, we have to know where to cut. We have to figure out what each gene does and which genes need to be targeted.

Protein Networks

Genes are basically protein ‘dispensers.’

Genes (sequences of DNA) are transcribed into RNA, which then goes through translation to produce a chain of amino acids. That chain folds up to form a protein!

If you’ve read my other articles you can probably tell that I love this graphic.

So if a protein is misbehaving, we could solve the problem by cutting it off at the source: the gene!

However, protein functions are usually not that simple. Each protein is involved in many interactions with other proteins. So to better understand protein function, and therefore our genome, we have to map out these protein networks.
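If it helps to picture a protein network as data, here is a minimal sketch in Python. The protein names (P1 to P4) and the interaction list are made up for illustration; the point is only that a network is a set of proteins plus the pairs that interact.

```python
from collections import defaultdict

def build_network(interactions):
    """Turn a list of interacting pairs into a lookup of partners per protein."""
    network = defaultdict(set)
    for a, b in interactions:
        # An interaction goes both ways, so record it for both proteins.
        network[a].add(b)
        network[b].add(a)
    return dict(network)

# Hypothetical toy interactions, not real data.
ppis = [("P1", "P2"), ("P1", "P3"), ("P3", "P4")]
network = build_network(ppis)

print(sorted(network["P1"]))  # ['P2', 'P3']
```

Once the pairs are known, questions like "which proteins does P1 touch?" become a simple lookup. The hard part is finding the pairs in the first place.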

Y2H System.

Currently, you need to study individual protein-protein interactions (PPIs) to piece a network together. One of the systems we use to study PPIs is the Yeast 2-Hybrid (Y2H) system.

You may know about yeast from some unfortunate experiences but essentially, yeast is a fungus. This is important because yeast cells can carry small circular DNA molecules called plasmids.

Y2H has two plasmids, the bait plasmid and the prey/fish plasmid.

Here’s how it works.

  1. You genetically encode, or ‘install,’ Protein A’s DNA into the bait plasmid. This makes the plasmid express the bait fused to a ‘binding’ piece (a DNA-binding domain).
  2. You genetically encode, or ‘install,’ the prey’s DNA into the prey plasmid. The prey is not a specific protein but instead whatever protein ‘takes the bait’ AKA, interacts with Protein A. The prey plasmid will express the prey fused to an ‘activation’ piece (an activation domain).
  3. The ‘binding’ piece and the ‘activation’ piece only come together if the prey and the bait interact. And if they do, they’ll switch on a reporter gene, which then gets transcribed.

What’s important to remember is that the prey is a variable. The bait is Protein A, that is non-negotiable. But the prey is any other protein that interacts with the bait. It’s just filling an open position.

At the end of the process, you check whether the reporter gene was transcribed and voila, you know if you’ve got a PPI. Easy!
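The logic of a Y2H screen can be sketched in a few lines of Python. This is a toy model, not a simulation of real yeast: the protein names and the `KNOWN_INTERACTIONS` set are hypothetical stand-ins for what the biology would decide.

```python
# Hypothetical toy data: which protein pairs physically interact.
KNOWN_INTERACTIONS = {frozenset({"A", "B"}), frozenset({"A", "D"})}

def interacts(bait: str, prey: str) -> bool:
    """Stand-in for the biology: True if the two proteins bind each other."""
    return frozenset({bait, prey}) in KNOWN_INTERACTIONS

def reporter_is_on(bait: str, prey: str) -> bool:
    """One Y2H experiment: the reporter readout is positive only when
    bait and prey bind and bring their two fused pieces together."""
    return interacts(bait, prey)

def screen(bait: str, candidates: list[str]) -> list[str]:
    """Run one experiment per candidate prey and keep the hits."""
    return [prey for prey in candidates
            if prey != bait and reporter_is_on(bait, prey)]

hits = screen("A", ["B", "C", "D"])
print(hits)  # ['B', 'D']
```

Notice that `screen` needs one call to `reporter_is_on` per candidate. Each of those calls is a separate experiment at the bench.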

“What’s the catch?”

The Catch.

Although the Y2H system is an excellent way to determine whether two proteins directly interact, you need to do a separate experiment for each and every pair of proteins.

It’s tedious, but can it be that bad?

Obviously.

Let’s do some math.

It’s estimated that the human body has anywhere between 80,000 and 400,000 proteins. Now if we want to test whether or not they interact with each other, we’ll need two sets of all the proteins, and we’ll need to test each protein in the first set against every protein in the second set. Like this…

Meaning, you’re looking at a minimum of 80,000 x 79,999 experiments. That’s roughly 6.4 billion.

That’s too many experiments.
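Here is that arithmetic as a quick Python sketch. The function names are mine, and the rate of 1,000 experiments per day is a made-up assumption just to get a feel for the scale.

```python
def ordered_pair_experiments(n_proteins: int) -> int:
    """Every protein as bait against every other protein as prey."""
    return n_proteins * (n_proteins - 1)

def unordered_pair_experiments(n_proteins: int) -> int:
    """Same count if testing A with B also covers B with A."""
    return n_proteins * (n_proteins - 1) // 2

low_estimate = ordered_pair_experiments(80_000)
print(f"{low_estimate:,}")  # 6,399,920,000

# Hypothetical throughput, purely for scale: 1,000 experiments per day.
EXPERIMENTS_PER_DAY = 1_000
years = low_estimate / EXPERIMENTS_PER_DAY / 365
print(f"about {years:,.0f} years")
```

Even if you only count each pair once, that still leaves about 3.2 billion experiments for the low-end estimate of 80,000 proteins.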

We desperately need an efficient method to identify protein-protein interactions (PPIs).

PROPER-seq

Enter PROPER-seq, a tool recently developed by a team of bioengineers at UC San Diego led by Kara L. Johnson in Prof. Sheng Zhong’s lab. PROPER-seq can map PPIs en masse.

There are essentially three steps to this process: SMART-display, INLISE and PROPER-seqTools.

  1. First, in SMART-display, the proteins are labelled with RNA ‘barcodes.’ A barcode acts almost like a fingerprint: it is unique to its owner, so the protein can always be identified from it.
  2. Then in INLISE, the proteins are allowed to ‘run loose.’ When two proteins interact, their barcodes are fused together to look like this: cDNA1 - linker - cDNA2 (cDNA is just a DNA copy of RNA).
  3. In PROPER-seqTools, all these pairs are sequenced to identify which proteins have interacted.

That’s the bare bones structure but let’s take a closer look…

1. SMART-display

Before we look at SMART-display, we have to break down mRNA display.

mRNA display is a technique that physically links each protein to the mRNA that encodes it. It is often used to find molecules that bind to a specific target, and the attached mRNA makes it easy to track which protein is which.

mRNA display is great but not perfect…

Real footage of mRNA display realizing it is not perfect.

SMART-display had to simplify the mRNA display process so that it could be used at the genome scale. This was done by replacing the most time-consuming step: creating a gene library for transcription and translation.

Building that library requires DNA sequences that initiate transcription and translation, as well as linking agents called puromycin attachments. The puromycin attachments could alternatively be provided with a separate bacterial strand.

Instead of spending so much time creating a gene library to eventually transcribe and translate into protein, the researchers had a better idea.

To use ‘ready-made’ plasmids.

There you go! With the protein-coding genes already encoded in the plasmids, mRNA display became simple enough to operate at the genome scale.

Celebration aside, now let’s look at how this process actually works.

The goal of SMART-display is to create an RNA barcode and fuse it to its corresponding protein. The first step is literally just to extract the genetic information of the input cells.

What to do next? Think about it. We want to link the mRNA to the protein. What did we introduce earlier that has to do with linking?

If you said puromycin you are correct!

Here’s where a lot of things start to happen. I’ll try to keep it short and simple. You fuse the puromycin near one end of the mRNA. Then the mRNA goes through translation to create a protein.

Now you have mRNA, puromycin, and protein. The puromycin, which is already attached to the mRNA, covalently bonds to the end of the newly made protein, and suddenly you have a fusion which looks like mRNA-linker-protein!

Like this…

Now that the proteins are labelled, we’re ready to mix and match!
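To make the ‘barcode’ idea concrete, here is a minimal Python sketch of what SMART-display hands to the next stage. The class name, the function name, and the short sequences are all hypothetical; real mRNAs are far longer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BarcodedProtein:
    """A protein physically joined to the mRNA that encodes it."""
    protein: str
    mrna: str
    linker: str = "puromycin"

    def label(self) -> str:
        # Mirrors the mRNA-linker-protein picture from the text.
        return f"{self.mrna}-{self.linker}-{self.protein}"

def smart_display(transcripts: dict[str, str]) -> list[BarcodedProtein]:
    """Toy stand-in: pair every protein with its own mRNA barcode."""
    return [BarcodedProtein(protein=name, mrna=seq)
            for name, seq in transcripts.items()]

# Hypothetical toy transcripts.
library = smart_display({"ProteinA": "AUGGCC", "ProteinB": "AUGUUU"})

# Because each barcode is unique, it can be traced back to its protein.
barcode_to_protein = {item.mrna: item.protein for item in library}
print(barcode_to_protein["AUGGCC"])  # ProteinA
```

The lookup at the end is the whole reason for barcoding: sequencing reads RNA and DNA easily, but not proteins.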

2. INLISE

This stage has three steps: Incubation, Ligation, and Sequencing (hence the name INLISE). The goal is to ‘convert’ PPIs into chimeric DNA sequences, meaning sequences stitched together from two sources, that look like: cDNA1 - linker - cDNA2.

Now, similar to the process with Y2H, this requires two libraries, the bait and the prey. They are both the set of RNA-barcoded proteins with one difference. The bait library is immobilized whereas the prey library is free to move.

Before the libraries are combined, they are fused with puromycin, allowing the mRNA to link if necessary. On top of that, to make the barcodes more stable, the mRNA goes through reverse transcription into cDNA.

Then the prey library is mixed into the bait library. Meaning, each of the prey proteins is free to interact (or not interact) with any of the bait proteins.

The black aspect represents puromycin, the linker.

Above is an example of a PPI: the two proteins interact and, thanks to the linker, their cDNA barcodes fuse together!

So finally, we are left with what we wanted: cDNA1 - linker - cDNA2.
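Here is a rough Python sketch of that mixing step. It is a toy model under loud assumptions: the sequences are made up, the set of interacting pairs is given up front (in the lab, the proteins decide), and the linker is just the placeholder text `LINKER`.

```python
# Base pairing used when copying mRNA into complementary DNA.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(mrna: str) -> str:
    """Return the cDNA strand complementary to an mRNA, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(mrna))

def inlise(bait, prey, interacting, linker="LINKER"):
    """Let every prey meet every bait; interacting pairs yield a chimera."""
    chimeras = []
    for bait_name, bait_mrna in bait.items():
        for prey_name, prey_mrna in prey.items():
            if frozenset({bait_name, prey_name}) in interacting:
                chimeras.append(
                    f"{reverse_transcribe(bait_mrna)}-{linker}-"
                    f"{reverse_transcribe(prey_mrna)}"
                )
    return chimeras

# Hypothetical toy library, used as both bait and prey.
library = {"A": "AUGGCC", "B": "AUGUUU", "C": "AUGAAA"}
interacting = {frozenset({"A", "B"})}

chimeras = inlise(library, library, interacting)
print(chimeras)  # ['GGCCAT-LINKER-AAACAT', 'AAACAT-LINKER-GGCCAT']
```

The nested loop is the same all-against-all comparison that made Y2H impractical. The difference is that here it happens in one tube, all at once, instead of one experiment per pair.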

3. PROPER-seqTools Read Pairs

The egg is boiled, now it’s just time to remove the shell. But sometimes, it can be a little trickier than we expect.

For the vegans out there, or people who prefer their eggs poached, I mean that we have to figure out what proteins this cDNA represents.

You have two different situations. The first is a protein with a cDNA tag. The second is a chimeric DNA strand.

Using sophisticated software, you can determine whether a PPI has occurred. For instance, the system can scan for a read pair where the two ends map to different proteins. That means instead of simply the cDNA of one protein, you have cDNA1 - linker - cDNA2!

The team’s software also works backwards from the mRNA barcodes to identify which proteins they belong to!
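The read-pair idea can be sketched in Python like this. To be clear, this is not the team’s actual software: the function name is mine, the read pairs are made up, and the `min_reads` cutoff is a simple assumption standing in for whatever filtering the real tool applies.

```python
from collections import Counter

def call_interactions(read_pairs, min_reads=2):
    """Count read pairs whose two ends map to different proteins.

    read_pairs: list of (protein_at_end_1, protein_at_end_2) tuples.
    min_reads: hypothetical cutoff so a single stray read is ignored.
    """
    counts = Counter()
    for end1, end2 in read_pairs:
        if end1 != end2:  # both ends on the same protein is not a chimera
            counts[frozenset({end1, end2})] += 1
    return {tuple(sorted(pair)): n
            for pair, n in counts.items() if n >= min_reads}

# Hypothetical mapped read pairs.
reads = [("A", "B"), ("B", "A"), ("A", "A"), ("C", "D")]
print(call_interactions(reads))  # {('A', 'B'): 2}
```

Here the A-B pair shows up twice and is kept, the A-A pair is just one protein’s own barcode, and the lone C-D pair falls below the cutoff.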

Yay!

Experiment.

Ready to see this tool in action?

The Sheng Zhong lab took human embryonic kidney cells, T lymphocytes (white blood cells) and endothelial cells (cells from the inside lining of heart and blood vessels).

Then they applied all three steps of PROPER-seq and were amazed by the results.

They found 210,518 PPIs involving 8,635 proteins.
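To put those two numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes each PPI is counted once as an unordered pair, which is my reading of the figures rather than something stated here.

```python
proteins = 8_635
ppis = 210_518

# Each interaction involves two proteins, hence the factor of 2.
average_partners = 2 * ppis / proteins

# All possible unordered pairs among the proteins that were found.
possible_pairs = proteins * (proteins - 1) // 2
fraction_observed = ppis / possible_pairs

print(f"{average_partners:.1f} partners per protein on average")  # 48.8
print(f"{possible_pairs:,} possible pairs")                       # 37,277,295
print(f"{fraction_observed:.2%} of possible pairs interact")      # 0.56%
```

So under that assumption, the observed interactions are a small slice of more than 37 million possible pairs, all of which were screened in bulk rather than one at a time.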

Without this technology, this would have been an ambitious, laborious task. That’s why PROPER-seq is the technology we need to further our understanding of genomics.

So I think it’s safe to conclude that this technology is gene-ius!

Hey! Thanks for reading! If you’re interested in the original paper check it out here. For more on me check out my scientific videos on my YouTube Channel and give me a follow on Instagram.


K

UC San Diego Biotech Engineering | Reproductive Longevity Enthusiast