Drug discovery the LabGenius way: a protein scientist gives us the tour

Gabriella Laker
Published in LabGenius · 9 min read · May 26, 2020

Gabriella Laker is a protein scientist at LabGenius.

Imagine you’re competing in the final of the 100m hurdles. It’s your moment to show the world what you’re made of. But if you don’t clear every hurdle, you’re out of the race, no matter how many years of blood, sweat and tears it took to get you there.

The same is true for drug discovery.

Let’s say that your molecule is one of the lucky few that makes it through to clinical trials. You now have a 1 in 10 chance of getting it to market. But first, you need to clear a lot of hurdles to get there. If your molecule is toxic or immunogenic, if it aggregates, if it isn’t efficacious, if it’s unstable, or if it can’t be produced at scale, then, you’ve guessed it, it’s binned. Even if it took years of research and millions of dollars to get to this point.

Harry Rickerby, Platform Architect at LabGenius, delves into the challenges of drug discovery, and why the pharma industry needs to embrace technology in order to remain viable.

The LabGenius mission

Here at LabGenius, our mission is to build a future where machine learning discovers life-changing medicines for people worldwide.

Conventional methods of drug discovery

The key to engineering a new protein is to find the right combination of amino acids. To do this, we need to search through all of the potential protein sequences that have existed, do exist, or could possibly exist. This multi-dimensional ‘sequence space’ is infinite, so the search isn’t easy.

Conventional methods of protein engineering mimic Darwinian evolution to generate genetic variation and select for the fittest proteins. This approach is convenient, but inherently inefficient, as Harry explains.

The LabGenius approach

At LabGenius, we are building a protein engineering platform, driven by machine learning, capable of simultaneous co-optimisation of multiple biochemical and biophysical properties. By embracing the power of technology, we believe that our platform will be able to tackle challenging therapeutic problems, where conventional protein engineering methods have so far failed.

An exciting prospect, but what does that really look like on the laboratory front line, from the eyes of a protein scientist?

As with conventional protein engineering methods, the biggest challenge remains identifying the best protein sequence from the infinite sequence space.

First things first, we need to build a map of sequence space. To do this, we need to generate genetic variation, on an enormous scale.

Building genetic variation

Each map we build represents a region of sequence space that spans our chosen therapeutic area. For now, let’s stick to the fundamentals. Every drug needs to bind to its target with high affinity, and elicit a potent therapeutic effect.

So how do you ensure that you are in the right area of sequence space to begin with? By identifying a sequence that encodes a known binder. This is our starting scaffold.

Within that scaffold sequence, we can identify the amino acids that confer affinity and selectivity. We now want to generate variations of this sequence in order to find proteins that outperform our starting scaffold. Let’s say 15 residues are involved in binding, and any one of the twenty standard amino acids can occupy each position. Theoretically, we can randomise this region to create a library of 20^15, roughly 3.3 × 10^19, unique variants.
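The library size follows directly from the combinatorics, and is worth a quick sanity check (15 randomised positions and 20 standard amino acids, as above):

```python
# Theoretical diversity of randomising 15 positions, where each
# position can be occupied by any of the 20 standard amino acids.
n_positions = 15
n_amino_acids = 20

library_size = n_amino_acids ** n_positions
print(f"{library_size:.1e} unique variants")  # ~3.3e+19
```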

This diversity is only theoretical. Using our in-house Next Generation Sequencing technologies, we can validate the true diversity in just a few days.
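Conceptually, validating diversity from sequencing output reduces to counting distinct variants among the reads. LabGenius’s actual NGS pipeline isn’t described here; this toy illustration, with made-up five-residue reads, just shows the idea:

```python
from collections import Counter

# Toy NGS reads (a real run yields millions); each read is one library variant.
reads = ["ACDEF", "ACDEF", "GHKLM", "NPQRS", "GHKLM", "ACDEF"]

variant_counts = Counter(reads)
print(len(variant_counts))            # observed diversity: 3 unique variants
print(variant_counts.most_common(1))  # most abundant variant and its count
```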

Applying selection pressures allows us to filter the entire library for those variants that fulfil our biochemical and biophysical requirements. It is these sequences that are used to build our map.

Machine learning models

Within the brain, approximately 86 billion neurons transmit information through 10^15 synapses to form our neural networks. We develop and train these networks as we learn how to walk, talk, and construct thoughts and ideas.

This way of processing biological information, and more importantly how we learn from our experiences, inspired the computational approach that we use to uncover the complex relationship between protein sequence and function.

By screening high quality, unbiased and genetically diverse datasets in the lab, we can use machine learning to unpick the genetic design rules that underpin life. We can begin to understand the sequence-to-function relationship of those sequences represented in our genetic libraries, rank the unseen sequences, and predict improved sequences.

It isn’t possible to characterise every single sequence in the lab. Instead, we identify the top 96 with the most potential for fulfilling our therapeutic needs. The next challenge is proving that these sequences truly are improvements on our starting molecule. To do this, we use good old-fashioned molecular biology, combined with cutting-edge robotics.
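The models LabGenius actually uses aren’t described in this post. As a purely illustrative stand-in, the screen-then-predict loop can be sketched as one-hot encoding of sequences plus a simple ridge regression, fitted on lab-measured data and used to rank unseen candidates; every name and number below is hypothetical:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def one_hot(seq):
    """Flatten a sequence into a binary vector: 20 features per position."""
    vec = np.zeros(len(seq) * 20)
    for i, aa in enumerate(seq):
        vec[i * 20 + AMINO_ACIDS.index(aa)] = 1.0
    return vec

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Toy training data: screened sequences with a lab-assayed fitness score.
rng = np.random.default_rng(0)
train_seqs = ["".join(rng.choice(list(AMINO_ACIDS), 15)) for _ in range(200)]
train_y = rng.normal(size=200)  # stand-in for real assay measurements

X = np.vstack([one_hot(s) for s in train_seqs])
w = fit_ridge(X, train_y)

# Rank unseen candidates and keep the top 96 for lab validation.
candidates = ["".join(rng.choice(list(AMINO_ACIDS), 15)) for _ in range(1000)]
scores = np.vstack([one_hot(s) for s in candidates]) @ w
top_96 = [candidates[i] for i in np.argsort(scores)[::-1][:96]]
```

The 96 comes straight from the plate format: one predicted-best variant per well of a 96-well plate.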

Automation

Humans are incredible machines capable of wisdom, creativity and understanding. But we get tired, and we make mistakes.

Let’s take a single sequence identified by our machine learning model. To characterise this protein, we first need to express it.

Within the lab, we use Golden Gate assembly, a molecular cloning method that allows insertion of our sequence of interest into our expression vector, in a single step.

You’ve had a busy morning and now need to set up the assembly reaction before your two o’clock meeting. In a rush, you set the pipette volume incorrectly, and now have twice as much DNA ligase. Maybe you forgot to add it and have no ligase at all. And to top it off, your meeting overruns and your 60-minute incubation has become 80. Not ideal.

Now imagine having to set up the same reaction for 96 different sequences. The six pipette steps have become 576. The single Eppendorf tube has become a 96-well plate. You were up to well D7 but you’re now doubting whether D6 contains enzyme. Better start again to be sure. The whole process is incredibly time consuming and highly error prone.

Thankfully, robots do a much better job.

At LabGenius, our Automation Engineers utilise the power of liquid handling robots to automate our entire high throughput protein production workflow. This is a big deal. As scientists, we are no longer just inefficient pipetting machines. We instead have the freedom to put our minds to much better use — to design the right type of experiment, rather than just executing it.

Protein production

We have our cloning output — 96 plasmids. Each plasmid contains a unique sequence inserted into our expression vector. NGS confirms that these sequences are correct. We’re now ready to go.

After growing bacterial cultures carrying our plasmids, we can extract the DNA. Specialised bacterial cells will take up the extracted DNA and express the protein that it encodes.

Our protein now exists in the liquid matrix inside the cell. Disrupting the cell membrane will release all of the soluble proteins expressed by the cell. We now need to isolate just our protein of interest. Conveniently, his-tags are designed for exactly that. Their strong affinity for transition metal ions means that his-tagged proteins can be isolated from a soup of un-tagged proteins using immobilised ions.

A nickel column plugged up to an FPLC purification system is a great way to do this. But what do you do if you want to purify 96 proteins in parallel, and still get to the gym after work? You’ve guessed it — automation!

At LabGenius, we have applied the basic principles of affinity chromatography to a high-throughput, 96-well plate format. And it’s surprisingly simple.

Incubating the protein soup with nickel resin facilitates high affinity binding of our his-tagged protein. The protein-resin mixture is transferred to a 96-well filter plate, washed to remove contaminants, and exchanged into a storage buffer. And finally, overnight incubation with protease facilitates clean cleavage of the his-tag, removing our target protein from the resin.

We now have 96 unique, beautifully pure protein variants.

And here’s where the real fun begins — characterisation.

Protein characterisation

To achieve our mission of discovering life-changing medicines for people worldwide, protein sequences that have been identified by our machine learning models, and purified using our high throughput automation platform, now need to outperform the starting molecule. Otherwise, we’re all wasting our time.

Initially, the most important questions we need to answer are:

  1. Is the candidate potent?
  2. Is the candidate stable?
  3. Does the candidate have a high affinity for its target?

Conveniently, we have three assays that can tell us exactly that.

Potency

You may ask why we don’t start by determining binding affinity. Fair question. The answer is simple, although it does involve making one assumption — if our candidate molecule elicits a potent therapeutic effect against our target, it must also bind to that target. Simply put, we can kill two birds with one stone. We can quantify potency, while also confirming binding.

Within the lab, we do this using a genetic reporter assay. Reporter genes enable the detection of gene expression. They encode enzymes that convert substrates to luminescent products. Binding of our target to its receptor triggers activation of the reporter gene, which directly translates into a quantifiable luminescent signal.

Our candidate molecules are designed to inhibit this interaction. We can quantify the potency of our set of candidates in a single assay. Of course, we use the power of automation to do so. Potent candidates reduce reporter gene expression and thereby reduce the luminescent signal, enabling us to construct potency rankings for each set of candidates.
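Because potent candidates knock down the luminescent signal, the ranking reduces to comparing each candidate’s signal against an uninhibited control. The numbers below are illustrative, not real assay data; in practice each well of the plate yields one reading per candidate:

```python
# Rank candidates by percent inhibition of the luminescent reporter signal.
uninhibited_control = 100_000  # luminescence with no candidate present

readings = {  # illustrative per-well luminescence readings
    "variant_A": 12_000,
    "variant_B": 55_000,
    "variant_C": 30_000,
}

def percent_inhibition(signal, control):
    """Signal reduction relative to the uninhibited control, in percent."""
    return 100.0 * (1.0 - signal / control)

ranking = sorted(readings,
                 key=lambda v: percent_inhibition(readings[v], uninhibited_control),
                 reverse=True)
print(ranking)  # most potent (greatest signal reduction) first
```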

Stability

Our candidate needs to withstand the harsh environments of the body if it is to survive after administration. If our candidate has a short half-life, if it is pH sensitive, if it is protease labile, it is unlikely to deliver its intended therapeutic effect.

Using our automation platform, protein stability can be characterised for 96 candidates at a time. By subjecting our candidates to conditions of stress, we can determine how much of each remains intact afterwards, and rank them accordingly.

Affinity

We now have a detailed picture of which of the 96 candidates are most potent and most stable. Combining this data allows us to identify those that outperform our starting molecule. Now we determine binding affinity.

Within the lab, we use Surface Plasmon Resonance to characterise binding of our candidates to our target, in real time. The top performing candidates are passed over a surface on which the target has been immobilised. The interaction profile is recorded in real time, and provides quantitative data on binding kinetics and affinity.
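An SPR fit yields an association rate (k_on) and a dissociation rate (k_off) per candidate, and the equilibrium dissociation constant follows as K_D = k_off / k_on, with a lower K_D meaning tighter binding. The rate constants below are illustrative values, not real assay output:

```python
# Derive and compare binding affinities from SPR kinetic rate constants.
candidates = {
    #            k_on (1/(M*s))  k_off (1/s)   -- illustrative values
    "variant_A": (1.0e6,         1.0e-4),
    "variant_B": (5.0e5,         2.0e-3),
    "scaffold":  (8.0e5,         8.0e-4),
}

def kd(k_on, k_off):
    """Equilibrium dissociation constant in molar: K_D = k_off / k_on."""
    return k_off / k_on

# Print candidates from tightest to weakest binder (ascending K_D).
for name, (k_on, k_off) in sorted(candidates.items(), key=lambda kv: kd(*kv[1])):
    print(f"{name}: K_D = {kd(k_on, k_off):.1e} M")
```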

Making a real difference in protein therapeutics at LabGenius

As a species, we are simply incapable of fully grasping the complexity of biological systems. By accepting this, and by embracing the ever-expanding capabilities of technology, we can begin to understand the genetic design rules that underpin life.

From an infinite sequence space, we can design and build genetically diverse libraries that capture trillions of unique protein sequences.

Using the power of machine learning, we can uncover the rules that entwine protein sequence and function. Using these rules, we can build models that accurately predict those sequences that meet our therapeutic need.

Using the power of robotics, we can produce and characterise those proteins in parallel. Proteins that have been co-optimised for multiple biochemical and biophysical properties and that have real therapeutic significance for humans worldwide.

It’s an exciting time to be a part of LabGenius. Join us!
