Using AlphaFold2 to Predict the Structure of a Protein-Protein Complex

Huafeng Xu
Published in Roivant Technology
Sep 19, 2021

At Roivant Discovery we have built a computational platform based on quantum physics to simulate the dynamic behavior of biomolecules (such as proteins and protein-protein complexes), to reveal the relationship between their dynamics and their biological activities, and to rationally design small molecule drugs that modulate their dynamic behavior to achieve desired therapeutic effects.

Our simulations invariably start from a three-dimensional structure of the protein, or protein-protein complex, of interest. The standard methods for determining such structures are X-ray crystallography and, more recently, cryogenic electron microscopy (cryo-EM). We have routinely used existing structures from the public Protein Data Bank (PDB) and obtained our own structures by in-house crystallography. Lately, we have explored the use of structural models predicted by DeepMind’s AlphaFold2, which has demonstrated unprecedented accuracy in predicting the crystal structures of proteins, often approaching the experimental resolution of crystallography. We have been experimenting with AlphaFold2 predictions and exploring how to integrate them into our drug design process.

Although AlphaFold2 is trained to predict the structures of individual proteins, one can jury-rig it to predict the structures of protein-protein complexes. We did this by concatenating two protein sequences into one, with a polyglycine linker of sufficient length inserted in between.

As an example, we used this approach to predict the structure of the barnase-barstar complex. Barnase is a ribonuclease secreted by the bacterium Bacillus amyloliquefaciens. Intracellular barnase, if active, is lethal to the cell, so the bacterium avoids chemical suicide by expressing a potent barnase inhibitor, barstar, which binds tightly to barnase and inhibits it, with a blazing association rate of 10⁸ s⁻¹ M⁻¹ and sub-picomolar binding affinity (an equilibrium dissociation constant of 10⁻¹⁴ M; see, e.g., Buckle et al.). To predict the structure of their complex, we constructed the following sequence, connecting the two proteins with a 60-residue polyglycine linker:

>sp|P00648|P11540|Barnase-Barstar
MMKMEGIALKKRLSWISVCLLVLVSAAGMLFSTAAKTETSSHKAHTEAQVINTFDGVADYLQTYHKLPDNYITKSEAQALGWVASKGNLADVAPGKSIGGDIFSNREGKLPGKSGRTWREADINYTSGFRNSDRILYSSDWLIYKTTDHYQTFTKIR
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
MKKAVINGEQIRSISDLHQTLKKELALPEYYGENLDALWDCLTGWVEYPLVLEWRQFEQSKQLTENGAESVLQVFREAKAEGCDITIILS
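The construction above is simple enough to script. The following is a minimal sketch in plain Python, not the script used for this work; the function names `link_sequences` and `to_fasta` are hypothetical, and the toy sequences are placeholders standing in for the full barnase and barstar sequences shown above.

```python
def link_sequences(seq_a: str, seq_b: str, linker_len: int = 60) -> str:
    """Join two protein sequences with a polyglycine linker of linker_len residues."""
    if linker_len < 0:
        raise ValueError("linker_len must be non-negative")
    return seq_a + "G" * linker_len + seq_b


def to_fasta(header: str, sequence: str, width: int = 80) -> str:
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Toy placeholders, NOT the real barnase/barstar sequences.
toy_a = "MKTAYIAK"
toy_b = "MKKAVING"

fused = link_sequences(toy_a, toy_b, linker_len=60)
record = to_fasta("toy_A-toy_B", fused)
print(record)
```

The linker length is left as a parameter because "sufficient length" depends on how far apart the two chain termini sit in the complex, which is generally not known in advance.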

We took care to prevent the crystal structures from being used as templates by setting the maximum template date to 1900-01-01, long before Max Perutz solved the first crystal structure of hemoglobin. AlphaFold2 predicted five very similar models; the prediction it ranked as the best is shown in Figure 1.
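For readers who want to reproduce the template cutoff, here is a hedged sketch of how such a run might be assembled. It assumes the open-source AlphaFold `run_alphafold.py` script and its `--fasta_paths`, `--output_dir`, and `--max_template_date` flags; the file and directory names are hypothetical, and the database and data-directory flags that a real run requires are deliberately left out.

```python
import datetime

# Assumption: the open-source run_alphafold.py entry point and its
# --fasta_paths, --output_dir, and --max_template_date flags.
# File and directory names are hypothetical; database flags are omitted.
cutoff = datetime.date(1900, 1, 1)

command = [
    "python", "run_alphafold.py",
    "--fasta_paths=barnase_barstar.fasta",
    "--output_dir=af2_output",
    f"--max_template_date={cutoff.isoformat()}",
]

# Print the command for inspection rather than executing it.
print(" ".join(command))
```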

Figure 1. The top-ranked model of the predicted barnase-barstar complex. Barnase is at the top and barstar is at the bottom. The AlphaFold2 prediction (blue) is superimposed onto the crystal structure (PDB ID: 2ZA4, orange). The polyglycine linker (gray) is predicted to be disordered.

We assessed these predicted models by the CAPRI (Critical Assessment of PRedicted Interactions) criteria, which include the following three parameters:

  1. Fraction of native contacts (fNat)
  2. Ligand RMSD (L-RMSD). The molecular docking community uses the term ligand both for a small molecule and for the smaller protein in a protein-protein complex; in our case, barstar is the ligand.
  3. Interface RMSD (I-RMSD)
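To make the first parameter concrete, here is a minimal sketch of fNat in plain Python. It assumes the commonly used definition in which a receptor residue and a ligand residue are in contact if any pair of their atoms lies within 5 Å; that cutoff, the function names, and the toy coordinates are assumptions for illustration, not values taken from this analysis.

```python
from itertools import product
from math import dist
from typing import Dict, List, Set, Tuple

Atom = Tuple[float, float, float]
Residues = Dict[str, List[Atom]]  # residue id -> atom coordinates in angstroms


def residue_contacts(receptor: Residues, ligand: Residues,
                     cutoff: float = 5.0) -> Set[Tuple[str, str]]:
    """Return receptor-ligand residue pairs with any two atoms within `cutoff`."""
    contacts = set()
    for (r_id, r_atoms), (l_id, l_atoms) in product(receptor.items(), ligand.items()):
        if any(dist(a, b) <= cutoff for a in r_atoms for b in l_atoms):
            contacts.add((r_id, l_id))
    return contacts


def fraction_native_contacts(ref_receptor: Residues, ref_ligand: Residues,
                             model_receptor: Residues, model_ligand: Residues,
                             cutoff: float = 5.0) -> float:
    """Fraction of the reference (native) contacts that the model reproduces."""
    native = residue_contacts(ref_receptor, ref_ligand, cutoff)
    if not native:
        raise ValueError("reference structure has no interface contacts")
    predicted = residue_contacts(model_receptor, model_ligand, cutoff)
    return len(native & predicted) / len(native)


# Toy one-atom-per-residue example (hypothetical coordinates).
ref_rec = {"A1": [(0.0, 0.0, 0.0)], "A2": [(10.0, 0.0, 0.0)]}
ref_lig = {"B1": [(3.0, 0.0, 0.0)], "B2": [(12.0, 0.0, 0.0)]}
model_lig = {"B1": [(3.0, 0.0, 0.0)], "B2": [(30.0, 0.0, 0.0)]}  # B2 displaced

fnat = fraction_native_contacts(ref_rec, ref_lig, ref_rec, model_lig)
print(fnat)  # one of the two native contacts survives
```

Because contacts are compared by residue identifier, the model and the reference must use the same residue numbering for this to be meaningful.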

These parameters need to be defined with respect to a reference crystal structure. For this assessment of the barnase-barstar complex we used the PDB structure 2ZA4.
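The role of the reference is easiest to see for L-RMSD. Below is a sketch, assuming NumPy and the usual convention that the model is first superimposed onto the reference using receptor atoms only, after which the RMSD is measured over the ligand atoms. The function names and the synthetic coordinates are hypothetical; a real calculation would use matched backbone atoms from the model and from 2ZA4.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Rotation and translation that best superimpose `mobile` onto `target` (N x 3)."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    covariance = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_center - rotation @ mobile_center
    return rotation, translation


def ligand_rmsd(ref_receptor, ref_ligand, model_receptor, model_ligand) -> float:
    """RMSD of ligand atoms after superimposing the model on the reference receptor."""
    rotation, translation = kabsch(model_receptor, ref_receptor)
    moved = model_ligand @ rotation.T + translation
    return float(np.sqrt(((moved - ref_ligand) ** 2).sum(axis=1).mean()))


# Synthetic check: a rigidly rotated and shifted copy of the reference.
rng = np.random.default_rng(0)
ref_rec = rng.normal(size=(20, 3))
ref_lig = rng.normal(size=(10, 3)) + 5.0

angle = np.deg2rad(30.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
shift = np.array([4.0, -2.0, 7.0])
model_rec = ref_rec @ rot_z.T + shift
model_lig = ref_lig @ rot_z.T + shift

rigid_copy = ligand_rmsd(ref_rec, ref_lig, model_rec, model_lig)
displaced = ligand_rmsd(ref_rec, ref_lig, model_rec, model_lig + [2.0, 0.0, 0.0])
print(rigid_copy, displaced)
```

I-RMSD can be computed with the same `kabsch` helper by superimposing and measuring over interface atoms only, rather than fitting on the receptor and measuring on the ligand.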

A quality label may be assigned to each predicted structure based on its values of fNat, L-RMSD, and I-RMSD, as illustrated in Figure 2.

Figure 2. CAPRI categories for prediction qualities.
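The categories can also be written as a small decision rule. The sketch below uses the commonly cited CAPRI thresholds (fNat of 0.5, 0.3, and 0.1, paired with L-RMSD limits of 1, 5, and 10 Å or I-RMSD limits of 1, 2, and 4 Å). These numbers are stated here as an assumption and should be checked against Figure 2 and the CAPRI documentation; the function name `capri_quality` is hypothetical.

```python
def capri_quality(fnat: float, l_rmsd: float, i_rmsd: float) -> str:
    """Assign a CAPRI-style quality label; RMSD values are in angstroms.

    Thresholds are the commonly cited CAPRI values (an assumption here).
    Categories are tested from best to worst, so a model receives the
    highest label whose conditions it satisfies.
    """
    if fnat >= 0.5 and (l_rmsd <= 1.0 or i_rmsd <= 1.0):
        return "high"
    if fnat >= 0.3 and (l_rmsd <= 5.0 or i_rmsd <= 2.0):
        return "medium"
    if fnat >= 0.1 and (l_rmsd <= 10.0 or i_rmsd <= 4.0):
        return "acceptable"
    return "incorrect"


# Hypothetical inputs, not the values measured for the AlphaFold2 models.
examples = [(0.6, 0.8, 0.9), (0.6, 3.0, 1.5), (0.2, 8.0, 3.0), (0.05, 1.0, 1.0)]
labels = [capri_quality(*values) for values in examples]
print(labels)
```

Note that a model with many native contacts can still drop to medium quality if neither RMSD is within 1 Å, which is the situation described for the top-ranked model.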

Four of the five models by AlphaFold2 are of high quality (the green rectangle in Figure 2). Interestingly, the model ranked as the best by AlphaFold2 is the only one of medium quality (the blue region in Figure 2). The CAPRI parameters for these models are summarized in the following table.

The model ranked 3rd by AlphaFold2 is of the best quality based on the CAPRI criteria; Figure 3 shows its comparison to the crystal structure and Figure 4 shows its comparison to the model ranked 1st.

Figure 3. The model of the barnase-barstar complex ranked 3rd by AlphaFold2, superimposed onto the crystal structure. The colors are the same as in Figure 1.

Figure 4. Comparison between the 1st model (blue), the 3rd model (green), and the crystal structure (orange) of the barnase-barstar complex. The polyglycine linker is in gray.

Given the high degree of coevolutionary pressure between barnase and barstar across many bacteria, it is unsurprising that AlphaFold2, which takes advantage of such coevolutionary information, was able to predict their complex with high accuracy. How well AlphaFold2 can predict other protein-protein complexes is an open and interesting question, which we will explore in future work, along with the use of AlphaFold2-derived models as input structures for molecular simulations.

Taras Dauzhenka contributed to this work.

