Reaching its limit? AlphaFold2, how it works and the challenges it faces!

Carlpjreilly
3 min read · Apr 19, 2023


What is AlphaFold2?

AlphaFold2 is a deep learning system that predicts the three-dimensional (3D) structure of a protein from its amino acid sequence with high accuracy [1]. It was developed by DeepMind, the UK-based artificial intelligence (AI) lab owned by Alphabet Inc. (the parent company of Google). It was released in July 2021 as the second generation of DeepMind’s protein-prediction system, succeeding the original AlphaFold from 2018.

Why is AlphaFold2 important?

The prediction of protein structures is a fundamental problem in biochemistry and drug discovery and is critical to understanding the function of proteins and their interactions with other molecules. While experimental techniques such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy have been used to determine protein structures, these methods can be time-consuming, expensive, and difficult to apply to all proteins.

How does it work?

AlphaFold2 uses a neural network to predict the 3D structure of a protein from its amino acid sequence. The network is trained on a large database of experimentally determined protein structures and their sequences, and learns from this data how to make its predictions [2].

Step 1 Input: AlphaFold2 takes a protein sequence as input, which is a string of amino acids represented by their one-letter codes.

Step 2 Feature Extraction: AlphaFold2 searches sequence databases for related proteins and builds a multiple sequence alignment (MSA), then uses a series of neural network layers to extract features from it, including information about the likely distances between pairs of amino acids and the angles between bonds in the protein backbone.

Step 3 Structure Prediction: Based on these features, AlphaFold2 predicts the most likely 3D structure of the protein. This is not a physical simulation of how the chain folds over time. Instead, the network uses the relationships it has inferred between all pairs of amino acids to place each amino acid directly in 3D space, aiming for a physically plausible, low-energy structure.

DeepMind trained the program on over 170,000 known protein structures [3]. One of the key innovations of AlphaFold2 is its use of an attention mechanism, which lets the system focus on the most relevant parts of the protein sequence and structure when making predictions. This helps it capture the long-range interactions between parts of the protein that are far apart in the sequence but close together in the folded structure, which are critical to folding and function.
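The attention idea described above can be sketched in a few lines. This is generic scaled dot-product attention on a toy example, not AlphaFold2's actual network, which uses more elaborate attention variants over alignments and residue pairs. The 2-dimensional embeddings below are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over a list of per-residue vectors.

    Every position is compared with every other position, so residues that
    are far apart in the sequence can still exchange information directly.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this residue's query to every residue's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical embeddings for a 4-residue toy sequence. Residues 0 and 3
# are given the same vector, so they attend strongly to each other even
# though they sit at opposite ends of the chain.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = attention(embeddings, embeddings, embeddings)
print([round(w, 3) for w in weights[0]])
```

In the printed weights for residue 0, positions 0 and 3 receive more weight than positions 1 and 2, which is the "long-range" behaviour in miniature.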

Step 4 Refinement: AlphaFold2 then refines its initial prediction by feeding it back through the network several times (a process known as recycling), adjusting the angles and distances between amino acids to further optimize the structure.

Step 5 Output: The final output of AlphaFold2 is a 3D structure of the protein, represented as a set of coordinates for the atoms of each amino acid in the chain.
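To make the input and output of these steps more tangible, here is a minimal sketch of how a sequence and its predicted coordinates relate to the pairwise distances mentioned in the steps. The sequence, the coordinates, and the 8 angstrom contact cutoff are all illustrative assumptions, and the sketch uses a single point per amino acid to stay short.

```python
import math

# Hypothetical input: a 4-residue sequence in one-letter codes.
sequence = "MKTA"

# Hypothetical output: one (x, y, z) position in angstroms per residue.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

def distance_matrix(points):
    """Euclidean distance between every pair of residues."""
    return [[math.dist(p, q) for q in points] for p in points]

def contacts(points, cutoff=8.0, min_separation=3):
    """Pairs of residues that are close in space but apart in the sequence."""
    dm = distance_matrix(points)
    return [(i, j)
            for i in range(len(points))
            for j in range(i + min_separation, len(points))
            if dm[i][j] <= cutoff]

dm = distance_matrix(coords)
for residue, row in zip(sequence, dm):
    print(residue, [round(d, 1) for d in row])
print("contacts:", contacts(coords))
```

Going from coordinates to distances, as here, is the easy direction. The hard problem that AlphaFold2 tackles is the reverse: inferring which residues end up close together when only the sequence is known.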

How does AlphaFold2 improve on AlphaFold?

AlphaFold2 is significantly more accurate than the original AlphaFold. CASP scores predictions with the Global Distance Test (GDT), which runs from 0 to 100. In the 2018 CASP13 challenge, the original AlphaFold placed first but reached a median GDT of only around 60 on the hardest targets, while in the CASP14 challenge in 2020 AlphaFold2 achieved a median GDT of around 90 across all targets.
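To make figures like these concrete, the sketch below computes a simplified version of GDT_TS, the Global Distance Test score used in CASP. It assumes the predicted and experimental structures have already been superimposed and that we have the per-residue deviation in angstroms. The real GDT procedure also searches over many superpositions, which is omitted here, and the deviation values are made up.

```python
def gdt_ts(deviations):
    """Simplified GDT_TS: average percentage of residues whose deviation
    from the reference structure is within 1, 2, 4 and 8 angstroms."""
    if not deviations:
        raise ValueError("need at least one residue")
    thresholds = (1.0, 2.0, 4.0, 8.0)
    fractions = [sum(d <= t for d in deviations) / len(deviations)
                 for t in thresholds]
    return 100.0 * sum(fractions) / len(thresholds)

# Hypothetical per-residue deviations (in angstroms) for a 5-residue protein.
deviations = [0.4, 0.9, 1.5, 3.0, 9.5]
score = gdt_ts(deviations)
print(round(score, 1))  # 65.0
```

A score near 90 therefore means that almost every residue sits within a couple of angstroms of its experimentally determined position.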

AlphaFold2 is also much faster than the original AlphaFold. The 2018 version took several days to make a prediction, while AlphaFold2 can make predictions in minutes to a few hours, depending on the length of the protein.

AlphaFold2 also uses a different architecture: an end-to-end deep neural network designed specifically for protein structure prediction. The original AlphaFold instead used a neural network to predict the distances between amino acids, followed by a separate optimization step to build a structure consistent with them.

What’s next for AlphaFold2? What challenges does it face?

Data availability: AlphaFold2 requires a large amount of high-quality protein sequence and structural data to make accurate predictions. However, such data is not available for all proteins, particularly for proteins from non-model organisms or those with low sequence identity to known proteins.

Quality of input data: The accuracy of AlphaFold2 predictions can be affected by the quality of the input data, such as errors or gaps in the protein sequence, or low-resolution experimental data.

Structural complexity: Some proteins have complex structures or are part of multi-protein complexes, which can be difficult for AlphaFold2 to accurately predict.

Post-translational modifications: AlphaFold2 does not currently incorporate post-translational modifications into its predictions, which can affect the folding and function of proteins.
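The input-quality point above is the easiest of these to check in practice. As a small sketch, the function below flags characters in a sequence that are not one of the 20 standard amino acid codes, such as gaps or the placeholder X. It is a hypothetical pre-check written for this article, not part of AlphaFold2 itself.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def find_sequence_problems(sequence):
    """Return (position, character) pairs that are not standard residues."""
    return [(i, ch) for i, ch in enumerate(sequence.upper())
            if ch not in STANDARD_AMINO_ACIDS]

# Hypothetical sequence with a gap ("-") and an unknown residue ("X").
problems = find_sequence_problems("MKT-AXLLV")
print(problems)  # [(3, '-'), (5, 'X')]
```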

Conclusion

AlphaFold2 has already driven significant advances in both biology and machine learning, fast-tracking breakthroughs at an unprecedented pace, but there is still work to be done before it reaches its full potential. The world waits in anticipation of DeepMind’s next version of this software, and no doubt countless others have been inspired to try to solve this multi-dimensional puzzle.

References:

1. DeepMind, https://www.deepmind.com

2. Will Douglas Heaven. “DeepMind’s protein-folding AI has solved a 50-year-old grand challenge of biology”. MIT Technology Review (2020).

3. Robert F. Service. “‘The game has changed.’ AI triumphs at solving protein structures”. Science (2020).
