AlphaFold: AI for protein structure prediction

HAMZA ABDULLAH
Published in THE 21st CENTURY
7 min read · Jan 17, 2019

Using artificial intelligence for de novo scientific discovery in biology.

DeepMind, a Google company, has developed AlphaFold, an AI model for 3D protein structure prediction. The 3D protein models that AlphaFold generates are far more accurate than those of any previous method developed for this complex biological problem, making this a breakthrough for DeepMind.

DeepMind has brought together experts and scientists from multiple backgrounds, including structural biology, bioinformatics, physics, and machine learning, to apply cutting-edge techniques to predicting a protein's 3D structure from its genomic data.

The 3D protein structure models predicted by AlphaFold are far more accurate than any previous results, which marks significant progress on one of the most complex problems in biology.

AlphaFold is the result of years of prior research on using genomic data to predict protein structures, a highly difficult task in biology and proteomics.

Protein folding

For secreted and membrane proteins, folding takes place in a cellular compartment called the endoplasmic reticulum. This is a vital cellular process because proteins must be correctly folded into specific, three-dimensional shapes in order to function correctly. Unfolded or misfolded proteins contribute to the pathology of many diseases, some of which can be passed on to the next generation, e.g. Alzheimer's, Parkinson's, Huntington's, and cystic fibrosis.

Cells rely on a very sensitive system known as the unfolded protein response (UPR) to guard against the cellular stress caused by protein folding problems. The UPR is a cell’s way to ensure its ability to secrete proteins is working properly. Its role is to turn on genes that help the endoplasmic reticulum properly fold proteins. With these genes turned on, the cell is better equipped to handle the stress of protein folding problems that may arise. However, severe stress can overwhelm the UPR and lead to abnormal cellular function.

Proteins are complex biomolecules essential for sustaining life. Nearly every function our body performs (contracting muscles, sensing light, or turning food into energy) can be traced back to one or more proteins and how they move and change. Proteins, in turn, are built according to genes encoded in DNA.

A protein's function depends heavily on its 3D structure. For example, the antibody proteins that are part of our immune systems are Y-shaped and act like unique hooks. By latching on to viruses and bacteria, antibody proteins are able to detect and tag disease-causing microorganisms for extermination. Similarly, collagen proteins are shaped like cords, which transmit tension between cartilage, ligaments, bones, and skin. Other examples include Cas9, the enzyme used in CRISPR gene editing, which acts like scissors to cut DNA; antifreeze proteins, whose 3D structure allows them to bind to ice crystals and prevent organisms from freezing; and ribosomes, complexes of protein and RNA that act like a programmed assembly line and help build proteins themselves.

Predicting a protein's 3D structure from its genomic data has challenged scientists for decades. It is considered an astronomically complex biological problem, and it matters a great deal for drug discovery too. The challenge is that DNA only contains information about the sequence of a protein's building blocks, called amino acid residues, which form long chains. Predicting how those chains will fold into the intricate 3D structure of a protein is what's known as the “protein folding problem”. The bigger the protein, the more complicated and difficult it is to model, because there are more interactions between amino acids to take into account.

According to Levinthal’s paradox, it would take longer than the age of the universe to enumerate all the possible configurations of a typical protein before reaching the right 3D structure.
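
To get a feel for the scale behind Levinthal's paradox, here is a rough back-of-the-envelope calculation in Python. The numbers are illustrative assumptions, not figures from DeepMind's work: a chain of 100 residues, 3 possible conformations per residue, and a very generous sampling rate of 10^13 conformations per second.

```python
# Back-of-the-envelope estimate for Levinthal's paradox.
# All parameters below are illustrative assumptions.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years


def brute_force_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to enumerate every conformation of a chain exhaustively."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / samples_per_second / SECONDS_PER_YEAR


years = brute_force_years(100)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"Exhaustive search: about {years:.1e} years")
print(f"That is about {ratio:.1e} times the age of the universe")
```

Under these assumptions the exhaustive search takes on the order of 10^27 years, which is why real proteins (and any practical prediction method) cannot be trying every configuration one by one.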

Importance!

The ability to predict protein structures is important for understanding how proteins function in any biological system, as well as for diagnosing and treating diseases believed to be caused by misfolded proteins, e.g. Alzheimer's, Parkinson's, Huntington's, and cystic fibrosis.

An understanding of protein folding will also assist in protein design, which could unlock a tremendous number of benefits. For example, advances in biodegradable enzymes — which can be enabled by protein design — could help manage pollutants like plastic and oil, helping us break down waste in ways that are more friendly to our environment. In fact, researchers have already begun engineering bacteria to secrete proteins that will make waste biodegradable, and easier to process.

To catalyze research and measure progress on the newest methods for improving the accuracy of predictions, a biennial global competition called the Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP) was established in 1994, and has become the gold standard for assessing techniques.

How can artificial intelligence help predict protein structures more accurately?

Over the past few decades, scientists have had some success in determining protein structures using experimental lab methods, e.g. cryo-electron microscopy, nuclear magnetic resonance, or X-ray crystallography. But all of these techniques require years of research and development, thousands of dollars of funding, and a high level of expertise, and they involve a great deal of trial and error.

This is why biologists are turning to AI and bioinformatics methods as an alternative to this long and laborious process for difficult proteins. Fortunately, the field of genomics is quite rich in data thanks to the rapid reduction in the cost of genetic sequencing. As a result, deep learning approaches to the prediction problem that rely on genomic data have become increasingly popular in the last few years.

DeepMind's AlphaFold has outperformed other methods of protein structure prediction in the accuracy of its predictions for complex proteins. It was submitted to CASP, where it ranked first. The team focused specifically on the hard problem of modeling target shapes from scratch, without using previously solved proteins as templates. They achieved a high degree of accuracy when predicting the physical properties of a protein structure, and then used two distinct methods to construct predictions of full protein structures.

Both of these methods relied on deep neural networks that are trained to predict properties of the protein from its genetic sequence. The networks predicted two properties: the distances between pairs of amino acids, and the angles between the chemical bonds that connect those amino acids. The first of these is an advance on commonly used techniques that estimate whether pairs of amino acids are near each other.
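
To make the difference between "distances between pairs of amino acids" and "whether pairs are near each other" concrete, here is a minimal sketch in Python. The coordinates are made up, and the 8 angstrom cutoff is an assumption (a convention often seen in contact prediction), not something taken from DeepMind's description.

```python
import math

# Hypothetical 3D coordinates (in angstroms) for one representative atom
# per residue of a toy 4-residue chain. These values are made up.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
]

CONTACT_THRESHOLD = 8.0  # assumed cutoff in angstroms


def distance_matrix(points):
    """Euclidean distance between every pair of residues."""
    return [[math.dist(p, q) for q in points] for p in points]


def contact_map(distances, threshold=CONTACT_THRESHOLD):
    """Binary 'near each other' map: the coarser kind of prediction."""
    return [[d < threshold for d in row] for row in distances]


distances = distance_matrix(coords)
contacts = contact_map(distances)
for row in distances:
    print(" ".join(f"{d:5.2f}" for d in row))
```

The contact map keeps only a yes/no answer per pair, while the distance matrix keeps the actual separation, which is the richer signal.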

A neural network was trained to predict a separate distribution of distances between every pair of residues in a protein. These probabilities were then combined into a score that estimated how accurate a proposed protein structure is. A separate neural network was also trained that uses all distances in aggregate to estimate how close the proposed structure is to the right answer.
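
One simple way to picture "combining probabilities into a score" is to add up the negative log-probability that each pair's distance falls in the bin observed in a candidate structure. The sketch below does exactly that. The bin edges, the probabilities, and the scoring formula itself are assumptions chosen for illustration; the exact formula DeepMind used is not described here.

```python
import math

# Assumed distance bins in angstroms: [0,4), [4,8), [8,12), [12, inf).
BIN_EDGES = [4.0, 8.0, 12.0]


def bin_index(distance):
    """Index of the bin that a distance falls into."""
    for i, edge in enumerate(BIN_EDGES):
        if distance < edge:
            return i
    return len(BIN_EDGES)


def structure_score(predicted, distances):
    """Sum of negative log-probabilities over residue pairs (lower is better).

    predicted[(i, j)] is a made-up probability distribution over the bins
    for the pair (i, j); distances[(i, j)] is that pair's distance in a
    candidate structure.
    """
    total = 0.0
    for pair, probs in predicted.items():
        total -= math.log(probs[bin_index(distances[pair])])
    return total


# Hypothetical network output for a 3-residue chain.
predicted = {
    (0, 1): [0.90, 0.08, 0.01, 0.01],
    (0, 2): [0.05, 0.80, 0.10, 0.05],
    (1, 2): [0.85, 0.10, 0.03, 0.02],
}

good = {(0, 1): 3.8, (0, 2): 6.5, (1, 2): 3.8}   # agrees with the predictions
bad = {(0, 1): 3.8, (0, 2): 13.0, (1, 2): 9.0}   # disagrees on two pairs

print(f"good candidate: {structure_score(predicted, good):.2f}")
print(f"bad candidate:  {structure_score(predicted, bad):.2f}")
```

A candidate structure whose distances land in the high-probability bins gets a lower (better) score than one that contradicts the predictions.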

The first method for constructing full structures built on techniques commonly used in structural biology, and repeatedly replaced pieces of a protein structure with new protein fragments. A generative neural network was trained to invent new fragments, which were used to continually improve the score of the proposed protein structure.
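
The fragment-replacement idea can be sketched as a simple search loop. This toy version is not DeepMind's implementation: a random sampler stands in for the generative network, a squared deviation from made-up target angles stands in for the learned score, and a structure is reduced to a plain list of angles.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

# Stand-in for the learned score: squared deviation from made-up target
# angles (degrees). In the real system the score came from neural networks.
TARGET = [-60.0, -45.0, -60.0, -45.0, -120.0, 130.0, -120.0, 130.0]


def score(angles):
    """Lower is better; zero means the angles match the toy target."""
    return sum((a - t) ** 2 for a, t in zip(angles, TARGET))


def propose_fragment(length):
    """Random sampler standing in for the generative fragment network."""
    return [rng.uniform(-180.0, 180.0) for _ in range(length)]


def fragment_search(start, n_steps=5000, fragment_length=3):
    """Repeatedly swap in a fragment, keeping it only if the score improves."""
    current = list(start)
    best = score(current)
    for _ in range(n_steps):
        begin = rng.randrange(len(current) - fragment_length + 1)
        candidate = list(current)
        candidate[begin:begin + fragment_length] = propose_fragment(fragment_length)
        candidate_score = score(candidate)
        if candidate_score < best:
            current, best = candidate, candidate_score
    return current, best


start = propose_fragment(len(TARGET))
final_angles, final_score = fragment_search(start)
print(f"starting score:        {score(start):.0f}")
print(f"after fragment search: {final_score:.0f}")
```

Because a fragment is kept only when it lowers the score, the proposed structure can never get worse from one step to the next. A better fragment generator simply makes useful proposals more often.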

The second method optimized scores through gradient descent — a mathematical technique commonly used in machine learning for making small, incremental improvements — which resulted in highly accurate structures. This technique was applied to entire protein chains rather than to pieces that must be folded separately before being assembled, reducing the complexity of the prediction process.
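
Here is a minimal sketch of gradient descent applied to a whole chain at once. It is a toy under stated assumptions: the target distances are made up (in the real system they came from the network's predictions), the loss is a plain sum of squared distance errors, and the learning rate and step count are arbitrary choices.

```python
import math

# Made-up target distances (angstroms) between every pair in a 4-residue
# chain; in the real system the targets came from network predictions.
TARGET = {
    (0, 1): 3.8, (1, 2): 3.8, (2, 3): 3.8,
    (0, 2): 6.0, (1, 3): 6.0, (0, 3): 7.0,
}


def loss(coords):
    """Sum of squared differences between actual and target distances."""
    return sum((math.dist(coords[i], coords[j]) - t) ** 2
               for (i, j), t in TARGET.items())


def gradient(coords):
    """Analytic gradient of the loss with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for (i, j), t in TARGET.items():
        d = math.dist(coords[i], coords[j])
        for k in range(3):
            # d/dx_i of (d - t)^2 = 2 (d - t) (x_i - x_j) / d
            g = 2.0 * (d - t) * (coords[i][k] - coords[j][k]) / d
            grad[i][k] += g
            grad[j][k] -= g
    return grad


def descend(coords, learning_rate=0.01, n_steps=2000):
    """Plain gradient descent that moves every residue at each step."""
    coords = [list(p) for p in coords]  # work on a copy
    for _ in range(n_steps):
        grad = gradient(coords)
        for p, g in zip(coords, grad):
            for k in range(3):
                p[k] -= learning_rate * g[k]
    return coords


# Arbitrary, deliberately poor starting layout.
start = [[0.0, 0.0, 0.0], [1.0, 0.5, 0.0], [2.0, -0.5, 0.3], [3.0, 0.2, -0.4]]
final = descend(start)
print(f"loss before: {loss(start):.3f}")
print(f"loss after:  {loss(final):.3f}")
```

Every step nudges all coordinates a little in the direction that reduces the loss, so the whole chain is refined together instead of being assembled from separately folded pieces.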

Future promises

This is just a first step in AI-driven biological discovery, and a lot more work needs to be done before we can predict exact protein structures. Success would transform biology, allowing us to have a quantifiable impact on treating diseases, managing the environment, and more.

This article is based on research by DeepMind's scientists. The research paper published by DeepMind's team is cited below.

De novo structure prediction with deep-learning based scoring
R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. F. G. Green, C. Qin, A. Zidek, A. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, D. T. Jones, D. Silver, K. Kavukcuoglu, D. Hassabis, A. W. Senior
In Thirteenth Critical Assessment of Techniques for Protein Structure Prediction (Abstracts), 1–4 December 2018.

If you like this post, give it a ❤️ below so others may see it. Thank you!
