AI Unfolds the Science Behind a 50-Year-Old Protein-Folding Problem

Hopefully, you didn’t bet on this taking longer than the age of the Universe to solve.

Madison Hunter
Predict
6 min read · Dec 9, 2020

Photo by Soheb Zaidi on Unsplash

Levinthal’s paradox states that determining the “native folded state of a protein by a random search among all possible configurations can take an enormously long time.” So it makes sense that scientists believed a brute-force search for a protein’s three-dimensional structure would take longer than the age of the Universe.
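To get a feel for the scale involved, here is a back-of-the-envelope sketch of the random search the paradox describes. The chain length, the number of conformations per residue, and the sampling rate are all illustrative assumptions, not figures from this article.

```python
# Back-of-the-envelope Levinthal estimate. Every constant below is an
# illustrative assumption, not a measured value.
RESIDUES = 100                   # assumed length of a small protein
STATES_PER_RESIDUE = 3           # assumed conformations available to each residue
SAMPLES_PER_SECOND = 1e13        # assumed rate at which conformations are tried
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years


def brute_force_search_years(residues, states_per_residue, samples_per_second):
    """Years needed to try every conformation exactly once."""
    conformations = states_per_residue ** residues
    return conformations / samples_per_second / SECONDS_PER_YEAR


years = brute_force_search_years(RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"Exhaustive search: about {years:.1e} years")
print(f"That is about {ratio:.1e} times the age of the Universe")
```

Under these assumptions the search comes out on the order of 10^27 years. Changing the assumed constants shifts the exact figure, but not the conclusion that brute force is hopeless.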

However, CASP (Critical Assessment of Protein Structure Prediction), a community experiment that began in 1994, set out to crack a problem that has now stood for roughly 50 years: predicting how a protein folds into its characteristic structure. CASP challenged scientists to devise methods for making those predictions.

The intricacy of proteins.

Proteins have unique, intricate, three-dimensional shapes that define what they do and how they function. To date, over 200 million proteins have been discovered, yet the shape of only a small fraction of them is known. Unraveled from its three-dimensional shape, a protein is a chain built from 20 different types of amino acids. These amino acids interact with one another, causing the chain to fold into its final shape. A protein can fold into an almost limitless number of shapes, and working through the countless combinations one by one would take centuries. Understanding the structure of proteins is vital for understanding life’s basic processes.

DeepMind’s AlphaFold.

Nearly three decades after CASP issued its challenge to the scientific community, a most promising solution has surfaced: using AI to predict the shapes of proteins with unprecedented accuracy. DeepMind, a company devoted to developing artificial intelligence systems to solve intelligence and advance scientific discovery, took part in CASP to tackle biology’s grand challenge. AlphaFold is DeepMind’s deep-learning system, which has been shown to “accurately predict the structure of proteins to within the width of an atom.”

AlphaFold was trained to analyze the structure of proteins using a databank of roughly 170,000 protein structures. In CASP’s assessment, AlphaFold achieved a median score of 92.4 GDT (Global Distance Test) across its predictions. GDT is a scale from 0 to 100 that indicates how closely the AI-predicted structure matches the actual shape of the protein as determined by experimental methods. A score of around 90 GDT is considered the threshold for being competitive with those methods, which include cryo-electron microscopy, nuclear magnetic resonance, and X-ray crystallography. Experimental methods can take years to determine the shape of a protein. In contrast, AlphaFold can manage it in a few days. At scores above 90, any remaining difference between the predicted and actual structure could come from error in the experimental method rather than the software. Alternatively, the difference could reflect an alternative configuration of the protein that occurs due to natural variation.
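For readers curious how a score like this is computed, here is a minimal sketch of the commonly used GDT_TS variant: the average, over cutoffs of 1, 2, 4, and 8 angstroms, of the percentage of residues whose predicted position falls within that cutoff of the experimental one. The sketch assumes the two structures are already superimposed and uses made-up coordinates; real assessments also search for the best superposition, which is omitted here.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-aligned lists of (x, y, z) positions in angstroms."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("structures must be non-empty and the same length")
    # Distance between each predicted residue and its experimental counterpart.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues that fall within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example: the predictions are off by
# 0.5, 1.5, 3.0, and 10.0 angstroms respectively.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (21.4, 0.0, 0.0)]
print(gdt_ts(predicted, reference))  # 56.25
```

A perfect prediction scores 100, and a single badly misplaced residue only costs its share of the total, which is part of why the measure is popular for comparing predictions.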

AlphaFold predicted the structure of proteins with an average error of about 1.6 angstroms (0.16 nanometers), roughly the width of an atom. AlphaFold was able to accomplish the unthinkable. While scientists can determine the sequence of amino acids that make up a protein, the number of ways that sequence can fold into a three-dimensional shape is astronomically large. Back in 1972, Christian Anfinsen won the Nobel Prize in Chemistry for showing that a protein’s shape is determined by its sequence of amino acids. Yet for nearly 50 years afterward, the scientific community grappled with predicting the shape that a given sequence would fold into.
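Error figures like this are commonly reported as a root-mean-square deviation (RMSD) between predicted and experimental atom positions. The sketch below assumes that is what the 1.6 angstrom figure represents, assumes the structures are already superimposed, and uses made-up coordinates purely for illustration.

```python
import math

ANGSTROMS_PER_NANOMETER = 10.0


def rmsd(predicted, reference):
    """Root-mean-square deviation between two pre-aligned lists of (x, y, z) positions."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("structures must be non-empty and the same length")
    squared = [math.dist(p, r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical three-atom example with deviations of 1.0, 2.0, and 1.6 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(1.0, 0.0, 0.0), (3.8, 2.0, 0.0), (7.6, 0.0, 1.6)]

error_angstroms = rmsd(predicted, reference)
error_nanometers = error_angstroms / ANGSTROMS_PER_NANOMETER
print(f"{error_angstroms:.2f} angstroms = {error_nanometers:.3f} nanometers")
```

Because the deviations are squared before averaging, a few badly placed atoms raise RMSD more than they lower a GDT score, so the two numbers are best read together.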

According to DeepMind, the latest version of AlphaFold, used in the recent CASP experiment, was built on an attention-based neural network system. The system was trained end-to-end to interpret the structure of a spatial graph, which can be used to predict the shape of a folded protein. The system then goes on to reason over the implicit graph that it is building. AlphaFold “uses evolutionarily related sequences, multiple sequence alignment (MSA), and a representation of amino acid residue pairs to refine this graph.” This iterative process allowed AlphaFold to develop solid predictions about the shape of a protein in as little as a few days. Finally, an internal confidence measure in AlphaFold is used to determine which parts of the predicted protein shape are reliable.
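To give a rough sense of what “attention-based” means, here is a toy sketch of generic single-head self-attention, in which every residue’s representation is updated as a weighted mix of all the others. This is not AlphaFold’s architecture, which had not been published in detail at the time of writing; the two-number residue embeddings are invented, and the learned projection matrices a real network would use are left out.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Toy scaled dot-product self-attention with no learned projections."""
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    updated, weights = [], []
    for query in embeddings:
        # Score how strongly this residue relates to every residue (itself included).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in embeddings]
        row = softmax(scores)
        weights.append(row)
        # The new representation is the attention-weighted average of all embeddings.
        updated.append([sum(w * value[i] for w, value in zip(row, embeddings))
                        for i in range(dim)])
    return updated, weights


# Hypothetical embeddings for a three-residue chain; residues 0 and 2 are alike.
residue_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
updated, weights = self_attention(residue_embeddings)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights can be read as the soft edges of a graph linking one residue to the rest, which is the loose connection to the “implicit graph” described above.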

AlphaFold’s ability to predict the shape of proteins has given new life to the study of diseases, a field thrust to the forefront of biology with new urgency over the last year. DeepMind’s technology could help virologists identify proteins that have become impaired and understand how they interact in that malfunctioning state. Success in this field could lead to more precise drug development, further insight into existing experimental methods, and hopefully more efficient treatments for a variety of diseases and viruses.

DeepMind has even had the chance to use its AI in a real-world situation, having aided in early studies on the protein structures of the SARS-CoV-2 virus. AlphaFold was able to predict several related protein structures that had previously been unknown, helping scientists to better understand the nature of the virus.

DeepMind is currently preparing a paper on its system for submission to a peer-reviewed journal.

The future of research using artificial intelligence.

The societal and economic benefits of using AI to conduct research are significant. If AlphaFold’s success in solving, within days, a biological problem that would otherwise have taken centuries is any indication, the future of AI in research is bright.

The availability of massive data sets paired with sophisticated algorithms and powerful computers has led to the supremacy of AI technologies in many fields of research.

In the future, AI will be instrumental in understanding the effects of climate change and will be able to simulate the results of climate change countermeasures. Climate research requires combing through vast amounts of data while simulating complex climate models. To inform decisions and policy-making at the local and federal levels, climate models must be understood in terms of their consequences. Climate models powered by AI will be able to simulate the predicted outcomes of a 2°C increase in global temperature, or of environmental degradation caused by illicit mining operations. Pairing research with AI to better understand and predict climate change will help improve global climate resiliency and may even be a mitigating factor in the fight against rising global temperatures and intensifying natural disasters.

Artificial intelligence will also continue to play a leading role in the search for Earth-sized planets orbiting distant stars. NASA’s Kepler mission, which was retired in 2018, gathered enormous amounts of data for exactly that purpose. That data can be distorted by the activity of onboard thrusters, by variations in planetary or stellar activity, or by systematic issues. Using AI and machine learning, data collected by the mission is cleaned before it enters the final analysis, allowing scientists to interpret the information clearly. AI and machine learning have also been instrumental in discovering new pulsars from existing data, discovering the properties of stars and supernovae, and classifying galaxies.

The ability of AI to revolutionize research doesn’t stop there, though. AI and machine learning will continue to improve and facilitate research in fields such as conservation, organic chemistry, nanotechnology, astrophysics, medicine, and the social and historical sciences.

If DeepMind has proven anything with AlphaFold, it’s that the limits to what we can ask artificial intelligence to solve are nearly nonexistent. Scientists will no longer have to spend their careers hitting their heads against a wall, only to watch the unsolvable problem they dedicated their lives to pass to the next generation.

Society will be allowed to dream that a cure for cancer may be possible, or that we will be able to preserve our planet for generations to come.

The problems AI can solve and the possibilities it can uncover are bounded only by our own dreams for a safer, healthier, more knowledgeable world.

Discovering the solution to a 50-year-old protein-folding problem is just the beginning.
