DeepMind Has Not Solved The Protein Folding Problem Yet

But Artificial Intelligence is Revolutionizing Research

Angela Wilkins
Rice Ken Kennedy Institute
4 min read · Aug 14, 2021

Image from Emily Morter

Physics is the study of the universe. It encompasses everything from the movement of galaxies to the “how” of the smallest particles. No matter the shape and size of the problem, data has always been a constant.

In the early days of physics, mathematical models were painstakingly written out and often solved by hand. I think of Galileo championing the idea that the Earth revolves around the Sun. I think of Einstein making the jump to general relativity. There was data, but it was sparse. Even so, they found the theory (or model) that worked and could explain the data.

When models stop explaining the data, we need new models. This could be a small adjustment (e.g., perturbation theory) or a completely new theory of everything. We vary a few parameters to fit the data in the hope of gaining the insight we covet. We only accept these new, more complex models when the results are irrefutable.

In comparison to even twenty years ago, we create and collect immense amounts of data. Machine learning and artificial intelligence can be used to find the shape of our new unconstrained model.

We no longer have to grow our models slowly from intuition and small jumps in logic.

Today, deep learning architectures not only give researchers the ability to model and compute complex problems with far greater speed and accuracy than before, but also provide opportunities to find new physics.

AI Partially Solves Protein Folding

For decades, researchers have worked toward creating a model to represent how proteins fold. A scalable solution would have major impacts in the healthcare sector. Yet the complexity inherent to protein structures meant this problem evaded the grasp of models until recently.

Proteins are large, dynamic molecules that dictate how our cells function. How a protein folds depends on many thermodynamic and kinetic factors that are impossible to fully account for (at least so far). This has been an ongoing challenge for the academic community for decades.

Last year, DeepMind (a British AI lab and subsidiary of Alphabet Inc., Google’s parent company) took the world of protein structure modeling by storm. DeepMind’s AlphaFold model cracked the 50-year-old challenge at CASP, the biennial Critical Assessment of Protein Structure Prediction competition that aims to provide independent evaluation of algorithm design in protein structure modeling.

From the DeepMind team:

AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.

DeepMind made news not only for its results (in its first CASP entry in 2018, an earlier version of AlphaFold produced the best prediction in 25 of 43 cases, with the second-place finisher winning three of the same 43 cases), but also because a non-specialist team had entered a competition typically populated by teams from labs dedicated to studying protein structures.

I was skeptical that DeepMind would take AlphaFold from proving a hard problem could be solved to scaling it into a genuinely useful result. However, last month DeepMind took a step in the right direction, saying it would use AlphaFold to release predicted structures for every protein known to science.

The database will provide researchers with a toolkit to improve the drug discovery process and better understand diseases. The implications of this work will reach far outside the world of AI.

Though the solutions (possible protein structures) that AlphaFold provides are potentially useful, they will also be incomplete. Determining protein structures is a tricky problem because proteins are incredibly dynamic. They often move about a cell, unfolding and refolding in different ways, binding with DNA, RNA, and other proteins.

I can’t help but ask what else is hiding in these deep learning models. Could we understand the spectrum of possible stable structures, not just the most likely one based on the data we have? Could we understand the mechanism of the movement between these stable structures? With the right data, can we better understand how the chemistry of the environment impacts folding? Can we introduce small changes in the protein sequence to understand how and why the protein structure changes?

This could be key to understanding an important facet of disease and how to treat patients.

We are far from a complete solution. The level of accuracy of DeepMind’s AlphaFold has only created more questions and potential next steps. Will DeepMind continue? If they move on to the next problem, who will have the computational resources to take this on?

Or will this change how we work together toward solutions? The CASP competition shows that many people across the world care deeply about the potential of protein folding. These teams will need to work differently (work together) to compete with DeepMind’s resources and talent. Could CASP be redesigned to bring about the next generation of solutions?

Though DeepMind has not solved the protein folding problem (yet), Artificial Intelligence will change how we do science.
