How Deep Learning solved the Mystery of Biology

This is the story of how Deep Learning tools were developed to help people predict the three-dimensional structure of proteins.

Dzmitry Hramyka
Axioma AI Journal
4 min read · Jun 13, 2022

Since the second half of the 20th century, computational biology has faced an acute problem: predicting the 3-D structure of a protein knowing only its amino acid sequence (that is, its primary structure). The problem is very hard because too many factors affect the result. Another difficulty was the lack of accumulated knowledge: in the early 90s there were only about 700 records in the PDB (Protein Data Bank), which is far too few for any serious analysis. Together with many other factors, this made the puzzle practically unsolvable.

The biological concept

So what actually are these proteins and their structures?

It’s no secret that each cell consists of different molecules, and if we remove all the water from a cell, roughly 50% of what remains will be protein. Proteins do almost everything, thanks to their extraordinary plasticity and functional versatility. If you imagine any active process taking place in a cell, some protein will surely take part in it. For example, cell movement comes down to the continuous contraction of protein filaments, and the engulfing of large particles or even small microorganisms (bacteria or viruses) is the capture and enveloping of the target, driven by proteins that reshape the cell surface around it.

Dynein movement.

However, all this very real functionality is available only thanks to the three-dimensional structure of polypeptides. The process by which a polypeptide chain folds into a biologically active protein with its native three-dimensional structure is called protein folding. It is a very complex process: the protein may be transported to completely different parts of the cell and modified by other proteins along the way, and yet across different environments and conditions the chain reliably reaches the same final structure. In fact, the process is still not fully understood, and science will have to puzzle over it for a long time to come.

Proteins have four levels of structure: primary, secondary, tertiary and quaternary. The primary structure of a protein, its linear amino acid sequence, determines its native conformation. The formation of secondary structure is the first step in the folding process. Its characteristic elements are alpha helices and beta sheets, which fold rapidly. Next, hydrophobic amino acid residues pack together away from water, which shapes the tertiary structure of the protein. Finally, some folded proteins (already in their tertiary structure) can assemble with each other into an aggregate structure, the quaternary structure.
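
To make the idea of a primary structure a bit more tangible, here is a minimal Python sketch that treats a protein as a plain string of one-letter amino acid codes and estimates how hydrophobic it is. The Kyte-Doolittle hydropathy scale used here is my own choice for illustration (the article does not depend on it), the function names are made up, and the toy sequence is hypothetical rather than a real protein.

```python
# Kyte-Doolittle hydropathy scale (an illustrative choice, not part of any
# structure-prediction model): positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_fraction(sequence):
    """Share of residues with a positive hydropathy score."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] > 0 for aa in seq) / len(seq)


def hydropathy_profile(sequence, window=5):
    """Mean hydropathy over a sliding window along the chain."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    toy_sequence = "MKTLLILAVVAAALADEKR"  # hypothetical sequence, for illustration only
    print(f"hydrophobic fraction: {hydrophobic_fraction(toy_sequence):.2f}")
    print([round(v, 2) for v in hydropathy_profile(toy_sequence)])
```

Stretches with a high windowed score are the kind of residues that tend to end up buried inside the folded protein, which is one of the simplest signals linking sequence to tertiary structure. Real predictors use far richer features than this.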

Structures of proteins

The computational solution

Machine Learning implementation

At the beginning of the 21st century, some smart people asked: why not try to solve this problem with computers? Over the next decade or so the idea matured, and around 2015 the first attempts began to use machine learning models for analyzing the data and predicting structures. But the turning point came in 2020.

DeepMind, owned by Alphabet (Google's parent company), began building a deep neural network (DNN) for protein structure prediction. In 2018, the model placed first in the overall rankings of the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP13) competition, and in 2020 its successor, AlphaFold 2, again took first place at CASP14 in predicting the three-dimensional structure of proteins. It was a real revolution, because DeepMind significantly outperformed everyone, scoring above 90 on CASP's Global Distance Test (GDT) for about two-thirds of the proteins.
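
To give a feel for what a score like this measures: CASP rankings are built around the Global Distance Test (GDT_TS), which averages the percentage of residues whose predicted position lies within 1, 2, 4 and 8 Å of the experimental one. The sketch below is a simplified version under a big assumption: both structures are already superimposed and given as lists of C-alpha coordinates. The official score also searches over many superpositions, which is omitted here, and the function name and toy coordinates are mine.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    Assumes `predicted` and `reference` are equal-length lists of (x, y, z)
    C-alpha coordinates in angstroms that are ALREADY superimposed.
    The real GDT also optimises the superposition; this sketch does not.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("structures must be non-empty and of equal length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Toy coordinates (hypothetical): four residues spaced along the y axis.
    reference = [(0.0, 0.0, 0.0), (0.0, 3.8, 0.0), (0.0, 7.6, 0.0), (0.0, 11.4, 0.0)]
    # Each predicted residue is shifted along x by 0.5, 1.5, 3.0 and 10.0 angstroms.
    predicted = [(0.5, 0.0, 0.0), (1.5, 3.8, 0.0), (3.0, 7.6, 0.0), (10.0, 11.4, 0.0)]
    print(gdt_ts(predicted, reference))
```

In this toy case one residue is within 1 Å, two within 2 Å, and three within both 4 Å and 8 Å, so the score is well below the 90+ range. A score above 90 means nearly every residue sits within a couple of ångströms of the experimental structure.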

AlphaFold Protein Structure Database

On July 15, 2021, the AlphaFold2 paper was published in Nature as an advance access publication, along with open source software and a searchable database of species proteomes. Today the main protein databases (UniProt, Swiss-Prot, PDB) offer structures predicted by AlphaFold2 alongside protein structures obtained by classical experimental methods. It is worth noting that over the past 5 years many other deep neural networks have appeared that perform the same task, which simply shows that this concept definitely works!

Finally

I want to say that if you are interested in this topic, you can read my other articles related to machine learning in bioinformatics.

This 50-year-old grand challenge was brilliantly solved with the help of technology and science. This is a great example of how technology and ideas should work in our world.

Be human, do science 🕊

Dzmitry Hramyka
Axioma AI Journal

I am a research student in Bioinformatics/Molecular Biology, highly interested in AI/ML/technology. I love making tools for humans and sharing my opinion here.