Protein Structure Prediction : A Primer (Part 0)

Siddhant Rai
5 min read · Jan 4, 2024


What is structure prediction? How is it helpful?🤔

In my previous blog, I mentioned a primer on structure prediction 🧬. This blog marks the beginning of that series. I’m not sure how many parts it will take to cover most of the topics, but it’s going to be fun. 🎉

In general, structure prediction is simply the prediction of a protein’s 3D structure from its amino acid sequence.

This 3D structure is described at two levels, the secondary structure and the tertiary structure, whereas the amino acid sequence itself is called the primary structure.
Hence, technically, structure prediction refers to finding the secondary and tertiary structure from the primary structure.

Fig 1: Protein folding visualization

Now, let’s dive deeper. Before getting into the details of primary, secondary and tertiary structures, let’s understand what we mean by amino acids, where proteins come into play, how the structures are actually composed, and, more importantly, what makes them fold.

Amino acids (Primer)
1. They are monomers (basic building blocks) with 3 distinct parts, all attached to a central carbon atom (the alpha carbon):
— Amino group
— Carboxyl group
— R-group (Side chain)
2. The amino and carboxyl groups form the backbone, whereas the R-groups are attached on the sides, like motifs.
3. Over 500 amino acids are known, out of which 22 alpha amino acids are proteinogenic, i.e. they are built into proteins by living organisms; 20 of them are encoded directly by the standard genetic code. Humans use 21 of them: 12 can be produced by the body, and 9 must be consumed in food (these are referred to as essential amino acids).
4. Examples: Arginine, Leucine (found in rice).

Fig 2: Basic structure of an amino acid
Fig 3: Types of α-amino acids
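
To make the three-letter codes in the figures easier to look up, here is a small lookup table as a sketch. It lists the 20 standard amino acids with their one-letter codes and flags the 9 essential ones. The names `STANDARD_AMINO_ACIDS`, `ESSENTIAL` and `describe` are illustrative choices, not part of any library.

```python
# Three-letter code -> (one-letter code, full name) for the 20 standard amino acids.
STANDARD_AMINO_ACIDS = {
    "Ala": ("A", "Alanine"),       "Arg": ("R", "Arginine"),
    "Asn": ("N", "Asparagine"),    "Asp": ("D", "Aspartic acid"),
    "Cys": ("C", "Cysteine"),      "Gln": ("Q", "Glutamine"),
    "Glu": ("E", "Glutamic acid"), "Gly": ("G", "Glycine"),
    "His": ("H", "Histidine"),     "Ile": ("I", "Isoleucine"),
    "Leu": ("L", "Leucine"),       "Lys": ("K", "Lysine"),
    "Met": ("M", "Methionine"),    "Phe": ("F", "Phenylalanine"),
    "Pro": ("P", "Proline"),       "Ser": ("S", "Serine"),
    "Thr": ("T", "Threonine"),     "Trp": ("W", "Tryptophan"),
    "Tyr": ("Y", "Tyrosine"),      "Val": ("V", "Valine"),
}

# The nine amino acids that humans must obtain from food.
ESSENTIAL = {"His", "Ile", "Leu", "Lys", "Met", "Phe", "Thr", "Trp", "Val"}


def describe(code: str) -> str:
    """Return a one-line description for a three-letter amino acid code."""
    one_letter, name = STANDARD_AMINO_ACIDS[code]
    kind = "essential" if code in ESSENTIAL else "non-essential"
    return f"{code} ({one_letter}) = {name}, {kind}"


# Look up the two examples mentioned in the list above.
for code in ("Arg", "Leu"):
    print(describe(code))
```

The same table can be used to decode Phe, Leu, Ser and Cys wherever they show up later.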

Protein (Primer)🧩
1. A long chain of these amino acid residues is called a protein. The residues are connected along the amino and carboxyl backbone.
2. The sequence, or order, of amino acids is defined by the sequence of nucleotides (A, T, C, G) in our DNA, i.e. our genetic makeup. This arrangement results in specific bond formations and eventually in folding, through rotations about the backbone torsional angles (phi and psi).
3. The chain is called a polypeptide, and the bonds between amino acid residues are peptide bonds.

Protein: amino acid chain (find the meaning of Phe, Leu, Ser and Cys in Fig 3)
Nucleotide
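
To see how nucleotides define the amino acid order, here is a minimal sketch of translation: the DNA is read three letters (one codon) at a time, and each codon maps to one amino acid. The table below is deliberately partial. It only covers the codons for Phe, Leu, Ser, Cys, Met and the stop signals, and the function name `translate` and the example DNA string are made up for illustration.

```python
# Partial codon table (DNA coding strand). Only a handful of the 64 codons
# are included here; unknown codons are reported as "???".
CODON_TABLE = {
    "TTT": "Phe", "TTC": "Phe",
    "TTA": "Leu", "TTG": "Leu", "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
    "TCT": "Ser", "TCC": "Ser", "TCA": "Ser", "TCG": "Ser", "AGT": "Ser", "AGC": "Ser",
    "TGT": "Cys", "TGC": "Cys",
    "ATG": "Met",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def translate(dna):
    """Read the DNA three letters at a time and return the amino acid chain."""
    dna = dna.upper()
    residues = []
    # Ignore any trailing letters that do not form a complete codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "???")
        if residue == "Stop":
            break  # a stop codon ends the chain
        residues.append(residue)
    return residues


# Hypothetical DNA fragment: TTT CTT TCT TGT TAA
print("-".join(translate("TTTCTTTCTTGTTAA")))  # Phe-Leu-Ser-Cys
```

A real pipeline would use the full 64-codon table (for example through a library such as Biopython) rather than this hand-written subset.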

Now that we roughly know what a protein is, let’s see what makes it fold into secondary and tertiary structures.

The backbone groups we saw (amino and carboxyl) bond with the backbone groups of other amino acid residues in the chain through hydrogen bonds. The pattern of these hydrogen bonds, which depends on the order of the amino acids, results in two significant secondary structures: alpha-helices (which look like a helix) and beta-sheets (which look like paper folded at multiple places); see Fig 5. This is mostly an intermediate step, as the structure eventually folds into a more complex tertiary structure.

Fig 4: Common types of secondary structure
Fig 5: Bond formation in secondary structure
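
Each secondary structure also shows up as a characteristic range of backbone torsional angles: phi is measured over the atoms C(i-1), N, Cα, C and psi over N, Cα, C, N(i+1). Very roughly, alpha-helices sit around phi ≈ -60°, psi ≈ -45°, and beta-sheets around phi ≈ -120°, psi ≈ +130°. As a minimal sketch, this is how a torsional (dihedral) angle can be computed from four atom positions. The function name `dihedral_degrees` and the coordinates are made up for illustration, not taken from a real protein.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsional angle (in degrees) about the p1-p2 bond, in the range -180..180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not a real protein): the two outer bonds are
# perpendicular to each other when viewed along the middle bond.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied to every residue of a real structure, the (phi, psi) pairs give the familiar Ramachandran plot.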

The tertiary structure is a low-energy, stable state, and predicting it is the major problem we are trying to solve through AlphaFold and similar architectures. But how does it form? Remember the R-groups in amino acids: R can be one of many different chemical groups, and depending on the group, the side chain exhibits properties such as being hydrophilic or hydrophobic, or forming ionic bonds, hydrogen bonds, van der Waals attractions, etc. These properties make some residues move to the inside of the protein (which is only possible because the structure has depth, i.e. is 3D) and others move onto its surface. Together they induce a significant torsional effect on the protein structure, forcing it to twist and bend in 3D, which eventually gives the protein its functionality.
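
One common way to put a number on "hydrophobic vs hydrophilic" is a hydropathy scale. The sketch below uses the Kyte-Doolittle scale, chosen here just for illustration, where positive values mean hydrophobic (tends to be buried inside) and negative values mean hydrophilic (tends to sit on the surface). The function name `hydropathy_profile`, the window size and the example sequence are assumptions for this sketch, and a sliding-window average is only a crude hint, not a structure prediction.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(sequence, window=5):
    """Average hydropathy over each stretch of `window` consecutive residues."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Made-up sequence: a hydrophobic stretch followed by a charged stretch.
sequence = "MKKLLVIVAADERK"
for start, score in enumerate(hydropathy_profile(sequence)):
    label = "likely buried" if score > 0 else "likely exposed"
    print(sequence[start:start + 5], round(score, 2), label)
```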

Fig 6

Beyond this, there exists another level called the quaternary structure, which is roughly the assembly of multiple folded chains, each with its own tertiary structure (see Fig 7).

Fig 7 : Quaternary structure

So much discussion on protein folding! What does the shape of a protein actually tell us, and how does it correspond to its function?

The folded shape of a protein lets it fit onto specific things like enzymes, other proteins, etc., and hence perform different specialized tasks. This fitting together is called binding, and working out how two molecules fit onto each other is called docking.

Fig 8 : Docking

Let’s take a small example to understand this. A good protein of shape “(” lives peacefully; its task is to collect oxygen of shape “)” and transport it to the legs. Someday a bad viral protein of shape “)” comes along and fits onto the good protein. Can our good protein still carry oxygen? No. What if this happens to all similar good proteins? The legs would be short of oxygen and might stop working. Messed up, right? Now, let’s have a cop protein of specialized shape “D(”, where D is a special signal to the immune system. The cop attaches to the viral protein and renders it harmless, and the immune system eventually finishes it off. See how important shapes are. This D could be a lot of things (check out CRISPR; I will cover it later, though).
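
The story above can be played out in a few lines of code. This is purely a toy model of the analogy: shapes are single bracket characters, a site can hold one partner at a time, and two shapes fit only if they are complementary. The names `Protein`, `fits` and `bind` are invented for this sketch, and real docking is far more involved.

```python
# Which shape fits onto which: "(" and ")" are complementary.
COMPLEMENT = {"(": ")", ")": "("}


def fits(site, partner_shape):
    """Two shapes fit only when they are complementary brackets."""
    return COMPLEMENT.get(site) == partner_shape


class Protein:
    def __init__(self, name, site, tag=""):
        self.name = name
        self.site = site      # shape of the binding site
        self.tag = tag        # optional signal, like the "D" on the cop
        self.bound_to = None  # a site can hold only one partner at a time

    def bind(self, partner_name, partner_shape):
        """Try to bind a partner; return True if it attaches."""
        if self.bound_to is None and fits(self.site, partner_shape):
            self.bound_to = partner_name
            return True
        return False


# Without the cop: the virus grabs the carrier first, so oxygen is left out.
carrier = Protein("carrier", "(")
print(carrier.bind("virus", ")"))   # True
print(carrier.bind("oxygen", ")"))  # False

# With the cop: the cop "D(" traps the virus, so the carrier stays free.
cop = Protein("cop", "(", tag="D")
free_carrier = Protein("carrier", "(")
print(cop.bind("virus", ")"))            # True
print(free_carrier.bind("oxygen", ")"))  # True
```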

Protein-protein docking

That’s all for today, folks. What we covered in this post: What is structure prediction? A primer on amino acids and proteins. The types of protein structure during folding. How does knowing the shape of a protein help in the bigger picture?

Let’s continue with further topics in the next blog. Follow me for more insights on structure prediction and other topics. Adios till then. Thank you.

Happy learning!!


Siddhant Rai

Philomath, Research Engineer - Machine learning @Siemens. A simple human trying to understand other humans and machines.