Qiskit’s Protein Folding Module Has Moved — Here’s How to Use It
If you’re already familiar with solving protein folding problems with Qiskit, then you are probably aware of the ProteinFoldingProblem class that was previously located in Qiskit Nature. This class will now be developed and maintained in Qiskit Research and will be removed from Qiskit Nature in the near future.
The Qiskit Research repository aims to gather research results implemented using Qiskit and run them on IBM Quantum hardware. That makes it a more intuitive space to maintain the Protein Folding Module, which originated as a proof-of-concept implementation of this paper [1].
Whether you’re an established user or about to explore your first protein folding problems with Qiskit, this blog aims to help you get started quickly with the new standalone package, as well as highlight some of the key features and opportunities for further enhancements.
Background
If you’re a computational biochemist reading this, then you probably already know what protein folding is and why it’s important, so feel free to skip ahead to the code-y bits. For the rest of you, strap in, dust off those hazy memories from high school biology and let’s set the scene:
Amino acids are the building blocks of life. They are small molecules that bond together to form polypeptide chains, and the forces between their atoms cause that chain to ‘fold’ up into a protein. Proteins are incredibly important for basically every biological process, so every living organism has them and needs them if it wants to stay, well, living. You have them, your cat has them, the bacteria and viruses inside you have them. The point is, whether you’re human, avian, bacterium or something in between, chances are you’ve got a bunch of proteins working their electrons off trying to keep you ticking.
The folded structure of a protein ultimately controls how it behaves and how other molecules interact with it, so understanding the most likely conformation can be vital for many areas of biochemical research. For example, the effectiveness of pharmaceutical drugs often relies on the very specific way that a drug molecule interacts with a certain protein, so understanding the shape of those proteins is essential.
Despite huge leaps in biomedical research over the last few decades and more recent computational advances in the field of AI, protein folding remains one of the most poorly understood domains in chemistry. There are just SO many different proteins out there, each with unique combinations of amino acids and hence unique natural folding structures. On top of that, because of the many ways that amino acids can interact with each other and the surrounding environment, a single polypeptide chain can have huge numbers of potential protein folding conformations, yet only one that actually occurs in nature (and maybe a few others if you include intermediates, mutations etc.). This mismatch is the heart of Levinthal’s Paradox: a polypeptide with 100 amino acids can take about 10⁴⁷ different conformations, far too many to search one by one, and yet real proteins fold reliably and quickly. For context, average sizes of proteins can vary from 50 to 2000 amino acids in length, and the largest protein in the human body, Titin, has 34,350 amino acids. So yeah, that’s A LOT of potential protein structures to sort through and try to figure out which is the lowest energy (and hence the most likely to occur naturally).
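To put the 10⁴⁷ figure in perspective, here is the back-of-the-envelope arithmetic as a short Python sketch. It assumes three possible conformations per amino acid (the simplification that gives roughly 10⁴⁷ for a 100-residue chain) and a purely illustrative sampling rate of 10¹³ conformations per second; neither number comes from the Qiskit module itself.

```python
import math

STATES_PER_RESIDUE = 3             # assumption: three conformations per amino acid
SAMPLES_PER_SECOND = 1e13          # assumption: illustrative sampling rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues: int) -> int:
    """Number of conformations if every residue independently picks a state."""
    return STATES_PER_RESIDUE ** n_residues


def log10_years_to_enumerate(n_residues: int) -> float:
    """log10 of the years needed to visit every conformation once.

    Working in log space avoids float overflow for very long chains.
    """
    return (
        n_residues * math.log10(STATES_PER_RESIDUE)
        - math.log10(SAMPLES_PER_SECOND)
        - math.log10(SECONDS_PER_YEAR)
    )


count = conformation_count(100)
print(f"100 residues: about 10^{math.floor(math.log10(count))} conformations")
print(f"Exhaustive search: about 10^{math.floor(log10_years_to_enumerate(100))} years")
```

Because the time estimate is computed in log space, the same function can be called with Titin-sized chain lengths without overflowing.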
Ok, by now you can probably see where we’re going with this. Protein folding is an incredibly hard problem to solve, and classical computers struggle to calculate structures for even small proteins. So of course, this is a particularly active area of quantum computing research. As with all applications of quantum computing, we are limited by the hardware we currently have available, but research is still being done into approximate solutions in the meantime (for example using coarse-grained protein models instead of fully atomistic descriptions). Qiskit developers aim to design the software in the most modular and extensible way possible, to facilitate intermediate research as well as prepare for future developments, and the code’s new location in the Qiskit Research repo reflects this experimental format.
Using the Protein Folding Module — what’s changed?
For users, the only differences between using the new protein folding module in Qiskit Research and using it in Qiskit Nature are a slightly different installation step and a new import path.
To use the protein folding module in Qiskit Research, you need to install the Qiskit Research repository:
git clone https://github.com/qiskit-research/qiskit-research.git
cd qiskit-research
pip install .
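One quick way to confirm the install landed in the Python environment you are actually using is to ask Python whether it can find the package. The helper below is a minimal sketch that relies only on the standard library; `is_installed` is a name made up for this example.

```python
import importlib.util


def is_installed(top_level_package: str) -> bool:
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(top_level_package) is not None


if is_installed("qiskit_research"):
    print("qiskit_research found: the protein folding module should be importable.")
else:
    print("qiskit_research not found: re-run the install steps in this environment.")
```

This only checks that the top-level package is discoverable; it does not import it or verify any particular version.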
Then, when using the module in your code, instead of importing the protein folding classes from Qiskit Nature, you’ll need to import from Qiskit Research:
# Before, using qiskit-nature:
from qiskit_nature.problems.sampling.protein_folding.protein_folding_problem import (
    ProteinFoldingProblem,
)

# Now, using qiskit-research:
from qiskit_research.protein_folding.protein_folding_problem import (
    ProteinFoldingProblem,
)
Apart from that, everything else about using the module is exactly the same! Read on for more details on what you can use this code for in your computational chemistry research.
The Protein Folding Problem Class
The ProteinFoldingProblem class allows computational chemists to construct a computational model of a (coarse-grained) protein folding Hamiltonian that takes into account the geometry, chirality and interaction energies of amino acids in a polypeptide chain. Users can then compute solutions to the problem using a quantum algorithm of their choice, as well as interpret and visualize the results.
An overview of how to use the class is as follows:
1. Define the polypeptide: users can specify which amino acids make up the main chain and any side chains of the polypeptide whose structure they want to approximate.
2. Define interactions between amino acids: the package provides a few interaction classes defining different types of contact maps, including MixedInteraction, RandomInteraction and MiyazawaJerniganInteraction. Users can also create their own custom contact map classes.
3. Define physical constraints (i.e. penalty terms): users can define penalty values for different physical constraints that apply to the polypeptide. Constraints can include chirality, nearest-neighbor interactions, and preventing the chain from folding back on itself.
4. Instantiate the ProteinFoldingProblem class with the peptide, interactions, and constraints.
5. Retrieve the qubit operator, i.e. convert the molecule information into a format that can be used with quantum algorithms.
6. Run your preferred algorithm and optimizer (VQE, QAOA etc.) to get the lowest energy solution.
7. NEW! Use the new .interpret() function to format the results from the algorithm and get other useful data, including:
   - the bitstring used during the algorithm optimization
   - the expanded expression of the result as a binary vector
   - the sequence of turns in the polypeptide main and side chains
   - a protein shape file with Cartesian coordinates in xyz format for each amino acid in the polypeptide (this is a common data structure in computational chemistry that can be used with other classical chemistry software)
8. NEW! Plot a simple visualization of the predicted protein structure.
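For readers who have not met the xyz format mentioned in the list above, the sketch below writes one by hand. It assumes the conventional layout (a count line, a free-text comment line, then one `label x y z` row per site) and uses made-up residue labels and coordinates. It does not call the module, and the exact output of the module's own shape file generator may differ.

```python
from typing import List, Tuple

Site = Tuple[str, float, float, float]  # (label, x, y, z)


def to_xyz(sites: List[Site], comment: str = "") -> str:
    """Format labelled 3D points using the conventional xyz layout."""
    lines = [str(len(sites)), comment]
    for label, x, y, z in sites:
        lines.append(f"{label} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


# Hypothetical coordinates for a three-residue chain (not real module output).
example_sites = [
    ("A", 0.0, 0.0, 0.0),
    ("P", 0.57735, 0.57735, -0.57735),
    ("R", 1.154701, 0.0, -1.154701),
]
xyz_text = to_xyz(example_sites, comment="toy coarse-grained chain")
print(xyz_text)
```

A plain-text layout like this is what makes the shape file easy to hand to other classical chemistry tools.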
You can see a full walkthrough of how to use the ProteinFoldingProblem class in this Protein Folding Tutorial, but the following pseudocode gives you a sense of how simple and customizable the code is:
# 1. Define the polypeptide
peptide = Peptide(main_chain, side_chains)

# 2. Define the amino acid interactions
interaction = MiyazawaJerniganInteraction()

# 3. Define the physical constraints
penalty_terms = PenaltyParameters(chiral, fold_back, overlap)

# 4. Instantiate the ProteinFoldingProblem Class
protein_folding_problem = ProteinFoldingProblem(peptide, interaction, penalty_terms)
qubit_op = protein_folding_problem.qubit_op()

# 5. do some quantum algorithm to find lowest energy result,
# e.g. CVaRVQE
# result = ...

# 6. Interpret the result to get other useful data
protein_data = protein_folding_problem.interpret(raw_result=result)
protein_data.turn_sequence
protein_data.get_result_binary_vector()
protein_data.protein_shape_decoder.main_turns
protein_data.protein_shape_decoder.side_turns
protein_data.protein_shape_file_gen.get_xyz_data()

# 7. plot the results to create a visual representation of protein folding:
fig = protein_data.get_figure(title="Protein Structure", ticks=False, grid=True)
fig.get_axes()[0].view_init(10, 70)
The result is a plot of the predicted protein structure, which is also rotatable when viewed in a Jupyter notebook.
Further work
Quantum computing solutions for protein folding are an active area of research and there is still a long way to go (with both hardware and software) before we see any real advantages for fields such as pharmaceutical development. The Protein Folding module has been developed in a deliberately open source, modular, and extensible way to encourage more collaboration from experts in the field and to keep improving the software for the benefit of future users.
If you are a computational chemist actively working in this area, or are keen to contribute to the development of this project, there are a few key areas you can focus on:
Develop custom interaction models
Currently the protein folding module has a few out-of-the-box interaction classes for approximating interactions between amino acids, but users can also develop their own based on the Interaction base class.
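As a rough illustration of what a custom interaction model might look like, the sketch below defines a toy hydrophobic-polar contact map. To stay runnable without the package installed, it declares its own stand-in `Interaction` base class. The method name `pair_energy_table`, its return type, and the choice of which residues count as hydrophobic are all assumptions made for this example, so check the real `Interaction` class in the repository for the method you actually need to override.

```python
from abc import ABC, abstractmethod
from typing import Dict, Sequence, Tuple


class Interaction(ABC):
    """Stand-in for the module's base class (hypothetical interface)."""

    @abstractmethod
    def pair_energy_table(self, residues: Sequence[str]) -> Dict[Tuple[int, int], float]:
        """Return a contact energy for every pair of chain positions i < j."""


class HydrophobicPolarInteraction(Interaction):
    """Toy model: contacts between two hydrophobic residues lower the energy."""

    HYDROPHOBIC = frozenset("AVLIMFWC")  # assumption: illustrative classification

    def __init__(self, contact_energy: float = -1.0) -> None:
        self.contact_energy = contact_energy

    def pair_energy_table(self, residues: Sequence[str]) -> Dict[Tuple[int, int], float]:
        table: Dict[Tuple[int, int], float] = {}
        for i in range(len(residues)):
            for j in range(i + 1, len(residues)):
                both = residues[i] in self.HYDROPHOBIC and residues[j] in self.HYDROPHOBIC
                table[(i, j)] = self.contact_energy if both else 0.0
        return table


# One-letter residue codes for a short example chain.
energies = HydrophobicPolarInteraction().pair_energy_table("APRLRFY")
print(energies[(0, 3)], energies[(0, 1)])
```

The point of the pattern is that the rest of the workflow only needs pairwise energies, so swapping in a different physical model means changing one class rather than the whole problem definition.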
Experiment with scaling up the size of the problem
The examples above and in the tutorial are deliberately simple to enable users to get started quickly, but we encourage users to experiment with larger proteins, more complex contact maps etc. and then publish the results for the benefit of the community.
Report bugs and request features
Is something not working as it should? Is the module missing something particularly useful? Please let us know by opening a bug report or feature request issue in GitHub here.
Contribute to the open source library
If you want to get involved by working directly on the protein folding codebase (or other Qiskit Research modules!) you can take a look at the open issues here and get sucked in (but remember to read the contributing guidelines first 😉 ).
So there you have it! You now have all you need to know to get started researching protein folding using the new(ish) Qiskit protein folding module! Happy coding, Qiskitters!
References:
[1] A. Robert, P. Barkoutsos, S. Woerner and I. Tavernelli, Resource-efficient quantum algorithm for protein folding, npj Quantum Information, 2021, https://doi.org/10.1038/s41534-021-00368-4