Searching for binding pockets using PlayMolecule® DeepSite [TUTORIAL]

In this tutorial we will take a look at DeepSite, a web application that searches for binding pockets using a predictor based on a 3D neural network.

Gerard Martínez
Nov 5 · 5 min read

Welcome to another PlayMolecule tutorial. This time we will look at one of the first and most popular applications: DeepSite. The algorithm is explained thoroughly in the corresponding paper, so this article aims instead for an intuitive grasp of how it works and a practical guide to using it. At the end, we will demonstrate the app by searching for pockets in the ABL kinase (PDB code: 3CS9).


The algorithm

Deep learning has developed at a remarkable pace over the last few years, leading to widely publicized successes in machine vision. DeepSite builds on the experience gained in image recognition and brings it to structural biology.

For deep learning to work, a neural network needs to be trained on many examples (the more the merrier) of whatever we want it to learn. In the classical “classification problem”, we present examples of two categories (e.g. photos of cats and photos that aren’t cats) and we want the algorithm to correctly assign new examples to category 1 or category 2. DeepSite is not so different from this simple classifier: we take examples of binding pockets and non-binding pockets from the scPDB database and train a neural network to tell what looks like a binding pocket from what doesn’t.

Figure 1. DeepSite is not so different from an image-based cat classifier.

Since protein structures are three-dimensional and we want to use as much information as possible, we do not use simple pictures of the pockets: we use a 3D box that contains the pocket. In practice, this means that we use a type of neural network called a 3D convolutional neural network. Networks for image recognition use 2D convolutions instead, because pictures are two-dimensional.
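
To make the difference from 2D concrete, here is a minimal sketch of a single 3D convolution step in plain Python. It is only an illustration, not DeepSite's implementation: the function name conv3d_valid and the toy data are made up for this example. As in most deep learning libraries, the "convolution" is computed as a cross-correlation (the kernel is not flipped), with no padding and a stride of 1.

```python
def conv3d_valid(volume, kernel):
    """Naive 3D cross-correlation on nested lists (no padding, stride 1)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(depth - kd + 1):
        plane = []
        for y in range(height - kh + 1):
            row = []
            for x in range(width - kw + 1):
                # Weighted sum over the kd x kh x kw neighbourhood
                total = 0.0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            total += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


# Toy data: a 3x3x3 volume of ones and a 2x2x2 averaging kernel
volume = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
kernel = [[[1 / 8] * 2 for _ in range(2)] for _ in range(2)]
result = conv3d_valid(volume, kernel)  # a 2x2x2 output volume
```

A 2D convolution is the same idea with one loop level fewer; the extra axis is what lets the network see the shape of a pocket rather than a flat projection of it.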

Further, in image recognition, color information is encoded and fed into the network as 3 channels (red, green, blue) that, combined, reconstruct the original image.

Figure 2. In image recognition, a picture is decomposed into three color channels: red, green and blue.
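
As a small illustration of the channel idea, the sketch below splits a tiny image into its red, green and blue channels and then recombines them. The function names and the 2x2 image are invented for this example.

```python
def split_channels(image):
    """Split an image (rows of (r, g, b) tuples) into three 2D channels."""
    red = [[pixel[0] for pixel in row] for row in image]
    green = [[pixel[1] for pixel in row] for row in image]
    blue = [[pixel[2] for pixel in row] for row in image]
    return red, green, blue


def merge_channels(red, green, blue):
    """Recombine three 2D channels into rows of (r, g, b) tuples."""
    return [list(zip(r, g, b)) for r, g, b in zip(red, green, blue)]


# A made-up 2x2 image: red, green, blue and white pixels
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
red, green, blue = split_channels(image)
restored = merge_channels(red, green, blue)
```

Each channel is a grid of the same size as the image, and together they carry all the information of the original picture.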

In DeepSite, we use the same principle to encode the information of the protein structure into several channels, each capturing a different chemical property such as aromaticity, hydrophobicity, H-bond donor, H-bond acceptor, occupancy… Here’s a complete list of all the channels we generate:

Figure 3. Channels generated by DeepSite to encode the chemistry of the structure

And here is an example of how the hydrophobic and aromatic channels look for the PDB ID 4NIE:

Figure 4. Hydrophobic (left) and aromatic (right) channels for the PDB ID 4NIE
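
For readers who prefer code, here is a rough sketch of how atoms could be turned into per-property 3D channels. It rests on several assumptions: the article does not describe how DeepSite computes voxel values, so the linear fall-off with distance, the radius, the grid spacing and the property assignments below are all placeholders, and only the channels named in the text are included.

```python
import math

# Only the channels named in the text; the full list is the one in Figure 3.
CHANNELS = ("hydrophobic", "aromatic", "hbond_donor", "hbond_acceptor", "occupancy")


def voxelize(atoms, origin, size, spacing=1.0, radius=2.0):
    """Return {channel: size^3 grid}; each voxel keeps its strongest atom contribution."""
    grids = {
        name: [[[0.0] * size for _ in range(size)] for _ in range(size)]
        for name in CHANNELS
    }
    for position, properties in atoms:
        for i in range(size):
            for j in range(size):
                for k in range(size):
                    voxel_center = (
                        origin[0] + (i + 0.5) * spacing,
                        origin[1] + (j + 0.5) * spacing,
                        origin[2] + (k + 0.5) * spacing,
                    )
                    distance = math.dist(position, voxel_center)
                    if distance >= radius:
                        continue
                    # Assumed fall-off: 1 at the atom, 0 at `radius`
                    contribution = 1.0 - distance / radius
                    for name in properties:
                        grid = grids[name]
                        grid[i][j][k] = max(grid[i][j][k], contribution)
    return grids


# Two hypothetical atoms: (coordinates, set of properties)
atoms = [
    ((1.5, 1.5, 1.5), {"hydrophobic", "occupancy"}),
    ((3.5, 3.5, 3.5), {"aromatic", "hydrophobic", "occupancy"}),
]
grids = voxelize(atoms, origin=(0.0, 0.0, 0.0), size=5)
```

The result plays the same role as the RGB channels of an image: one grid per property, all aligned on the same box.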

These channels are generated for 7622 proteins from the scPDB database, producing example boxes labeled “pocket” and “without pocket”. A 3D convolutional neural network (3D-CNN) is then trained to differentiate these two classes.

Then, at inference time (i.e. when you submit your protein to the DeepSite application), we divide the protein into boxes and for each box we ask our pre-trained model: “is this a pocket?”. The model (3D-CNN) outputs a probability of being a pocket that ranges from 0 (not-pocket) to 1 (definitely-a-pocket). Finally, we aggregate this “probability in 3D space” into a so-called isosurface, which is an interpolation of the probabilities obtained from the per-box predictions. This isosurface highlights which areas are more likely to be a pocket.
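
The per-box scoring and aggregation described above can be sketched as follows. This is a simplified stand-in, not the DeepSite code: toy_predict is a hypothetical placeholder for the trained 3D-CNN, the box size and stride are arbitrary, and plain averaging of overlapping boxes is used here in place of the interpolation mentioned in the text.

```python
import math


def score_boxes(grid_shape, box, stride, predict):
    """Slide a cubic box over the grid and return {box_origin: probability}."""
    nx, ny, nz = grid_shape
    return {
        (x, y, z): predict((x, y, z), box)
        for x in range(0, nx - box + 1, stride)
        for y in range(0, ny - box + 1, stride)
        for z in range(0, nz - box + 1, stride)
    }


def aggregate(grid_shape, box, scores):
    """Per voxel, average the probabilities of all boxes that cover it."""
    nx, ny, nz = grid_shape
    total = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    count = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for (x, y, z), p in scores.items():
        for i in range(x, x + box):
            for j in range(y, y + box):
                for k in range(z, z + box):
                    total[i][j][k] += p
                    count[i][j][k] += 1
    return [
        [[total[i][j][k] / count[i][j][k] if count[i][j][k] else 0.0
          for k in range(nz)] for j in range(ny)]
        for i in range(nx)
    ]


def voxels_above(prob, cutoff):
    """Voxels enclosed by the isosurface drawn at `cutoff`."""
    return [
        (i, j, k)
        for i, plane in enumerate(prob)
        for j, row in enumerate(plane)
        for k, p in enumerate(row)
        if p >= cutoff
    ]


def toy_predict(origin, box):
    """Hypothetical model: probability decays with distance from (4, 4, 4)."""
    center = tuple(o + box / 2 for o in origin)
    return max(0.0, 1.0 - math.dist(center, (4.0, 4.0, 4.0)) / 4.0)


shape = (8, 8, 8)
scores = score_boxes(shape, box=4, stride=2, predict=toy_predict)
prob = aggregate(shape, box=4, scores=scores)
pocket_voxels = voxels_above(prob, cutoff=0.4)
```

Raising the cutoff shrinks the enclosed volume to the voxels the model is most confident about; lowering it expands the volume.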


An example

DeepSite is very easy to use. To submit your protein, simply upload your PDB file or type a 4-character RCSB PDB ID. In our case we will type “3CS9”, the ABL kinase we are interested in. Finally, we will select chain “A”. The results will come up after a minute or so.
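
If you want to check which atoms belong to a chain before uploading a file, the fixed-column PDB format makes this easy to do offline. The sketch below is independent of DeepSite and does not call any PlayMolecule API: it reads the chain ID from column 22 and the coordinates from columns 31-54 of each ATOM record. The helper atom_line and the demo coordinates are made up for illustration and are not taken from 3CS9.

```python
def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a minimal fixed-column ATOM record for the demo (not a full PDB writer)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def chain_coordinates(pdb_lines, chain_id):
    """Return (x, y, z) tuples for the ATOM records of one chain."""
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 54 and line[21] == chain_id:
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


# Made-up records: two atoms in chain A, one in chain B
demo_lines = [
    atom_line(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614),
    atom_line(2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842),
    atom_line(3, "N", "GLY", "B", 1, 10.000, 11.000, 12.000),
]
coords_a = chain_coordinates(demo_lines, "A")
```

With a real file, you would pass the lines of the PDB file instead of demo_lines.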

Figure 5. DeepSite results for the PDB ID 3CS9.

As you can see, the generated isosurface is depicted in orange and represents the 3D volume where the probability of finding a binding pocket is the highest. You can change the cutoff of the isosurface by clicking on the gear icon in the top right corner. Further, to simplify the analysis, the isosurface is grouped into a few clusters (putative pockets), the centers of which are shown in the right panel.
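
The article does not say which clustering method DeepSite uses, so the following is only one plausible way to go from high-probability voxels to pocket centers: group voxels that share a face into connected components and take the mean position of each group. The function names and the voxel list are invented for this sketch.

```python
from collections import deque

# Offsets to the six face-sharing neighbours of a voxel
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def cluster_voxels(voxels):
    """Group voxels into 6-connected components using breadth-first search."""
    remaining = set(voxels)
    clusters = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    queue.append(neighbour)
                    cluster.append(neighbour)
        clusters.append(cluster)
    return clusters


def center(cluster):
    """Mean voxel position of a cluster."""
    n = len(cluster)
    return tuple(sum(v[axis] for v in cluster) / n for axis in range(3))


# Hypothetical voxels above the cutoff: two separate groups
voxels = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (5, 5, 5), (5, 5, 6)]
clusters = sorted(cluster_voxels(voxels), key=len, reverse=True)
centers = [center(c) for c in clusters]
```

Sorting by size puts the largest putative pocket first, which is a convenient order for a results panel.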


Conclusions

DeepSite leverages the technology developed for machine vision and image recognition and brings it to structural biology to detect pockets. Examples of boxes with and without pockets are extracted from proteins in the scPDB database and used to train a 3D convolutional neural network. In prediction mode, your protein is broken into equally sized boxes and a prediction is made for each box. The resulting probabilities “in 3D space” are aggregated into isosurfaces, which are then shown to the user in the graphical user interface (GUI).

The DeepSite tool is freely available at www.playmolecule.com using public resources. For private installation, please get in touch with Acellera by contacting us at info@acellera.com.

Thanks for reading!

PlayMolecule

PlayMolecule blog. News, tutorials and use cases of PlayMolecule.

Gerard Martínez

Written by

Manager and developer of the PlayMolecule drug discovery platform (https://playmolecule.com/)
