Computational approaches streamlining drug discovery

Mykola Protopopov
5 min read · Apr 20, 2024


Despite a thorny path of development, validation, and implementation spanning more than 50 years, computational approaches have become a driving force of drug discovery in both academia and industry. Today these methods not only help to find unique lead compounds but can also speed up clinical research and reduce its cost by selecting ligands whose ADMET and PK profiles are known in advance to be favorable. Some companies already claim to have covered the path from target to clinic in one year. This progress is facilitated by the abundance of template 3D structures of molecular targets, the constant growth of virtual libraries and chemical spaces of drug-like compounds that are easily synthesized on demand, and generative spaces with theoretically predicted synthesizability. New computational approaches for the virtual screening of ultra-large libraries and the use of artificial intelligence (AI) also play a significant role here.

Today, virtual on-demand libraries are coming to the fore. Using them in in silico screening has been shown to be more cost-effective than using physical libraries. A good example is Enamine’s fully enumerated REAL database (more than 5.5 billion compounds in 2022), which relies on carefully selected and optimized parallel synthesis protocols and a curated collection of in-stock building blocks, allowing fast (< 4 weeks), reliable (80% success rate), and affordable synthesis of a set of compounds. When such libraries grow beyond several billion compounds, they turn into virtual chemical spaces. The indisputable advantage of these spaces is their high novelty and diversity, with minimal overlap between them (< 10%), while the guarantees on synthesis speed, success rate, and cost remain the same as for virtual on-demand libraries. For example, the largest commercial space, Enamine REAL Space (36 billion compounds), can be easily expanded to 10^15 compounds. Alternative approaches to creating chemical spaces include generating compounds that could hypothetically be synthesized according to rules of synthetic feasibility and chemical stability, and generative approaches based on narrower definitions of chemical space that use deep learning (DL)-based generative chemistry for de novo ligand design.
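
To see why a chemical space can be so much larger than anything that is stored explicitly, here is a rough Python sketch. It assumes that a space is defined by reactions, each combining one synthon per attachment position, and that every combination is allowed. The reaction names and synthon counts are invented for illustration and are not Enamine data.

```python
from math import prod

# Hypothetical reactions: number of available synthons per attachment position.
# These figures are illustrative assumptions, not real catalog numbers.
reactions = {
    "amide_coupling": [20_000, 15_000],
    "suzuki_coupling": [8_000, 5_000],
    "three_component": [3_000, 2_000, 1_500],
}

def space_size(reactions):
    """Number of products if every synthon combination is allowed."""
    return sum(prod(counts) for counts in reactions.values())

def synthon_count(reactions):
    """Number of building blocks that actually have to be stored."""
    return sum(sum(counts) for counts in reactions.values())

total_products = space_size(reactions)    # grows multiplicatively
total_synthons = synthon_count(reactions) # grows additively

print(f"{total_synthons:,} synthons define {total_products:,} products")
```

The number of products grows as a product of the synthon counts, while the number of building blocks grows only as their sum, which is why adding a few reactions or synthons expands a space so quickly.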

A classic approach to searching for new lead compounds that has proven itself in practice is receptor structure-based screening. The resulting sets of candidate ligands show good hit rates (10–40%) in experimental studies. But as library sizes grow to the gigascale and terascale, the computational time and cost of docking itself become a major limitation, even with massively parallel cloud computing and even with recently suggested iterative approaches, which improve speed only several-fold.

Given the bottlenecks of the methods mentioned above, modular synthon-based approaches look promising. In 2022, the virtual synthon hierarchical enumeration screening (V-SYNTHES) technology was developed. It applies fragment-based design to on-demand chemical spaces, thus avoiding the challenges of custom synthesis. The V-SYNTHES algorithm consists of several stages: 1) a minimal library of representative chemical fragments is built by fully enumerating synthons at one of the attachment points while capping the other position (or positions) with methyl or phenyl groups; 2) docking-based screening is performed, and the top-scoring fragments, which are predicted to bind well in the target pocket, are selected; 3) stages 1 and 2 are iterated step by step for each of the remaining positions in the ligand molecule; 4) the full compounds are docked and scored with more elaborate and accurate docking parameters or methods, and the top-ranking candidates are filtered for novelty, diversity, and a variety of desired drug-like properties, which leaves about 30,000–50,000 compounds for the next stage; 5) in the post-processing stage, the compounds are passed through filters for drug-likeness, PAINS, diversity, and so on, and the 50–500 highest-ranked candidates are selected for synthesis and in vitro testing. V-SYNTHES has repeatedly proven its effectiveness in practice: its hit rate for submicromolar ligands exceeded that of standard virtual ligand screening fivefold, while it required about 100 times less computational resources. Modular synthon-based approaches have other benefits as well: they are particularly effective at identifying hits with high chemical novelty, they provide a substantial SAR catalog for better results, and they can greatly simplify the subsequent optimization of potential leads, thus reducing the need for extensive custom synthesis.
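
The hierarchical idea behind stages 1 to 4 can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the V-SYNTHES implementation: the synthon names, their "binding contributions", and the `toy_dock` function are all made up, and a real workflow would call a docking program on 3D structures instead.

```python
# Toy two-component reaction with synthons for positions A and B.
# Values are invented "binding contributions" (more negative = better).
SYNTHONS_A = {"A1": -3.0, "A2": -1.0, "A3": -4.5, "A4": -0.5}
SYNTHONS_B = {"B1": -2.0, "B2": -5.0, "B3": -0.2}
CAP = "cap"  # stands for the methyl/phenyl cap; contributes nothing here

def toy_dock(a, b):
    """Hypothetical stand-in for a docking run; lower score is better."""
    return SYNTHONS_A.get(a, 0.0) + SYNTHONS_B.get(b, 0.0)

def v_synthes_sketch(top_fragments=2, top_compounds=3):
    # Stage 1: minimal library, one real synthon plus a cap at the other position.
    fragments = [(a, CAP) for a in SYNTHONS_A] + [(CAP, b) for b in SYNTHONS_B]
    # Stage 2: dock the capped fragments and keep the best-scoring ones.
    best = sorted(fragments, key=lambda f: toy_dock(*f))[:top_fragments]
    # Stage 3: replace the cap of each selected fragment with every real synthon.
    full = set()
    for a, b in best:
        if b == CAP:
            full.update((a, real_b) for real_b in SYNTHONS_B)
        else:
            full.update((real_a, b) for real_a in SYNTHONS_A)
    # Stage 4: dock only these full compounds and rank them.
    ranked = sorted(full, key=lambda c: (toy_dock(*c), c))
    exhaustive = len(SYNTHONS_A) * len(SYNTHONS_B)
    return ranked[:top_compounds], len(full), exhaustive

hits, docked_full, exhaustive = v_synthes_sketch()
print(hits, docked_full, exhaustive)
```

In this tiny example only half of the possible full compounds are ever built and docked. With so few synthons the fragment docking step cancels out that saving; the benefit appears when each position holds thousands of synthons and only a small fraction of fragments is carried forward.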

In the current heyday of AI, there is great interest in applying data-driven DL approaches to drug discovery. Machine learning algorithms are already extensively used to predict ligand properties and on-target activities, albeit with mixed results. The advent of DL is expected to take data-driven models to the next level, allowing the analysis of much larger and more diverse datasets while deriving more complicated non-linear relationships in the drug discovery process. However, practice shows that DL models depend strongly on their training data and are prone to overtraining and spurious performance, which has sometimes led to whole classes of models being deemed ‘useless’ or severely biased by subjective factors defining the training dataset. Despite these challenges, AI is already starting to have a substantial effect on drug discovery, with the first AI-based drug candidates making it into preclinical and clinical studies.

Because both physics-based and data-driven approaches have a number of limitations in predicting ligand potency, numerous efforts are being made to create new hybrid computational approaches. These are expected to reduce false positive rates and improve the quality of hits. Hybrid iterative approaches have been suggested in which the results of structure-based docking of a sparse library subset are used to train machine learning models, which are then used to filter the whole library and further reduce its size. These methods report as much as a 14–100-fold reduction in computational cost for libraries of 1.4 billion compounds, although it is still not clear how they would scale to rapidly growing chemical spaces.
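
A minimal sketch of one round of such a hybrid scheme is shown below. Everything in it is an assumption made for illustration: each compound is reduced to a single made-up descriptor, `expensive_dock` is a toy function standing in for a docking run, and the machine learning model is a plain least-squares line rather than the deep models used in practice.

```python
import math

def expensive_dock(x):
    """Hypothetical stand-in for a docking run; lower score is better."""
    return -2.0 * x + 0.05 * math.sin(40 * x)

# Toy library: one invented descriptor value per compound.
library = [i / 1000 for i in range(1000)]

# 1) Dock a sparse subset of the library (every 20th compound).
train_x = library[::20]
train_y = [expensive_dock(x) for x in train_x]

# 2) Train a cheap surrogate model on the docking scores
#    (ordinary least squares on the single descriptor).
mx = sum(train_x) / len(train_x)
my = sum(train_y) / len(train_y)
slope = (sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
         / sum((x - mx) ** 2 for x in train_x))
intercept = my - slope * mx

def predict(x):
    return slope * x + intercept

# 3) Use the model to filter the whole library: keep the top 10% by prediction.
shortlist = sorted(library, key=predict)[:100]

# 4) Dock only the shortlist and pick the best compound.
docked = {x: expensive_dock(x) for x in shortlist}
best = min(docked, key=docked.get)

# Upper bound on docking runs (a few shortlisted compounds were already docked).
docking_runs = len(train_x) + len(shortlist)
print(f"best descriptor {best}, {docking_runs} docking runs instead of {len(library)}")
```

Here the surrogate recovers the best-scoring compound with 150 docking runs instead of 1,000. The toy score is almost linear in the descriptor, so this is a best case; with real docking scores the model can miss good compounds, which is why published schemes iterate the dock-train-filter loop.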

How drug discovery will be shaped in the near future by the further growth of virtual chemical space, the optimization and development of new computational approaches, the expansion of automated chemical design concepts, the involvement of robotic synthesis, the building of hybrid in silico-in vitro pipelines with easy access to the enormous on-demand chemical space at all stages of the gene-to-lead process, and so on, remains to be seen. For now, V-SYNTHES remains the most efficient approach to the search for lead compounds in terms of time and cost.

The leading experts at Chemspace have a strong background in applying the virtual synthon hierarchical enumeration screening approach to the search for lead compounds. By contacting us, you are guaranteed to get the most effective results, which will increase your chances of success in drug discovery and development.

The full version of the article is available here.
