Open source drug discovery for novel antimalarials

Ersilia participates in the Open Source Malaria project

Miquel Duran-Frigola
ersiliaio

In 2019, in a funny but very serious ChemMedChem editorial article, Matthew Todd from University College London presented his “Six Laws for Open Source Drug Discovery”. The laws are the following:

  1. The Condensed Law
  2. The Freedom Law
  3. No Patents
  4. No Assholes
  5. No Emails
  6. The Under a Bus Law

Which means:

  1. All Data and Ideas Are Freely Shared
  2. Anyone May Participate at Any Level
  3. There Will Be No Patents
  4. Suggestions Are the Best Form of Criticism
  5. Public Discussion Is More Valuable than Private Email
  6. An Open Project is Bigger Than, and Is not Owned by, Any Given Lab

Todd’s article is worth a read, fresh and exceptionally well written — I am not going to spoil it here. If you have five minutes, check it out. “Society needs effective and affordable medicines”, Todd begins. “We currently have at our disposal essentially one system to discover and develop drugs, and there are many areas where this system struggles to deliver, for example to combat antimicrobial resistance, or tropical diseases, or dementia. It is sensible to cultivate alternative, competing approaches to drug discovery and development. A genuinely new alternative is to open up the entire research cycle, abandoning secrecy altogether.”

In other words, Todd accepts that pharmaceutical companies will not develop medicines when the return on investment is low, and he puts forward an alternative strategy based on collaborative science. I believe in this alternative; it is why we founded Ersilia in the first place. As it happens, Todd has fostered initiatives like Open Source Malaria, Open Source Tuberculosis, Open Source Mycetoma and Open Source Antibiotics, and even a private company, M4ID Pharma, whose business model is entirely based on open assets.

The Series 4 antimalarial compounds

The Open Source Malaria (OSM) initiative, in particular, has made visible progress in the last few years. In another article well worth reading, Todd and coworkers examined a specific family of compounds, named “Series 4”, with crowd-sourced computational methods. Currently, the OSM project is in a hit-to-lead optimization stage, aimed at maximizing the antimalarial potency and drug-likeness of Series 4 compounds in order to advance towards clinical trials.

Some Series 4 compounds. Adapted from Tse et al., 2020

The unimaginable chemistry of generative models

At Ersilia, we decided to contribute to this task. We collected the historical Series 4 experimental data against the malaria parasite (Plasmodium falciparum), and outlined a strategy to generate new molecules with improved properties. We are in the deep learning era, so we chose to use deep learning.

Now, there is a lot of hype about deep generative models, especially for text and image generation. As you may remember, back in 2018 the Christie’s auction house sold the Portrait of Edmond Belamy for 432,500 USD. You can judge the quality of the picture:

The Portrait of Edmond Belamy is part of the Belamy Family series. Source: Wikipedia

This artwork was fully generated by an AI trained on 15,000 images downloaded from WikiArt. Note the author’s signature, a formula, at the bottom-right of the canvas. Things have improved dramatically since 2018, but the message remains: generating creative and convincing solutions to any given problem is difficult. No free lunch, not even in the era of AI.

Chemistry is an extremely rich language, with multiple atom types and many kinds of bonds between them. Indeed, the number of possible small molecules is almost unthinkable: capping molecules at 30 atoms, and considering only carbon, hydrogen, oxygen, nitrogen and sulfur (among other restrictions), one arrives at an estimate of 10⁶³ compounds. One cannot help but recall The Library of Babel, the short story by the Argentinian writer Jorge Luis Borges. “The Universe (which others call the Library)”, the story begins, “is composed of an indefinite, perhaps infinite number of hexagonal galleries.” It then goes on to describe the last words of a solitary librarian lost in this enduring, unattainable and essentially random world of books. As a computational chemist, challenged to discover new and valid molecules, I feel the same kind of desperation.

Library of Babel. You can navigate it virtually. Source: That Explains Things

The good news was that, in the OSM Series 4 exercise, the chemical space was delimited around a core triazolopyrazine scaffold with substituents in the top-left (North-West) and top-right (North-East), which simplified the search a lot. The other good news was that OSM has a lively discussion forum, and it is always good to have a chemistry practitioner next to you when you are generating new molecules, the same way it is good to have somebody who can read if you are generating new text.

Large-scale production of Series 4 candidates

We worked on this small molecule generation task for five intensive days, during a retreat at the Centre d’Art i Natura in the Catalan Pyrenees. We can’t complain:

The village of Farrera. Source: CAN

A more detailed explanation, data, code and results of the analysis are available in this GitHub repository. Correspondingly, we opened a discussion stream (#34) in the OSM forum. Long story short: using a reinforcement learning framework, we devised a set of “agents” aimed at evaluating the antimalarial activity of the virtual molecules, as well as their synthetic accessibility and drug-like properties. As a result, we produced over 100,000 Series 4 derivatives, of which we selected a diverse subset of 1,000 candidates.
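To make the idea more concrete, here is a minimal, self-contained sketch of two steps that a pipeline like this might involve: collapsing several per-molecule scores into a single reward, and picking a diverse subset from a large pool. This is not the code from our repository. The score names (`activity`, `synthesis`, `druglike`), the weighted-average reward, the set-of-bits fingerprints and the greedy max-min picker are all assumptions chosen for illustration.

```python
from typing import Dict, FrozenSet, List, Sequence

# A fingerprint is modelled as the set of "on" bit indices.
# This is a hypothetical stand-in for a real molecular fingerprint.
Fingerprint = FrozenSet[int]


def combined_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-objective scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fps: Sequence[Fingerprint], k: int, first: int = 0) -> List[int]:
    """Greedy max-min selection: keep adding the item farthest from the picked set."""
    if k <= 0 or not fps:
        return []
    picked = [first]
    picked_set = {first}
    # Distance from every item to its nearest already-picked item.
    nearest = [1.0 - tanimoto(fp, fps[first]) for fp in fps]
    while len(picked) < min(k, len(fps)):
        remaining = (i for i in range(len(fps)) if i not in picked_set)
        nxt = max(remaining, key=nearest.__getitem__)
        picked.append(nxt)
        picked_set.add(nxt)
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], 1.0 - tanimoto(fp, fps[nxt]))
    return picked


if __name__ == "__main__":
    # Hypothetical scores for one virtual molecule; activity is weighted double.
    reward = combined_reward(
        {"activity": 0.8, "synthesis": 0.6, "druglike": 0.4},
        {"activity": 2.0, "synthesis": 1.0, "druglike": 1.0},
    )
    print(f"reward = {reward:.2f}")

    # Toy fingerprints: items 0, 1 and 3 are similar, item 2 is very different.
    pool = [
        frozenset({1, 2, 3}),
        frozenset({1, 2, 4}),
        frozenset({7, 8, 9}),
        frozenset({1, 2, 3, 4}),
    ]
    print(maxmin_pick(pool, k=2))
```

In the toy pool, the picker starts from the first item and then jumps to the one with no bits in common, which is the behaviour one wants when reducing a large pool of close analogues to a smaller, varied shortlist. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.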

TMAP representation of generated Series 4 compounds. Source: EOSI

If you are curious, feel free to navigate the Series 4 chemical space, zoom in and out, and perhaps focus on the highly active (red) regions of the map. The job is not done, we are well aware. In the coming weeks, we will improve upon our findings, narrowing down the list of candidates and possibly generating a few more. But let me share my preliminary excitement: it feels good to contribute to an open science project, especially now that we work as independent drug discovery researchers, neither in academia nor in industry. The Six Laws for Open Source Drug Discovery make room in the field for nonprofits like ours.

Miquel Duran-Frigola
ersiliaio

Computational pharmacologist with an interest in global health. Lead Scientist and Founder at Ersilia Open Source Initiative. Occasional fiction writer.