Modelling Drug Discovery as a Game of Sudoku 💊🔬

An analysis of AI’s potential in Drug Discovery processes.

Vinaya Sharma
11 min read · Oct 22, 2023

In Sudoku, every row, column, and region must contain the numbers 1–9. You must logically and iteratively place numbers to adhere to the game's rules.

Drug discovery isn’t much different from Sudoku. Scientists solve complex medical problems by identifying compounds that interact with biological targets to treat and manage diseases. Like Sudoku, it relies on logical reasoning, here grounded in scientific principles and constrained by safety and efficacy standards. Drug discovery is a game; the difficulty level is just set near “impossible”.

In the game of drug discovery, the goal is to develop new medication to treat, prevent, or manage diseases. 🎯

This intricate process aims to alleviate human suffering by providing solutions for diseases with limited or no available treatments, such as cancer, infectious diseases, and chronic illnesses. Drug discovery also encompasses the creation of vaccines for disease prevention and has played a pivotal role in extending human lifespans and improving overall health.

But, the drug discovery process is long, complex, and costly 🛑

The drug discovery process takes 10–15 years, at an average cost of $1–2 billion for each new drug approved for clinical use. The overall success rate of clinical drug development is low, at just 10–15%, meaning close to 90% of candidates that enter clinical trials fail, and only about 1 in every 5,000 candidate drugs makes it to the market approval stage. Drug design is difficult, so we are left with two options for bringing the hundreds of drugs we need to market:

1. Rapidly searching and testing for more potential drugs.

2. Better optimization and screening for improved resource allocation.

1 out of every 5,000+ drugs makes it to the market approval stage.

Over the years we’ve continuously improved drug discovery processes. But we still have a long way to go. 🕰️

A brief overview of drug discovery process advancements:

  • We first discovered plants and their medicinal properties through trial and error a very long time ago. Medicines documented on clay slabs in Nagpur date back 5,000 years. As science evolved through the 1800s, chemists learned how to isolate therapeutic substances to determine which molecules produced the desired biological responses. That’s how the pain-relieving properties of salicin in willow bark were found, which led to the creation of drugs like aspirin.
  • As we entered the 20th century, biologists figured out that microorganisms like bacteria were the leading cause of infectious diseases. Scientists then began researching natural remedies and synthetic compounds in hopes of cures. This led to discoveries like penicillin, which Alexander Fleming found by accident after noticing that staphylococci bacteria would not grow near the mould contaminating one of his Petri dishes.
  • Penicillin quickly became known as World War II’s “miracle drug,” curing infectious diseases, saving millions of lives, and kickstarting the pharmaceutical industry. Advancements in high-throughput screening techniques and molecular biology have since enabled a more targeted search for specific compounds that could interact with biological targets, like enzymes and receptors.

These innovations have turned drug discovery into a multidisciplinary field combining chemistry, biology, pharmacology, and genetics. The drug discovery process itself has evolved into a complex and multifaceted journey that involves the identification, development, and testing of new medications to treat or prevent diseases.

Today, going from a disease to an approved drug typically follows this standardized process 🕵️‍♀️

  1. Target identification: This involves pinpointing “targets”, in most cases proteins or genes that are responsible for the development of particular diseases. In the case of cancer cells, this step involves finding overexpressed proteins that contribute to the cell’s uncontrolled growth.
  2. Lead generation and optimization: Once you know your targets, you can develop drugs to interact with them and modulate their activity. These drugs are typically small chemical compounds that bind to the target molecules to enhance (agonists) or inhibit (antagonists) their activity. Medicinal chemists are tasked with optimizing efficacy, safety and bioavailability.
  3. Preclinical testing: Promising drug candidates move on to in vitro (cell-based) and in vivo (animal-based) experiments. These are smaller studies that must detail information on dosing and toxicity levels.
  4. Clinical trials: Phase 1 assesses safety and dosage in healthy volunteers. Phase 2 evaluates efficacy and side effects in patients with the target disease. Finally, Phase 3 conducts larger-scale trials on a diverse range of participants.
  5. Regulatory approval: After successful completion of clinical trials, regulatory agencies like the FDA review the data and decide whether to approve the drug for marketing.

TL;DR: Drug candidate “players” are chosen during target identification and “eliminated” during clinical trials, until a lucky few “winners” make it to the “end of the game” for regulatory approvals. 🎊

Unlike classic games like Sudoku, though, where you have a tidy 9x9 search space and everything adds up mathematically, drug discovery is complex. According to Philip M. Kim, a professor of molecular genetics and computer science at U of T’s Faculty of Medicine, “For a standard-length protein of 100 amino-acids, there are 20¹⁰⁰ possible molecular structures, that’s more than the number of molecules in the universe.” Add real-world variability, high costs, and lengthy experimentation, and drug discovery makes Sudoku seem easy.
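To get a feel for how big 20¹⁰⁰ really is, here is a quick back-of-the-envelope check in Python. The 10⁸⁰ figure is my own assumption, not part of the quote: it is a commonly cited rough estimate for the number of atoms in the observable universe, used here only as a yardstick.

```python
# Number of possible sequences for a 100-residue protein,
# with 20 standard amino acids available at each position.
AMINO_ACIDS = 20
LENGTH = 100
n_sequences = AMINO_ACIDS ** LENGTH  # Python integers are exact, so no overflow

# Assumption: roughly 10^80 atoms in the observable universe (a common estimate).
ATOMS_IN_UNIVERSE = 10 ** 80

digits = len(str(n_sequences))
sequences_per_atom = n_sequences // ATOMS_IN_UNIVERSE
ratio_exponent = len(str(sequences_per_atom)) - 1

print(f"20^100 has {digits} digits")
print(f"That is about 10^{ratio_exponent} sequences for every atom")
```

The result has 131 digits, so even checking a billion sequences per second would not make a dent. Brute force is off the table, which is why smarter search matters so much.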

Standard drug discovery process today.

As you can see, the drug discovery game is quite cutthroat. But the great thing about treating drug discovery as a game is we can use powerups. ⭐️

While learning about the drug discovery process, the number of inefficiencies and potential for innovation just jumps out. I mean, according to the image above, why is it taking 7–10 years to choose our players, 6–12 years for them to play the games, and 1–2 years to be announced victorious? If we can streamline the drug development process, we could save millions of lives.

When thinking about automation and optimization for drug discovery, Artificial Intelligence is your best bet. 🧮

According to Insider, “AI could curb drug discovery costs for companies by as much as 70%”. The Information Technology & Innovation Foundation states that the clinical research phase alone could see savings of $28 billion per year with AI. And Pharmaceutical Technology reports that a typical four- or five-year exploratory research phase could be condensed into less than a year with the help of AI.

AI-enabled drug discovery process.

Many biotech startups and research organizations have already been exploring and making major breakthroughs in this space.

DeepMind’s AlphaFold 2 (AF2), for example, predicts the three-dimensional structures of proteins with remarkable accuracy. 💉

With AI applied to drug discovery in this way, many standard processes can be simplified. In this case, the knowledge of three-dimensional structures of proteins and other biological molecules brought by AF2 enables the rational design of drugs, virtual screening for potential candidates, optimization of compounds, and a deeper understanding of drug-target interactions. This knowledge accelerates drug development, enhances specificity, and offers valuable insights into disease mechanisms.

The model behind AF2.

Aside from predicting protein structures, there are so many more applications of AI in drug discovery, and I’ve broken down some cool ideas in the image below.

My ideas for AI automation in the drug discovery process.

The idea that excites me the most is leveraging AI to generate novel lead compounds with desired properties. Let’s dive in a little deeper.

De Novo Drug Design 👩‍🎨

“De novo” simply means “from the beginning”. De novo drug design refers to designing chemical compounds from scratch, without a template.

Traditionally there were 2 main approaches to de novo drug design: Structure-based and Ligand-based design. 🎨

  1. Structure-Based: In structure-based de novo design, the process is guided by the 3D structure of the biological target, often a protein, and its binding site. The 3D structure of the biological target is first determined, using techniques like X-ray crystallography or cryo-electron microscopy. The binding site within the target is analyzed to understand its shape, properties, and the key interactions required for binding. Computational docking algorithms are then used to virtually screen and position small molecules within the binding site. Finally, based on the knowledge of the binding site, new molecules are designed to complement the binding site’s shape and form favourable interactions.
  2. Ligand-Based: In ligand-based de novo design, the process is guided by the properties and structures of known ligands: small molecules that bind to the target of interest, often a protein. This process begins with a set of ligands known to bind to the target. Fragments or molecular substructures that are common across these ligands are then identified, and they serve as building blocks for new molecules. Finally, the fragments are assembled into new molecules, with computational tools helping to optimize the connections between fragments and the overall structure.

Traditional de novo drug design.

With recent breakthroughs in AI software and hardware, we can greatly improve the de novo drug discovery process. 💻

As discussed earlier, the chemical search space in drug discovery is incredibly vast, making the task of discovering new drugs highly challenging. The number of potential chemical compounds that could be synthesized is huge, and even though we are constantly looking for novel drug candidates we have just barely scratched the surface of the chemical space. Selecting, designing, and synthesizing new molecular structures suitable for entry into the drug discovery and development pipeline is a challenge, but this “large search space” is a problem well suited for AI.

AI specifically brings several benefits in exploring a greater chemical space, eliminating bias, and saving time and money. Researchers in academia and industry have been experimenting with various deep-learning approaches for lead generation. Let's take a look at two.

Deep Reinforcement Learning for De Novo Drug Design 🤖

For the past little bit, I’ve been building reinforcement learning (RL) algorithms to play games. RL is a subfield of machine learning that deals with learning to make sequential decisions in an environment to maximize a cumulative reward. RL is quite good at playing games because the agent learns how to get the most points/wins/rewards through trial and error. Just as an agent can learn to play a game of Sudoku, if you set up the game right, it can learn to generate novel molecular structures (its moves in the game) with optimized efficacy, safety, and pharmacokinetic properties (its scoring metrics).

RL works by training algorithms to make decisions that maximize a predefined objective, such as finding molecules with desirable properties for drug development. These algorithms explore various molecular structures and properties, learn from the outcomes of simulations or experiments, and adapt their strategies over time to identify promising drug candidates more efficiently. This approach accelerates the drug discovery process by guiding researchers toward compounds with the highest likelihood of success while minimizing the need for extensive trial-and-error experimentation.

The key components of any RL system are (1) your agent, the learner or decision-maker, which interacts with (2) your environment. The agent makes a sequence of decisions, known as (3) your actions, to accomplish a task, following some (4) policy to earn a (5) reward.
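Here is a minimal sketch of those five pieces working together, using tabular Q-learning on a toy "molecule building" game. Everything in it is made up for illustration: the three `FRAGMENTS`, the four-fragment limit, and especially the `score` function, which stands in for a real property predictor. It is not how any production drug design system works, just the smallest RL loop I could write that still has an agent, environment, actions, policy, and reward.

```python
import random
from collections import defaultdict

FRAGMENTS = ["C", "N", "O"]  # hypothetical building blocks; each one is an action
MAX_LEN = 4                  # an episode ends once the "molecule" has 4 fragments


def score(molecule: str) -> float:
    """Toy reward, standing in for a real property predictor."""
    points = molecule.count("N") + 0.5 * molecule.count("O")
    repeats = sum(a == b for a, b in zip(molecule, molecule[1:]))
    return points - repeats  # penalise identical neighbouring fragments


def train(episodes: int = 20_000, alpha: float = 0.5, epsilon: float = 0.5, seed: int = 0):
    """Tabular Q-learning. The state is the partial molecule built so far."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(FRAGMENTS))  # q[state][action]
    for _ in range(episodes):
        state = ""
        while len(state) < MAX_LEN:
            # Policy: epsilon-greedy (explore at random sometimes, otherwise exploit).
            if rng.random() < epsilon:
                action = rng.randrange(len(FRAGMENTS))
            else:
                action = max(range(len(FRAGMENTS)), key=lambda a: q[state][a])
            # Environment: appending the fragment gives the next state.
            next_state = state + FRAGMENTS[action]
            done = len(next_state) == MAX_LEN
            # Reward arrives only when the molecule is complete (no discounting).
            target = score(next_state) if done else max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_rollout(q) -> str:
    """Follow the learned policy with no exploration."""
    state = ""
    while len(state) < MAX_LEN:
        best_action = max(range(len(FRAGMENTS)), key=lambda a: q[state][a])
        state += FRAGMENTS[best_action]
    return state


if __name__ == "__main__":
    best = greedy_rollout(train())
    print(best, score(best))
```

With only 81 possible "molecules" you could simply check them all, so the table-based approach is overkill here. The point is the shape of the loop: swap the toy `score` for a learned property model and the lookup table for a neural network, and you have the rough outline of deep RL for molecule generation.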

This reinforcement learning approach to identifying lead compounds has shown promising results and is now being used in industry. 🏥

Insilico Medicine introduced Generative Tensorial Reinforcement Learning (GENTRL), a deep generative model for designing novel molecules with desired properties. GENTRL is built specifically for de novo small-molecule design and can steer the properties of generated molecules in a chosen direction. It has shown promising results, reportedly shrinking lead discovery from years to days.

The GENTRL approach to RL for Drug Design.

Graph Neural Networks (GNNs) for Drug Design 📈

When I started writing this article, I saw some similarities between games and drug design. Modelling drug design as a game of Sudoku helped me understand the limitations of current drug design processes, but while writing, I learned that Sudoku and drug design have even more in common than I originally thought.

GNNs have various applications in the drug discovery process. 🧪

GNNs represent molecules as graphs, with atoms as nodes and chemical bonds as edges. GNNs are trained on labelled data, allowing them to learn from experimental results, and they can be fine-tuned for specific drug design tasks using transfer learning.
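To make the "atoms as nodes, bonds as edges" idea concrete, here is a small sketch in plain Python. It encodes ethanol's heavy atoms (C-C-O) as a graph and runs one round of message passing, where each atom adds its neighbours' features to its own. This is a simplification on my part: a real GNN would apply learned weights and a non-linearity at every step, and the function names and three-element vocabulary below are made up for illustration.

```python
ELEMENTS = ["C", "N", "O"]  # tiny illustrative vocabulary


def one_hot(symbol):
    """Encode an element symbol as a feature vector."""
    return [1.0 if symbol == element else 0.0 for element in ELEMENTS]


def molecule_to_graph(atoms, bonds):
    """Nodes are atoms (with one-hot features); edges are undirected bonds."""
    features = [one_hot(atom) for atom in atoms]
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    return features, neighbours


def message_passing_step(features, neighbours):
    """Each node sums its own features with those of its bonded neighbours."""
    updated = []
    for i, own in enumerate(features):
        aggregated = list(own)
        for j in neighbours[i]:
            aggregated = [x + y for x, y in zip(aggregated, features[j])]
        updated.append(aggregated)
    return updated


def readout(features):
    """Sum all node vectors into a single vector describing the molecule."""
    return [sum(column) for column in zip(*features)]


# Ethanol's heavy atoms: C-C-O (hydrogens left out to keep it small)
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

features, neighbours = molecule_to_graph(atoms, bonds)
hidden = message_passing_step(features, neighbours)
graph_vector = readout(hidden)

print(hidden)        # per-atom features after one round
print(graph_vector)  # whole-molecule summary
```

After one round, the middle carbon "knows" it sits next to both a carbon and an oxygen. Stack a few rounds and every atom's features reflect its wider chemical neighbourhood, which is what a property-prediction layer on top would then learn from.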

GNNs have also been used for de novo drug design. 🧩

Various GNN-based approaches to de novo drug design have been tested, but one study by researchers at the University of Toronto really stood out to me. They took inspiration from Sudoku to build algorithms that proved successful in designing proteins with a desired shape.

The team specifically represented proteins as graphs, where nodes in the graph correspond to amino acids, and edges represent the distances between them within the molecule. By applying principles from graph theory, the researchers were able to model the geometry of protein molecules for specific purposes, such as designing proteins to neutralize viruses or target overactive receptors in cancer.

Their model, “ProteinSolver”, is a graph neural network designed to create novel protein molecules that are finely tuned for specific therapeutic purposes. As a starting point, their main neural network algorithm was trained to solve Sudoku. Just as a missing value in a Sudoku grid is constrained by the numbers already placed around it, each amino acid in a protein molecule is affected by the electrostatic forces of its neighbours. Basically, the “opposites attract” behaviour of molecules makes modelling drug design as a game of Sudoku quite useful.
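To see why Sudoku fits so naturally into a graph framing, here is a short sketch that turns the 9x9 grid into a graph: each cell is a node, and two cells are connected whenever they share a row, column, or 3x3 box. This is my own illustration of the idea, not ProteinSolver's code, and the helper names and the tiny example grid are made up.

```python
from itertools import combinations


def sudoku_edges():
    """Pairs of cells that constrain each other on a 9x9 grid."""
    cells = [(row, col) for row in range(9) for col in range(9)]
    edges = set()
    for a, b in combinations(cells, 2):
        same_row = a[0] == b[0]
        same_col = a[1] == b[1]
        same_box = (a[0] // 3, a[1] // 3) == (b[0] // 3, b[1] // 3)
        if same_row or same_col or same_box:
            edges.add((a, b))
    return edges


def neighbours_of(edges):
    """Adjacency map: cell -> set of cells it shares a constraint with."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs


def candidates(grid, cell, nbrs):
    """Digits still allowed in `cell`; `grid` maps filled cells to digits."""
    used = {grid[n] for n in nbrs[cell] if n in grid}
    return set(range(1, 10)) - used


edges = sudoku_edges()
nbrs = neighbours_of(edges)

# Hypothetical partial grid: one clue each in the same row, column, and box
# as the top-left cell.
grid = {(0, 1): 5, (3, 0): 7, (1, 1): 9}

print(len(edges))                     # number of constraint edges
print(candidates(grid, (0, 0), nbrs)) # digits the top-left cell can still take
```

Every cell ends up with exactly 20 neighbours, 810 edges in total. Filling in a cell is a question of reading what the neighbours already hold, which is the same kind of local, neighbour-driven reasoning a graph neural network performs when it picks an amino acid for one position in a protein.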

The ProteinSolver architecture.

The applications of Artificial Intelligence in drug discovery represent a transformative leap in the field of pharmaceutical research. 👩‍🔬

AI-driven tools have already begun demonstrating their potential to accelerate the drug development process by facilitating the identification of promising drug candidates, optimizing molecule design, and improving our understanding of complex biological systems. As AI-driven drug discovery continues to evolve, we can anticipate breakthroughs in the development of innovative therapies and a significant impact on global healthcare by bringing new drugs to the market faster, addressing unmet medical needs, and potentially saving lives.

I’ll be exploring AI’s application in drug design for the next bit, both to enhance our understanding of molecular interactions and to expedite the drug discovery process. My next steps are building predictive models and delving into de novo drug design techniques, harnessing AI to develop novel compounds with optimized properties. Feel free to follow along as I level up my drug discovery game!

Hey! I’m Vinaya, a high school student passionate about using technology for social good. If you have any suggestions, questions, or just want to talk, you can message me on LinkedIn or Twitter. Feel free to subscribe to my bi-monthly newsletter to stay updated on my projects, and tech news. Thank you for reading and I hope you learnt something new!
