The path to the next computational transformation of drug discovery

Sep 25, 2022

Computation is making a comeback in drug discovery.

The idea that computation would transform drug discovery first excited the imagination of drug hunters decades ago. “Designing drugs with computers” was heralded as part of the next industrial revolution by Fortune magazine in 1981 (Figure 1). The excitement fizzled out when the biotech bubble burst in 2000, but the true promises of that round of the digital revolution have been fulfilled: computer visualization, cheminformatics (such as searchable compound databases and data management), and molecular docking (in the sense of generating hypothetical poses of protein-ligand binding) are used by almost every drug discovery team today. When a big enough bubble bursts, there is always some real substance left behind. The remnants of the previous digital revolution are sophisticated software products made by specialized computational chemistry companies.

Figure 1. The early days of computation in drug discovery, 1981. A. Discover magazine ran a story about how computer visualization helped drug design. B. Fortune’s cover story on computer-aided design and computer-aided manufacturing (hailed, rightfully, as the next industrial revolution) featured computer-aided drug design at Merck.

The renewed excitement about computation in drug discovery today is inspired by a much grander promise. If the previous round of technology aimed at computer-aided drug design (CADD), in the sense that computers help the medicinal chemists see on the computer screen what is in their minds, this time the theme is computation-driven drug discovery (CD³). Much of the enabling technological advance took place quietly in the years in between, but recently money and minds met to craft the future of computational drug discovery: The algorithms will design molecules, predict their potencies and properties, identify which targets and diseases to go after, and extract information from voluminous data for decision-making. In other words, computers will complement human expertise and experimental assays. They will do the thinking and drive decision-making. CADD increases the productivity of individuals on a drug discovery team; CD³ will augment the team with “thinking” machines.

Somewhere between the computer-aided drug design of the past and the computation-driven drug discovery of the future lie two intermediate paradigms: computation-enabled drug discovery (CEDD) and computation-first drug discovery (CFDD). In CEDD, computers generate novel hypotheses that enable a new target or a new class of designs to be considered; for example, computationally generated protein structural models may enable the design of new molecules that target new binding pockets on proteins of interest. In CFDD, every molecule is first evaluated computationally for predicted potency and properties before it is ever made and tested, but the choice of which molecules to make remains the domain of human judgment. In both CEDD and CFDD, human expertise is indispensable in making the call based on imperfect predictions. As computational predictions become increasingly accurate, CEDD and CFDD progress more and more toward computation-driven drug discovery, in which human intervention occurs at ever larger intervals.

I will not be discussing specific computational technologies in drug discovery in this writing. The academic publications on this subject are legion, and good scientific reviews already cover various technologies. Instead, I ponder what business models will foster breakthrough computational technologies in drug discovery.

One indisputable example of a breakthrough is AlphaFold. Structural biology lies at the foundation of modern structure-based drug design. Accurate protein structure prediction leads to novel therapeutic hypotheses and critical information for designing new active molecules; it opens up vast opportunities in untapped therapeutic targets. The breakthrough in protein structure prediction, however, did not occur in a drug discovery organization, the kind of organization that stands to benefit the most from it. It took place at DeepMind, a company previously known for its game-playing AI and now on a quest to use AI to solve outstanding scientific problems.

Technological breakthroughs often do not happen in the organizations that will benefit the most from them. The advancement of technology benefits their core business, but it is not their core business.

Paradoxically, what may hinder the next computational breakthrough is the recent glut of investment in the field. Too much money is a bane to innovation, because it dilutes the needed focus. Too much money begets too many startups, splitting the talent pool and making it difficult for any team to reach the critical size to deliver solutions to hard problems. The biggest harm, however, is that it fuels every startup’s ambition to become a clinical stage company with an internal portfolio and pipeline.

One perverse incentive in biotech is that clinical stage companies are valued much more highly than preclinical stage companies, which encourages the dubious practice of using the excess funding to build an oversized drug discovery organization around a window-dressing computational team. The discovery team brute-forces its way to a development candidate molecule for clinical trial but touts it as an achievement of a fledgling computational platform. The investment that should have been used to improve the technology is instead diverted to costly drug discovery projects, even though the founding team may have little experience in leading drug discovery organizations.

Imagine that an airplane maker in the early days of aviation started a transportation company and employed a fleet of automobiles, steamboats, and locomotives to move people around, but attributed the business success to a few prototype airplanes.

An airplane maker should focus on building airplanes. Boeing supplies airplanes to airlines, but it does not operate airlines.

It is important for a computation-driven drug discovery company to use its technology to move one program to the clinic, to hone and validate its technology. But when the company pivots into a clinical stage company with multiple assets, its valuation is almost entirely tied to the success of its clinical programs. The priority is to invest in these clinical trials. Improving the technology generates diminishing marginal return for the company’s new goal. It is time for the technology unit to go its own way.

There are three advantages in an independent, computation-focused organization: amplified value, aligned incentive, and data exposure.

Amplified Value. Drug discovery is a numbers game: most discovery projects do not result in a marketable drug. Technology may increase the odds of success, but its effect becomes manifest only over repeated use. Even the best technology does not guarantee the success of a specific discovery effort, as biology can defeat any individual drug discovery project. The return on investment in the technology varies widely if its use is limited to a few projects, but this variance is reduced and the value is amplified if the technology is broadly adopted by the industry. On the same target, the discovery projects powered by technology are more likely to succeed, and to succeed faster, than the ones that are not.
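The variance argument above can be made concrete with a little arithmetic. The sketch below is illustrative only. It assumes that each discovery project is an independent trial with a fixed probability of success, and the two probabilities it uses (5% without the technology, 10% with it) are hypothetical numbers chosen for the example, not figures from this article. The function names are likewise made up for this sketch.

```python
import math


def success_fraction_stats(p_success: float, n_projects: int) -> tuple[float, float]:
    """Mean and standard deviation of the fraction of projects that succeed.

    Assumes (for illustration) that projects are independent Bernoulli trials
    sharing the same success probability.
    """
    mean = p_success
    std = math.sqrt(p_success * (1.0 - p_success) / n_projects)
    return mean, std


def uplift_to_noise(p_base: float, p_tech: float, n_projects: int) -> float:
    """Gain in success rate from the technology, in units of its statistical noise.

    Compares n_projects run with the technology against n_projects run without.
    """
    _, std_base = success_fraction_stats(p_base, n_projects)
    _, std_tech = success_fraction_stats(p_tech, n_projects)
    return (p_tech - p_base) / math.sqrt(std_base**2 + std_tech**2)


# Hypothetical success probabilities, chosen only to illustrate the scaling.
P_BASE, P_TECH = 0.05, 0.10

for n in (4, 40, 400):
    _, std = success_fraction_stats(P_TECH, n)
    ratio = uplift_to_noise(P_BASE, P_TECH, n)
    print(f"{n:>4} projects: std of success rate = {std:.3f}, uplift/noise = {ratio:.2f}")
```

Under these assumptions the noise in the observed success rate shrinks as one over the square root of the number of projects, so the same improvement in odds that is invisible across a handful of projects becomes clearly distinguishable across hundreds.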

Once a computational technology is validated in actual drug discovery projects, and the developers and the initial users have worked together to realize its value in discovery, it is time for the developers to keep refining the technology and to take it to as many other users as possible.

Aligned Incentive. Once the computational unit is fully separated from the drug discovery organization, the former has a strong incentive to deliver computational tools that create true value for the latter, because the latter is no longer under any obligation to use the former’s technology. The survival of the computation-focused organization depends on making computation work for drug discovery, and survival is a paramount incentive.

Everyone worth their salt wants to see their work impact their business. The farther removed from the company’s bottom line, the less impact one perceives in one’s work. We are all nearsighted in our perception of self-worth. For someone developing technology in a drug discovery organization, the initial impact when the technology is first delivered to the project may be obvious, but the incremental impact of additional enhancement will be hard to discern when the company’s focus is on getting drugs to the clinic.

The more mature the technology becomes, the more detached from drug discovery the developers feel. A developer may need to babysit a nascent technology in its first application in projects, thus experiencing a direct involvement in drug discovery. Once a non-expert can comfortably use the technology, however, the developer moves away from the projects.

Inside an independent computation-focused company, the developers can feel the direct impact of their work on the company’s core business, and their incentive is aligned with the company’s core competence: developing and delivering computational solutions to drug discovery. Any incremental improvements get amplified by repeated, large-scale use, and are rewarded by larger contracts and higher revenues.

Data Exposure. One technical advantage of using a technology repeatedly across many projects is its exposure to large amounts of data. Many computational models benefit from large and diverse training data sets. There are clear mutual benefits when drug discovery partners share relevant project data with the computation-focused company while retaining ownership of it: modern computational models often allow on-the-fly refinement and calibration by incorporating new project data. This helps to increase both the predictive domain of the general model and the predictive accuracy of the local model for the project. The computational organization ends up with an improved model, and the discovery partner enjoys the resulting time and cost savings in the project.

The advantage does not end with potential access to diverse project data. It makes sense for a computation-focused company, in contrast to a computational unit in a drug discovery organization, to invest in curating and cleaning up public data for its model development, because the investment is justified by the expected benefits of an improved model ready for deployment to diverse discovery projects.

A computation-focused company must maintain a close tie to active drug discovery. This helps it work on the right problems and invent new technologies to address emerging scientific problems. A good organizational structure should encourage continuous improvements to existing technologies, innovation of new technologies, and direct participation in drug discovery.

One such hypothetical organization is illustrated in Figure 2. It consists of a Core Technology unit that refines and improves mature technologies, an Emerging Technology unit that tackles new scientific challenges, and a Discovery Application unit that brings technology to external drug discovery projects. Note that staff should move fluidly among the three units. Once an emerging technology is ready for field testing, some of its developers should move into the Discovery Application unit and be embedded as consultants to implement the technology in drug discovery projects (ideally in collaboration with a drug discovery organization). Once the technology is proven and validated in the projects, the developers should move into Core Technology to productize it, after which they may again move into Emerging Technology to work on the next computational solution.

Figure 2. The organizational structure of a company focused on computational transformation of drug discovery.

This computation-focused company creates value in three ways:

Entering paid collaborations with other biotech and pharmaceutical companies, in which it deploys its technology in their discovery programs. These can take the form of Computation as a Service (CaaS) or of analytics and consultancy services.
Moving up the value chain by developing and delivering candidate molecules that meet a specified target product profile to clinical stage drug discovery organizations.
Forming new biotech companies with co-founders who bring in new biology, targets, and modalities, and attracting private investment to them. The company will enable and accelerate these biotech ventures using its technology and take due ownership for its in-kind contributions.

One temptation to resist is to establish internal pipelines. These will divert resources from the core mission of delivering computational technology, resulting in an organization divided between two separate objectives: technology and clinical candidates. Instead, the company should participate in external drug discovery projects through the three mechanisms above. This keeps the company focused on computational technology and at the same time keeps that technology relevant to drug discovery.

So the recipe for transforming drug discovery with computation is, in two simple steps:

Form a lean drug discovery company with a cohesive interdisciplinary group of people who all identify with the mission of using computation to discover drugs, develop the technology in the context of one or two internal discovery projects, figure out how to make it work, and advance one project into the clinic.
Spin off the team developing the computational technology with the single mandate of making the technology work as effectively and as broadly as possible.

The original company can be structured as an umbrella company consisting of two legal entities, one owning the technology and one owning the clinical assets. The trajectory of the technology development is depicted in Figure 3.

Figure 3. The trajectory of technology development in computational drug discovery.

“History doesn’t repeat itself, but it often rhymes.” As in the early 2000s, we are living through another biotech bubble, this time inflated (at least partly) by the promise of computation-driven drug discovery. CADD became reality after the previous bubble burst; CD³ will too, after the inevitable burst of this one. It will be the computation-focused companies that deliver the technologies for the upcoming digital transformation of drug discovery.

Written by Huafeng Xu