A New Tool to Find New Drugs

Mar 27, 2018

What follows is a narrative adaptation of the talk on simulating biology that I presented at Nvidia GTC 2018, which is also online if you’d prefer to watch.

At NeuroInitiative we are driven by the idea that new inventions are the way to have the biggest impact on civilization. Look back at human history, and it’s the inventions that stand out — controlled fire, cooking, wheels, engines, flight, microscopes, computers. If there’s even a chance you can create something to advance life, why would you do anything else?

Figure 1. Inventions which have changed the course of humanity. From left to right, an early microscope, the Wright brothers’ first flight, and the first transistor.

Tools can enable discovery of medicines which extend and improve quality of life. Over just the last 150 years, the understanding of germ theory and the discovery of antibiotics helped to double average life expectancy from 40 to 80 years, a trend that has since slowed and, by some measures, reversed.

Problems in Drug Discovery

Unfortunately, the process of drug invention is fraught with peril. It currently takes over 20 years and $2 billion to get a single drug to market, with 90% of drugs entering clinical trials failing to reach approval. Most of these failures are due to lack of efficacy: the drugs safely do something but fail to impact disease.

Figure 3. Cause of clinical trial failures (Source: https://www.nature.com/articles/nrd.2016.184)

Read on to learn why I think things are different now, the tool NeuroInitiative has invented to address the efficacy problem, and an example of how we used this tool to invent a new drug for Parkinson’s disease.

Why Neuro?

The complexity of neuronal cells and disease processes presents an incredible challenge, one which I believe is especially well suited to computational assistance. The puzzle pieces are there, but there are too many moving parts to make sense of manually. Also, these diseases are a huge unmet need and getting worse, in contrast to some of the more “popular” research areas.

Recipe for a Successful New Medicine

For a new drug to be successful, it needs to meet a handful of criteria. You need a molecule that effectively engages a target, where that target engagement safely alters disease progression or symptoms. It needs to solve a problem for which there are patients, and you have to be able to identify which patients should be prescribed the medicine. As mentioned above, about half of drugs fail for lack of efficacy, which means the drug did something, but that target didn’t impact disease, or it wasn’t tested on the right patients. The software platform I describe below aims directly to improve visibility into why targets should work, and for whom they should work.

Figure 5. Recipe for a successful new drug. Target and patient selection are the primary causes of efficacy failure.

NeuroInitiative’s Simulation Platform

These are solvable problems with the right data, tools, and talent. Luckily, the world’s scientists have been churning out data at an incredible rate, with 29M manuscripts on PubMed growing at over 1 million per year. Since the Human Genome Project, we know the parts, and largely what they look like, which I’ve previously described if you are interested in diving into some of the data. Now I’ll go into a little more detail on our patented simulation platform and how we bring that data together for in silico cells.

Because we wanted this to be used by a broad base of biologists, rather than just computational scientists, we put some effort up front into developing a user interface for building models, running and visualizing simulations, and analyzing results. Many projects can be completed out of the box with no new code. So as not to block the power users, we built a REST API to expose data and logic, which plugs in nicely to R or Python scripting. The heart of the system is a custom C++ simulation engine built on the Nvidia CUDA toolkit and hosted on Microsoft Azure. Virtualized GPUs on Azure have been great, allowing us to scale in minutes to levels which just a few years ago would have required access to a handful of supercomputers around the world.

Figure 6. High-level system architecture of NeuroInitiative’s SEED simulation platform.

When I say “simulation” I mean 3-dimensional spatio-temporal movement and behavior of biochemical entities, as shown in the screenshot below. Users can see sub-cellular locations in the context of the whole cell and zoom in to see localization and complex structures inside the cell, including organelles and intra-organelle compartments. While there are tools and methods for simulating reaction systems based on ordinary differential equations, I believe those provide limited predictive value because they neglect the impact of localization, as well as the complexity of nested membranous structures which are continuously moving and merging through fission and fusion events.

Figure 7. Screenshot of virtual cell visualization from SEED user interface
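For contrast, here is roughly what the ODE-style approach mentioned above looks like. This is a minimal sketch and not part of SEED: the reaction A + B -> C, the rate constant, and the starting concentrations are all assumptions chosen for illustration. Notice that the whole cell is reduced to three numbers, with no notion of where any molecule sits.

```python
def simulate_well_mixed(a0, b0, k, dt, steps):
    """Forward-Euler integration of the mass-action ODE for A + B -> C.

    Hypothetical example: each species is a single concentration for the
    whole cell, so localization (which compartment, which membrane) is not
    represented at all.
    """
    a, b, c = a0, b0, 0.0
    for _ in range(steps):
        rate = k * a * b      # mass-action rate law
        a -= rate * dt
        b -= rate * dt
        c += rate * dt
    return a, b, c


# Illustrative values only.
a, b, c = simulate_well_mixed(a0=1.0, b0=0.5, k=2.0, dt=0.001, steps=5000)
```

A particle-based simulation replaces those three numbers with individual entities that each have a position and can only react with what they actually encounter.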

Looking more closely at the data driving the model, there are two main data categories: spatial and behavioral. The spatial model, represented below, is inspired by Systems Biology Markup Language, with extensions to support the physical detail needed for this type of simulation. While a few other groups have created particle-based biological models, to our knowledge this is the first fully data-driven, extensible implementation.

Figure 8. Mapping cellular spatial layout from data model to visualization.
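To give a feel for what a nested spatial model might look like in code, here is a small sketch. It is not the SEED schema: the `Compartment` class, its `radius_nm` field, and the sizes below are hypothetical stand-ins for the kind of physical extensions described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Compartment:
    """A membrane-bound region that may contain nested compartments."""
    name: str
    radius_nm: float  # illustrative physical detail, not a real measurement
    children: List["Compartment"] = field(default_factory=list)

    def add(self, child: "Compartment") -> "Compartment":
        """Nest a compartment inside this one and return the child."""
        self.children.append(child)
        return child

    def path_to(self, target: str, prefix: str = "") -> Optional[str]:
        """Return the nesting path to a named compartment, or None."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        if self.name == target:
            return here
        for child in self.children:
            found = child.path_to(target, here)
            if found:
                return found
        return None


# Hypothetical layout: cell -> cytosol -> organelle -> inner compartment.
cell = Compartment("neuron", radius_nm=10000.0)
cytosol = cell.add(Compartment("cytosol", radius_nm=9900.0))
mito = cytosol.add(Compartment("mitochondrion", radius_nm=500.0))
mito.add(Compartment("mitochondrial_matrix", radius_nm=400.0))

print(cell.path_to("mitochondrial_matrix"))
```

A tree like this is what lets a visualization show an entity both in the context of the whole cell and inside a specific intra-organelle compartment.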

For behavioral detail we use a library of entity-entity interaction rules as shown below. At the top of figure 9 below is a screenshot from IntAct, one of the top protein-protein interaction (PPI) databases with about 1.5 million interactions, followed by one of the simpler interactions in our database, illustrating the complexity missing from PPI databases. It’s not enough to know that RAB3A and LRRK2 interact. Our version below also highlights that LRRK2 must be bound to an ATP molecule, and that the outcome of the reaction is a specific post-translational modification on RAB3A with an ADP byproduct. Consider further that both of these entities may be on or off a variety of membranes, bound to GTP or other entities, and the potential for any pair explodes to hundreds or thousands of variations.

Figure 9. Protein Protein Interaction as listed in IntAct at top, followed by one of the variations in SEED.
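One way to picture a state-aware rule like the one above is sketched here. The `EntityState` and `Rule` classes, the generic "modified" label, and the state dimensions used for counting are all assumptions made for illustration; they are not SEED's actual rule format.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class EntityState:
    """An entity plus whatever is bound to it or modified on it."""
    entity: str
    features: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Rule:
    """Reactant states that must be present, and the states they become."""
    reactants: Tuple[EntityState, EntityState]
    products: Tuple[EntityState, ...]

    def applies_to(self, a: EntityState, b: EntityState) -> bool:
        """True if the pair carries at least the required features."""
        need_a, need_b = self.reactants
        return (
            a.entity == need_a.entity and need_a.features <= a.features
            and b.entity == need_b.entity and need_b.features <= b.features
        )


# LRRK2 must carry ATP; the outcome is a modified RAB3A plus free ADP.
rule = Rule(
    reactants=(EntityState("LRRK2", frozenset({"ATP"})), EntityState("RAB3A")),
    products=(
        EntityState("LRRK2"),
        EntityState("RAB3A", frozenset({"modified"})),
        EntityState("ADP"),
    ),
)

lrrk2_with_atp = EntityState("LRRK2", frozenset({"ATP"}))
lrrk2_without_atp = EntityState("LRRK2")
rab3a = EntityState("RAB3A")

# Hypothetical state dimensions, to show how variants multiply.
dimensions = {
    "location": ["cytosol", "vesicle membrane", "plasma membrane"],
    "nucleotide": ["none", "GTP", "GDP"],
    "modification": ["unmodified", "modified"],
}
variants_per_entity = len(list(product(*dimensions.values())))  # 3 * 3 * 2
variants_per_pair = variants_per_entity ** 2
```

Even with only three made-up dimensions per entity, a single pair already has 324 possible state combinations, which is why a flat "A interacts with B" record is not enough to drive a simulation.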

We now hold the most complete and accurate library of entity interaction rules related to Parkinson’s disease pathways in the world, with 65,000 curated interactions between 17,000 states of 1,350 entities. These are part of a larger library of over 1 million interactions imported from other sources and queued for curation.

The core of the simulation engine is a time-stepped particle system similar to that described in GPU Gems from Nvidia. In addition to the physical position/velocity calculations, our system evaluates biological rules as described above, logs quantitative data, and streams view-port particle positions to remote clients for near-real-time visualization. The data structures and algorithms to do that without killing iteration rate were non-trivial and a challenge I leave to the reader :)

The math driving the first two steps is a modified implementation of Newtonian physics using Verlet integration and perfectly elastic collisions.

Figure 11. Math used to simulate Newtonian physics.
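For readers who have not met these terms, here is a textbook sketch of the two ingredients named above. It is not the engine's code: the real system is a modified implementation in C++ and CUDA, whereas this is plain Python in one dimension, and choosing the velocity form of Verlet is my assumption.

```python
def velocity_verlet_step(x, v, accel, dt):
    """Advance one particle by one time step using velocity Verlet.

    x, v:  position and velocity (1D floats for brevity)
    accel: function mapping a position to an acceleration
    """
    a0 = accel(x)
    x_new = x + v * dt + 0.5 * a0 * dt * dt
    a1 = accel(x_new)
    v_new = v + 0.5 * (a0 + a1) * dt
    return x_new, v_new


def elastic_collision(m1, v1, m2, v2):
    """Post-collision velocities for a perfectly elastic 1D collision.

    Both momentum and kinetic energy are conserved.
    """
    total = m1 + m2
    v1_new = ((m1 - m2) * v1 + 2.0 * m2 * v2) / total
    v2_new = ((m2 - m1) * v2 + 2.0 * m1 * v1) / total
    return v1_new, v2_new


# A free particle (zero force) simply advances by v * dt each step.
x, v = 0.0, 2.0
for _ in range(10):
    x, v = velocity_verlet_step(x, v, lambda pos: 0.0, dt=0.1)

# Two particles of different mass colliding head-on.
v1_after, v2_after = elastic_collision(m1=1.0, v1=3.0, m2=2.0, v2=-1.0)
```

Verlet-style integrators are popular for particle systems because they stay stable over many small time steps at low cost per step, which matters when every step also has to evaluate rules and log data.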

As with any new method, we needed to establish a metric for accuracy, for which we compare simulation results to biological lab results downstream of similar manipulation. Where this has been previously published, we can simply plot and calculate a correlation coefficient as shown below. Where we have new findings in silico, we must go back to the lab for new measurements. Through a project funded by the Michael J. Fox Foundation, and in collaboration with the Mayo Clinic, we’re doing just that.

Figure 12. Comparison of simulation results to biological studies.
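The comparison itself is simple to express. The sketch below assumes a Pearson correlation, since the text does not say which coefficient was used, and the two lists are placeholder numbers for illustration only, not results from any study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder values: fold-changes from simulation vs. from the lab.
simulated = [0.8, 1.1, 1.9, 2.7, 3.2]
measured = [1.0, 1.2, 2.1, 2.4, 3.5]
r = pearson_r(simulated, measured)
```

A value near 1 means the simulation moves in step with the lab measurements; it says nothing about whether the absolute magnitudes agree, which is one reason to plot the data as well.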

Now onto a use case. To understand what is happening in Parkinson’s and where we might intervene, we hypothesized that the overlap between genetic and sporadic disease may hold really interesting clues. While there are several genetic mutations which can cause the disease, these only account for about 10% of cases. Somehow these starting points all end at the same hallmark pathology of protein aggregation and progressive loss of a specific dopaminergic cell population.

Figure 13. Overlap between genetic and sporadic forms of disease provide interesting hypotheses.

For manipulation 1 we started with a healthy, homeostatic, normal nigrostriatal dopamine neuron based on transcriptomic data from the Allen Human Brain Atlas, then replaced only the LRRK2 protein with a G2019S mutant variation, creating a virtual knock-in (KI) model. Comparing KI to control, we started with an unbiased look at the full simulation, with the heatmap showing biochemical entities on the Y axis, time on the X axis, and color indicating log2 fold-change of KI compared to control. Next, we performed functional enrichment against Gene Ontology to find biological processes significantly implicated by the changed genes.

Figure 14. Results of G2019S knock-in simulation.
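The two analysis steps described above can be sketched as follows. Both functions are illustrative assumptions: the text does not state how zero levels were handled or which enrichment statistic was used, so the pseudocount and the one-sided hypergeometric test are common choices rather than a description of our pipeline. Multiple-testing correction is left out for brevity.

```python
from math import comb, log2


def log2_fold_change(treated, control, pseudocount=1e-9):
    """log2(treated / control) per time point.

    The pseudocount (an assumption) avoids division by zero and log(0).
    """
    return [
        log2((t + pseudocount) / (c + pseudocount))
        for t, c in zip(treated, control)
    ]


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population: all entities considered
    annotated:  entities in the population carrying the GO term
    selected:   entities flagged as changed
    overlap:    changed entities that carry the GO term
    """
    upper = min(annotated, selected)
    hits = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(population, selected)


# One heatmap row: an entity's level in KI vs. control at three time points.
row = log2_fold_change([2.0, 4.0, 1.0], [1.0, 1.0, 2.0])

# Toy enrichment: 5 of 10 entities carry a term, and all 5 changed ones do.
p = enrichment_p_value(population=10, annotated=5, selected=5, overlap=5)
```

In the toy case the p-value is 1/252, because only one of the 252 ways to pick 5 entities from 10 selects exactly the 5 annotated ones.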

Figure 15. Source of human patient transcriptomic data.

For manipulation 2 we replaced starting expression levels for all genes with data from a transcriptional meta-analysis of sporadic Parkinson’s brains, then performed a similar post-simulation analysis. It’s interesting to notice that the changes here are much more diverse, which is to be expected as there is a lot more going on in a full biological context. Comparing these two sets, I’ve highlighted several cellular processes which change similarly in both models, narrowing the areas for further investigation and starting to suggest which processes may be disrupted in disease.

Figure 16. Results of Sporadic patient simulation.

Let’s get into an example of uncovering the molecular mechanisms behind these biological processes. In the exocytosis pathway we see that the protein RAB3A is disrupted in both manipulations, with a reduction in the active form. As shown in the schematic on the right of figure 17 below, RAB3 is required to facilitate docking of synaptic vesicles in preparation to release neurotransmitter during an action potential. Interestingly, we had previously seen reduced extra-cellular dopamine in transgenic G2019S laboratory models, and this new in silico mechanistic detail helps to form a hypothesis that can explain the phenotype.

Figure 17. Levels of active RAB3A over time on left. Schematic of synaptic vesicle cycle on right.

For our third and final manipulation, we tested a therapeutic target which was disrupted in the genetic model to see if we could bring the cell back to health. Below we again look at RAB3A and show that its inactive form is brought back to a normal level after addition of a virtual drug. This is just one of many abnormalities observed and corrected.

Figure 18. Change in ratio of inactive RAB3A after treatment in simulation.

Because of the automated nature of the model, once we have a disease model, additional iterative testing of therapeutic manipulations is very fast, allowing in silico target screening and validation. For solid hits here, we go back to lab data for verification, and employ a variety of rational design techniques to identify and/or create molecules which modulate these targets. Using this approach, we now have a pipeline of promising new drugs in three key areas: Parkinson’s, Alzheimer’s, and general systems of aging and longevity.

Update:

Since publishing this story, NeuroInitiative has spawned Vincere Biosciences to develop small molecule therapeutics for Parkinson’s disease. We are hopeful that this unique blend of technology and science will accelerate new options into the clinic for the 10 million Parkinson’s patients worldwide.