I was about twelve when I found out my grandmother had breast cancer. My parents did a good job of shielding me from the worst of the details, but there is no way to avoid the fear that comes from a loved one being diagnosed with cancer. As a kid, there wasn’t much I could do, but my grandmother loves to tell the story of me trying to comfort her by telling her I was going to do research to help cure her cancer. Little did I know at the time that treating cancer is not as simple as taking a pill once a day and that even identifying the right medicine is akin to finding a needle in a haystack.
Over the next seventeen years, as I pursued undergraduate and graduate studies in biology and genetics, I filled in those knowledge gaps, but felt no closer to changing the status quo of breast cancer. While the overall 5-year survival rate (89%) is good compared to other cancers, select subtypes such as triple-negative or inflammatory breast cancer have drastically lower odds of survival, and treatment programs can be long, complex, and draining for both patients and families. The pace just felt too slow until Dr. Nigam Shah of the Stanford School of Medicine approached twoXAR to collaborate on a project to do what I told my grandmother I would do: find a new way to beat breast cancer.
Collaboration for a Cancer Combination
At twoXAR, we are proud to have collaborated with the Shah Lab to publish our results in the Journal of the American Medical Informatics Association: Synergistic drug combinations from electronic health records and gene expression. The Shah Lab had generated several predictions of drug combinations to treat breast cancer using a clinical informatics approach but, much like twoXAR, felt that more corroborating data was necessary to gain confidence that these predictions were likely to positively impact breast cancer patients’ health. That’s why twoXAR independently validated the Shah Lab’s predictions using orthogonal data and techniques: specifically, gene expression and systems biology data and methods. Using these complementary approaches, we found support for two combinations of drug classes for breast cancer treatment, then carefully investigated the underlying molecular mechanisms across a breadth of literature sources to generate additional evidence for the efficacy of both the Shah Lab- and the twoXAR-predicted combinations.
While I’m of course delighted to see twoXAR’s methods validated and our results published by a leading journal in this field, I am much more excited that in a relatively brief period we uncovered combinations that “can be computationally prioritized to help direct preclinical research and, if promising, undergo clinical trial validation, repurposing, and optimizing of existing drugs for maximum therapeutic benefit.”
So how did we do it?
twoXAR is an artificial intelligence-driven drug discovery company. We leverage our computational platform to identify promising drug candidates, validate and de-risk them through preclinical studies, and progress candidates to the clinic through partnerships. One of our tenets for confidence in prediction is diversity: leveraging a breadth of real-world data spanning biological, chemical, and clinical sources to increase confidence in our predictions.
Let’s think about it this way: if you asked just one random person where the best place for sushi was, how much confidence would you have in their suggestion? Probably not much: perhaps they’re biased, or they’ve never had good sushi, or they heard “smoothie” instead of “sushi”. Now what if, instead of relying on just one opinion, you could ask thirty different people? You now have input from a range of diverse sources (people with different backgrounds, tastes in food, and so on), so any place that is consistently recommended has a much higher chance of being a good place to get sushi.
At twoXAR we aren’t searching for sushi, but instead we’re looking for the best treatments for a disease. Our “recommenders” are our diverse data and methods and the results are high-confidence matches between drugs and disease. This then gives us a curated set of recommendations to research and validate through other means.
Finding the Best “Sushi” in Breast Cancer
With any statistical predictive modeling method, false positives are a risk (if it were perfect, it wouldn’t be a model!). These false positives come from assumptions in the model, biases in the data, or some combination thereof. However, when you intersect several models, each with its own distinct biases and assumptions, a candidate that they all agree on is much less likely to be a false positive, which drastically improves your starting point.
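As a rough illustration of that intersection idea, here is a minimal Python sketch. It is not twoXAR's platform: the `consensus` function, the model names, and the toy candidate labels are all hypothetical, and it simply keeps candidates that enough independent sources agree on.

```python
from collections import Counter

def consensus(recommendations, min_votes):
    """Return candidates recommended by at least `min_votes` sources."""
    votes = Counter()
    for source in recommendations:
        votes.update(set(source))  # each source votes at most once per candidate
    return {candidate for candidate, n in votes.items() if n >= min_votes}

# Hypothetical toy data: each set is one model's top-ranked candidates.
model_a = {"pair_1", "pair_2", "pair_3"}
model_b = {"pair_2", "pair_3", "pair_4"}
model_c = {"pair_3", "pair_5"}

# Keep only candidates that at least two of the three models agree on.
print(sorted(consensus([model_a, model_b, model_c], min_votes=2)))
# prints ['pair_2', 'pair_3']
```

Raising `min_votes` trades recall for confidence: requiring all three toy models to agree leaves only `pair_3`.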
That’s why it was such a great match for us to team with the Shah Lab at the Stanford School of Medicine, who are experts in extracting meaning from Electronic Health Records. By collaborating with them, we significantly expanded the number of ways in which we could get “recommendations” for combinations in breast cancer. Using vastly different underlying methods and sources (remember: to increase confidence), we independently generated predictions of which combinations of drug classes could be efficacious. The Shah Lab used clinical records from one set of breast cancer patients to identify three significant drug class pairs that showed synergistic efficacy in treating breast cancer. Completely independently, we used biological data (gene expression) from a distinct set of patients and combined it with protein-protein and drug-protein interaction data to rank every possible combination of drugs from one of our databases by predicted synergy: an output of over ten million possibilities. We then evaluated where the three combinations from the Shah Lab fell in our rankings.
Of the Shah Lab’s three nominated drug class pairs:
- One had a very strong signal (highly significant enrichment), indicating that we too had predicted that this pair of drug classes would be among the most efficacious combinations
- One, while not significantly enriched, had several drug pairs that we predicted would be helpful in treating breast cancer
- One had no signal at all
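To give a feel for what “enrichment” can mean here, the sketch below asks how surprising it would be, by chance alone, to find a given number of a class pair's drug pairs near the top of a ranking. It uses a one-sided hypergeometric test, which is an assumption on my part for illustration: this post does not say which statistic was used, and the function name and all numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(total, in_class, top_n, hits):
    """One-sided hypergeometric tail: P(X >= hits).

    total    -- number of ranked drug pairs
    in_class -- how many of them belong to the class pair of interest
    top_n    -- size of the top-ranked slice being examined
    hits     -- class-pair members observed inside that slice
    """
    upper = min(in_class, top_n)
    tail = sum(
        comb(in_class, i) * comb(total - in_class, top_n - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(total, top_n)

# Hypothetical numbers: 1,000 ranked pairs, 20 from one class pair,
# and 8 of those 20 land in the top 50 (about 1 would be expected by chance).
p = enrichment_p_value(total=1000, in_class=20, top_n=50, hits=8)
print(f"p-value: {p:.2e}")
```

A small p-value corresponds to the “very strong signal” case above, while a class pair whose members are scattered through the ranking would give a p-value close to 1.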
Bridging Big Data and Biology
My comfort in these results was tempered by a level of skepticism. I come from a molecular biology background, so to be truly confident in our predictions, I wanted to understand how and why these combinations would work.
Part of twoXAR’s standard process is to manually review the output of our platform. While our predictions are quite good, we cannot rely on algorithmic approaches alone when a patient’s health is concerned. Thus, our platform not only reports metrics on how well the model did but also generates a trail of evidence supporting each prediction. twoXAR’s translational scientists then build on this evidence by investigating top-ranked drug candidates (pairs in this study), and the proteins they interact with, to understand the mechanism underlying these candidates’ efficacy (in this case, how two drugs could synergize to treat breast cancer). So, not only did we have candidate drug pair treatments, we also had a ranked list of genes and proteins that explain why these pairs could be efficacious in breast cancer.
Excitingly, for both drug-class pairs that twoXAR and the Shah Lab identified, the proteins at the top of the list were already associated with breast cancer or were part of pathways that had been suggested as beneficial for breast cancer. In addition, we found several examples of drug pairs we had identified already being tried in clinical trials! This was enough to convince me (and my fellow skeptics at twoXAR) that we had strong computational AND literature-backed evidence for two drastically different sets of computational predictions.
Improving Confidence in Cancer Treatments
This publication represents exactly what twoXAR is about: using computation and overlapping evidence to rapidly identify high-confidence drug candidates that can treat patients quickly and effectively. On a personal note, this is extremely satisfying. It took me the better part of two decades to get to a position where I could actually follow through on what I said to my grandmother, but once twoXAR started working on the problem we had actionable results in a month. That is the sort of result that 12-year-old Aaron was hoping for!
This study, Synergistic drug combinations from electronic health records and gene expression, was published December 26, 2016 in the Journal of the American Medical Informatics Association and can be found at: https://academic.oup.com/jamia/article/24/3/565/2664593/Synergistic-drug-combinations-from-electronic