Photo by Sangharsh Lohakare on Unsplash

We aren’t collecting the data that precision cancer care really needs!

Ranjani Ramamurthy
Published in llmed.ai
6 min read · Nov 16, 2022

I started working at the intersection of health data and genomics before genomics-driven cancer treatment became as well known as it is today. At Dr. Soheil Meshinchi’s lab at Fred Hutch Cancer Center in 2012, we had (for that era) a fair volume of AML “omics” data: whole genome sequencing for many hundreds of patients, miRNA sequencing data for another few hundred, gene expression profiles, and longitudinal lab and outcomes data. While our lab was ahead of the curve in gathering deep biological data, we did not have every type of data for every patient. These holes in the available data made it difficult to perform an integrated analysis over all patients.

Fast forward a decade. We’ve moved in the right direction toward the use of genomics for treating cancer, but there has not been as much progress as we’d have hoped. Technological advances have made processing large datasets a lot easier and less expensive, for sure. However, the problem remains that we just do not collect the right kind of data at the right time in a cancer patient’s journey.

The primary obstacle to rapid innovation in precision cancer care is our limited ability to collect deep and complete patient datasets and share them.

That’s what this article is about: what data we collect today (and why), what data we need to collect (and why) — and the types of data that can make the promise of precision medicine a reality.

In brief:

  • Precision medicine (using genomics) requires deeper and more diverse data than what is currently collected as ‘standard of care’ (aka ‘real-world data’, or RWD). Data collected as part of the standard of care, like lab values, pathology, imaging, gene panels, etc., is meant primarily to give a clinician the signals needed to guide a patient through their treatment protocol. This is insufficient for precision medicine.
  • In particular, we need to be collecting deep omics data prospectively, while the patients are undergoing treatment. This deep omics data should be linked with clinical real-world datasets to provide the data assets required for research. Today, research grants fund some prospective data collection along these lines. We need to develop a business model to fund this data collection at scale for all cancer patients.
  • There needs to be careful thought and consideration on the privacy, security and regulatory issues related to genomic data sharing. In addition to investments into technology and regulation, further investment in direct-to-patient efforts could help. This requires a deep dive in itself.

Standard chemotherapy vs. Targeted therapy

When a patient is first diagnosed with cancer, their tumor is analyzed with imaging (if relevant), labs, histopathology and gene panels, to understand the nature and the severity of the disease. With this data, the patient is stratified based on risk (on probable response to therapy) and a treatment protocol is chosen.

Note that risk stratification, i.e. knowing which patients will respond best to therapy (and which won’t), is important. Clinicians would like to get as precise (and granular) as possible, in order to improve care for their patients.

The vast majority of cancers are treated with standard chemotherapy, which kills all rapidly dividing cells (cancer cells included). The treatment can be harsh and debilitating. As one oncologist friend put it, “we’re still mostly carpet bombing the body”.

With information from the histopathology and gene panels, the clinician may sometimes have the option to prescribe a drug that is able to target the exact enzyme or protein that carries a genetic alteration that drives or feeds the cancer. These are called targeted therapies (aka ‘molecular targeted therapies’), which are a form of precision cancer treatment.

Immunotherapies and monoclonal antibodies are examples of molecular targeted therapies. They can be less toxic and have fewer side effects than traditional chemotherapy. However, some patients can relapse even after they initially respond well to targeted therapies.

Labs, imaging, etc. continue to be collected throughout the patient’s journey, which helps the clinician monitor and guide the response to care.

Deep omics data is not usually generated from patient tumor samples. Neither are patient tumor samples routinely biobanked. The reason is the lack of a business model that supports this data collection at scale. The lack of these classes of data hampers some very important use cases.

To develop targeted therapies for the majority of cancer patients, and to help clinicians make better decisions and continuously improve outcomes for all patients, we need to be learning from each patient’s cancer journey.

How are targeted therapies created?

Let’s start with defining a simplified therapeutic pipeline. There are 4 stages.

1. Target Discovery: Early-stage scientific research. A mix of bench and computation efforts (using deep omics data), to determine targets that are significant for a specific cancer subtype.

2. Pre-clinical testing: Validation of targets discovered in step 1, with a variety of in-vitro and in-vivo approaches.

3. Clinical validation + FDA Review: Clinical trials. Data review by the FDA, for approval and for addition to therapeutic protocols.

4. Post-market surveillance: Track use in the real world, when the patient is on a treatment protocol with the drug.

Target discovery is the first step of drug discovery. It is complex and crucial to optimize, because steps 2 and 3 (drug development) are extraordinarily expensive. Our ability to optimize target discovery is governed by the depth, breadth, and consistency of the patient data available.

Here are the data types needed at this stage:

  1. EHR data: This includes labs, routine imaging and pathology, medications, and demographic and social determinants of health (SDOH) data (race/ethnicity, etc.).
  2. Gene panels: These are routinely collected for some cancers at diagnosis. The test identifies known genomic anomalies that may be causative/prognostic indicators or may have targeted therapies approved by the FDA.
  3. Deep Omics: Whole genome sequencing, exome sequencing, proteomics, and transcriptomics, conducted on biological samples collected at diagnosis and later in the treatment journey (when the patient is found to be refractory to treatment, or has relapsed).
  4. Biobanking: Saving biological samples both at diagnosis and when patients are refractory or relapsed.
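
As a rough illustration of the four data types above, a per-patient inventory and a completeness check might be sketched as follows. All class, field, and function names here are hypothetical and chosen for illustration; they do not correspond to any EHR or genomics standard. The sketch also assumes that a gene panel is expected for every patient, which in practice is true only for some cancers.

```python
from dataclasses import dataclass, field
from enum import Enum


class Timepoint(Enum):
    DIAGNOSIS = "diagnosis"
    RELAPSE_OR_REFRACTORY = "relapse_or_refractory"


@dataclass
class PatientRecord:
    """Hypothetical inventory of which data types exist for one patient."""
    patient_id: str
    has_ehr_data: bool = False
    has_gene_panel: bool = False
    # Timepoints at which deep omics assays were run on a sample.
    omics_timepoints: set = field(default_factory=set)
    # Timepoints at which a biological sample was biobanked.
    biobanked_timepoints: set = field(default_factory=set)
    relapsed_or_refractory: bool = False

    def required_timepoints(self):
        """Diagnosis always; relapse/refractory only if it occurred."""
        required = {Timepoint.DIAGNOSIS}
        if self.relapsed_or_refractory:
            required.add(Timepoint.RELAPSE_OR_REFRACTORY)
        return required

    def missing(self):
        """Return labels for the data this patient lacks."""
        gaps = []
        if not self.has_ehr_data:
            gaps.append("ehr")
        if not self.has_gene_panel:
            gaps.append("gene_panel")
        for tp in sorted(self.required_timepoints(), key=lambda t: t.value):
            if tp not in self.omics_timepoints:
                gaps.append(f"omics@{tp.value}")
            if tp not in self.biobanked_timepoints:
                gaps.append(f"biobank@{tp.value}")
        return gaps


def complete_cases(records):
    """Patients usable in an integrated analysis (no missing data types)."""
    return [r for r in records if not r.missing()]
```

Running `complete_cases` over a cohort gives the subset of patients for whom an integrated analysis across all data types is possible; the `missing()` labels show where collection fell short for everyone else.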

Cancer researchers agree that genetic anomalies (one or many) can be causal to a disease state. Over the last decade, genome-wide association studies (GWAS) have discovered a growing number of associations between specific genetic anomalies and various cancers. GWAS alone does not identify targets, but it can point a researcher toward the subgroups of patients whose omics information may provide clues to the development and progression of the cancer.

To actually discover novel targets, researchers need deep omics data and outcomes information collected from patients who are diagnosed with and subsequently treated for the cancer. For some cancers, with access to the right data, this exercise can be direct. In most cases, however, target discovery is complex and computationally intense. There could be multiple mutational drivers of the cancer. Understanding the tumor cell’s previous molecular state might be relevant. And cancer is heterogeneous: pediatric AML, for example, is a rare cancer but a very heterogeneous one, with 20+ subgroups by some counts. Given that heterogeneity, target discovery becomes really challenging unless you have large and deep omics datasets.
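
To make the scale problem concrete, here is a back-of-the-envelope sketch of how quickly a cohort thins out once it is split into subgroups and restricted to patients with complete data. The function name and the example numbers are hypothetical, and the even split across subgroups is a simplifying assumption; real subgroup frequencies are uneven, so rare subgroups fare worse.

```python
def expected_per_subgroup(total_patients, n_subgroups, complete_fraction):
    """Expected complete-data patients per subgroup, assuming an even split.

    complete_fraction is the share of patients with every required data type.
    """
    if n_subgroups <= 0:
        raise ValueError("n_subgroups must be positive")
    if not 0.0 <= complete_fraction <= 1.0:
        raise ValueError("complete_fraction must be between 0 and 1")
    return total_patients * complete_fraction / n_subgroups


# Hypothetical example: 1,000 patients, 20 subgroups, half with complete data.
print(expected_per_subgroup(1000, 20, 0.5))  # → 25.0
```

Under these illustrative assumptions, a thousand-patient cohort leaves only about 25 fully characterized patients per subgroup.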

Finally, we expect the science and technology to evolve rapidly. Proactive biobanking (when applicable) enables researchers to extract further information from the biobanked samples down the road.

In conclusion

In addition to target discovery, comprehensive data collection leads to more accurate and granular risk stratification, which in turn helps the clinician choose optimal treatment protocols. There is a virtuous cycle here.

These deep omics datasets also help with drug repurposing, a particularly pragmatic use case in which drugs at various stages of the therapeutic pipeline for specific diseases are repurposed for other diseases.

In a nutshell:

My lens into this world is through pediatric blood cancer research. My last article spoke to how investment in pediatric blood cancer therapeutics (by big pharma) has been low and that children are mostly subject to “trickle down” therapeutics.

In my next article, I’ll discuss how collecting and analyzing deep novel datasets is helping with drug repurposing in pediatric leukemia, and offering patients some options that were otherwise unavailable to them.

Thanks for reading!

Ranjani Ramamurthy
llmed.ai

Product Management, MD, Cancer Research, Engineer, Health-Tech advisor, GH Labs, ICGA, Fred-Hutch, LLS, ex-Microsoft, pediatric cancer research advocate.