In 2009, President Obama signed the HITECH Act, which committed $19 billion to digitizing patient data. The appeal from a research standpoint was clear: researchers would gain an unprecedented ability to study which treatments worked for which types of patients. That vision would have allowed faster insights into COVID-19 treatments, the efficacy of new cancer therapies and how care varies by socioeconomic background. But 11 years in, this data remains largely unusable. Most of the important data, including physician notes, lab results and scan images, sits unstructured and unstandardized in Electronic Health Records (EHRs), where doctors store patient data. This creates high barriers to research, because the data must be painstakingly organized, labeled and validated before it can be used. But this is starting to change. In recent years, companies have found new ways to use the data generated from healthcare visits to improve research. This progress has been enabled by different types of companies: some curate their own datasets, others enable additional uses of existing data and others create infrastructure to simplify data generation. As this ecosystem continues to develop, it should help move healthcare toward an era of personalized treatments by making research on smaller, more representative populations affordable.
How Research is Traditionally Conducted
The dominant source of data for medicine has been Randomized Controlled Trials (RCTs). Though they are the gold standard for evaluating a new intervention, these trials have several limitations. First, they include only a small subset of patients. Fewer than 10% of cancer patients participate in trials, so anything that could be learned from the experience of the more than 90% who don’t was previously lost. Second, a trial can cost more than $100M. Given the need to make a compelling business case before embarking on one, this limits the questions that get asked and answered, particularly for smaller sub-populations. Third, trials have well-documented selection problems: all too often, the patients being studied are not representative of the patients who will be treated in the real world. They often skew toward younger, healthier, white and more affluent patients, so learnings from them may not translate to the full set of patients who will be treated. Finally, learnings from these trials take a long time to reach clinical practice, because the trials take 5–10 years to complete. While RCTs will remain the core of any analysis of new treatments and approaches, an emerging trend toward using Real World Evidence can serve as a helpful complement (for a far better and more detailed analysis of their complementary roles see here).
The Promise of Real World Evidence
Real World Evidence (RWE) refers to evidence drawn from the actual experiences of patients in the real world, as captured in electronic health records, insurance claims, lab reports, etc. This data can be used to answer questions about more representative populations, faster and at lower cost. The data is de-identified to protect patient privacy.
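To make "de-identified" concrete, here is a minimal sketch of a few rules modeled on the HIPAA Safe Harbor method: drop direct identifiers, keep only the year of dates, truncate ZIP codes to three digits and group ages over 89. The field names are hypothetical, and this is a toy, not a compliant implementation: Safe Harbor covers 18 identifier categories and has conditions this sketch omits (for example, the population threshold for three-digit ZIP codes).

```python
from datetime import date

# Hypothetical field names; not taken from any real EHR schema.
DIRECT_IDENTIFIERS = {"name", "mrn", "phone", "email", "street_address", "ssn"}

def deidentify(record: dict) -> dict:
    """Apply a few Safe Harbor-style rules to one patient record (toy sketch)."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if isinstance(value, date):
            out[field + "_year"] = value.year  # keep only the year of any date
        elif field == "zip":
            out["zip3"] = str(value)[:3]  # keep only the first three ZIP digits
        elif field == "age":
            out["age"] = "90+" if value > 89 else value  # aggregate ages over 89
        else:
            out[field] = value  # clinical fields pass through unchanged
    return out

patient = {
    "name": "Jane Doe",
    "mrn": "A123",
    "zip": "94110",
    "age": 93,
    "diagnosis_date": date(2019, 3, 14),
    "diagnosis": "NSCLC",
}
print(deidentify(patient))
```

The clinical content survives while the fields that point directly at an individual are removed or coarsened, which is the trade-off RWE datasets have to manage.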
In the last few years, RWE has moved front and center for pharmaceutical companies and regulators. Ninety percent of pharmaceutical companies have Real World Evidence teams, and more than half say it is a mission-critical capability for the next two years. Congress passed the 21st Century Cures Act, which includes provisions requiring the Food & Drug Administration (the primary body regulating drug approvals) to incorporate RWE into its decision-making. The FDA commissioner must testify quarterly to Congress on the progress of using RWE.
There are many different use cases for RWE — all with different requirements depending on the depth and breadth of data.
- Drug Discovery: Researchers can look at patient data to better understand how diseases work, for example by observing variation in how patients respond to treatments and the characteristics of patients with a given condition. To do this effectively, researchers need deep data about patients, including genetic data and longitudinal clinical data, to identify common characteristics.
- Trials: As mentioned above, clinical trials today are slow. Speeding them up lowers their cost and shortens the time it takes to bring drugs to market. The leading cause of trial delay is slow recruitment, and recruiting issues are exacerbated by poor choices in trial design (i.e., who is/isn’t eligible to be included in the trial). Better data can help inform more realistic trial designs. In recent years, regulators have also indicated a willingness to use Real World Control Arms. Today, half of the patients in almost any trial must be randomized into the “control” group, where they are given a standard-of-care treatment rather than the new therapy. This means trials have to recruit twice as many patients as they would if they only needed patients who will receive the new therapy. Real World Control Arms simulate the control arm using real-world patients instead. This requires two things: a large pool of patients, so that a sub-group matching the trial arm can be assembled, and deep data on those patients, since every detailed datapoint tracked in a clinical trial must also be tracked for them.
- Commercial Launch: When pharmaceutical companies launch new drugs, they need to figure out which doctors to target and track their commercial teams’ effectiveness. Real World Evidence allows commercial teams to understand doctors’ patient populations, the relative uptake of their products vs. those of competitors and how doctors are making decisions about what products to use.
- Post-Marketing Evaluation: When drugs are approved, there is still a lot we don’t know about them. Are they fully safe? Will they also work for other conditions? As a result, it’s important to track safety issues as drugs are more widely rolled out, and to learn whether drugs work when doctors try them off-label for other conditions (e.g., different types of cancer). In oncology, a rise in accelerated approvals for areas of high unmet need has increased demand for this data.
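The Real World Control Arms described under Trials depend on assembling a sub-group of real-world patients who resemble the trial arm. One common family of techniques for this is matching. The sketch below uses greedy 1:1 nearest-neighbor matching on a few hypothetical covariates; it illustrates the idea under simplifying assumptions and is not how any particular company or regulator builds an external control arm. Real analyses typically use propensity scores, many more covariates and formal balance checks.

```python
import math

# Hypothetical covariates; a real control arm would use many more
# (stage, biomarkers, comorbidities, prior treatments, ...).
COVARIATES = ("age", "ecog", "prior_lines")

def distance(a: dict, b: dict, scale: dict) -> float:
    """Euclidean distance on covariates, each divided by its own scale."""
    return math.sqrt(sum(((a[c] - b[c]) / scale[c]) ** 2 for c in COVARIATES))

def match_controls(trial: list[dict], pool: list[dict], scale: dict) -> list[tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching without replacement."""
    available = list(pool)
    pairs = []
    for patient in trial:
        if not available:
            break  # ran out of real-world candidates
        best = min(available, key=lambda r: distance(patient, r, scale))
        available.remove(best)  # each real-world patient is used at most once
        pairs.append((patient["id"], best["id"]))
    return pairs

trial = [
    {"id": "T1", "age": 62, "ecog": 1, "prior_lines": 1},
    {"id": "T2", "age": 70, "ecog": 0, "prior_lines": 2},
]
pool = [
    {"id": "R1", "age": 71, "ecog": 0, "prior_lines": 2},
    {"id": "R2", "age": 45, "ecog": 2, "prior_lines": 0},
    {"id": "R3", "age": 60, "ecog": 1, "prior_lines": 1},
]
scale = {"age": 10, "ecog": 1, "prior_lines": 1}
print(match_controls(trial, pool, scale))  # [('T1', 'R3'), ('T2', 'R1')]
```

Greedy matching depends on the order of trial patients, which is one reason practitioners often prefer optimal matching; the sketch keeps the greedy version for brevity. It also shows why a large pool matters: the more candidates, the closer each match can be.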
The most lucrative use cases are largely tied to regulatory approvals for drugs. These help patients by allowing more of them to join the experimental arm of a trial and by getting drugs that work to patients sooner. They also create new revenue opportunities for pharmaceutical companies (through label expansions) and/or bring a drug to market faster, extending the time it spends under patent protection (a drug’s patent is typically filed before trials begin and expires 20 years after filing, regardless of how long the trials take). Not all datasets can support these use cases. Doing so requires representative data and endpoints that may not normally be measured in routine clinical care, so companies often need to find ways to fill gaps in their datasets.
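The exclusivity arithmetic in the paragraph above is simple: with a fixed 20-year patent term, each year cut from the time between the patent’s start and the drug’s launch is a year added to its time on the market under patent. The sketch below assumes hypothetical timelines and ignores patent term extensions and other regulatory exclusivities.

```python
PATENT_TERM_YEARS = 20  # fixed term; extensions and other exclusivities ignored

def years_of_exclusivity(years_before_launch: int) -> int:
    """Years a drug is sold under patent, given years from the patent's start to launch."""
    return max(0, PATENT_TERM_YEARS - years_before_launch)

# Hypothetical timelines for the same drug, with trials shortened by two years.
slow = years_of_exclusivity(12)
fast = years_of_exclusivity(10)
print(slow, fast, fast - slow)  # 8 10 2
```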
Types of Companies in the Space
Investor interest in the Real World Evidence space has increased since Roche acquired Flatiron Health (a company focused on accelerating cancer research through insights from its EHR) for roughly $2B. There are a few different types of businesses in the space today, and innovation should only increase as the use cases above become a more standard part of the healthcare system.
- Proprietary Datasets: This is the Flatiron model, which represents the first generation of companies in the space. They acquire interesting datasets — in ways we’ll detail below — pull insights out of those datasets and sell those insights. Because large investments are required to process and standardize this data, individual practices, labs, etc. have struggled to make their own data usable for more advanced use cases without partnering with these companies.
- RWE Application Software: These are companies that help others better use RWE. Their primary role is not providing the underlying data itself, but rather expanding the questions different end users can answer with the data by making it easier to work with.
- RWE Infrastructure: RWE can be time-intensive to produce. For regulatory use cases, nurses or other clinically trained abstractors are required to extract structured endpoints from the unstructured data sources in an EHR. Some companies are building tools that automate these tasks across RWE companies, making it easier for them to make their data usable.
Most companies today are Proprietary Dataset companies. But as more companies make data available and it becomes less of a bottleneck, I expect we’ll see more Application Software and Infrastructure companies going forward.
Companies have found many different ways to obtain datasets. Below is a non-exhaustive list of companies illustrating many different strategies:
- Purchase Data: The simplest approach. IMS Health (now IQVIA) built a very strong business around understanding drug sales through consolidating disparate pieces of prescription data. It can be difficult for businesses in this space to obtain exclusivity around these datasets. Claims data is the most commonly purchased, and companies can then augment these datasets with harder-to-get datasets (e.g., specialty pharmacy).
- Go Direct to Patients: Given how much de-identified patient data intermediaries share throughout the healthcare ecosystem without patients knowing, this seems like an important way for the space to evolve. It has traditionally been difficult for patients to obtain and centralize their own records, but companies like Picnic Health are making this easier. Obtaining data directly from patients has a huge advantage. Most other approaches to obtaining data leave some sort of gap (e.g., if you have one hospital’s data, you may not see what happened to that patient at another hospital or with a prescription they filled at a pharmacy). Filling these gaps is crucial to getting a full longitudinal view of a patient’s care and outcomes. But sometimes the organizations with the data to fill them don’t want to share it. Patients are best positioned to fill these gaps, given their clear right to their own data.
- Provide Genetic Testing: Genomic data is increasingly valuable as researchers try to understand what makes certain treatments effective for different types of patients. In oncology, drugs are for the first time being approved for a given mutation rather than a cancer type. Some companies provide or subsidize genomic testing and can build large datasets this way. When these are combined with longitudinal clinical datasets, the companies can draw interesting insights on which types of patients respond well and on new potential targets. The robustness of these datasets depends on the proportion of patients who are sequenced as part of clinical care.
- Academic Society Partnerships: Academic societies within specialties have traditionally aggregated data for various research purposes. Companies can partner with these academic societies to commercialize their data.
- Provider Analytics: Companies can build and sell helpful workflows and analytics tools on top of a specialty’s core Electronic Health Record (EHR) system and get rights to the underlying clinical data that sits in it. Examples include providing clinicians relevant information on therapies and trials (Syapse) or tools for population health management (Optum/Humedica).
- Own Electronic Health Records: Companies can build or acquire a specific EHR for a specialty. Practices use these EHRs to record and track patients’ clinical information, schedule patients and provide inputs for billing. EHR datasets often have the most datapoints (as most other datasets are subsets of EHR data). EHRs also provide an alternative revenue stream for these companies and help them build relationships with practices, which are crucial for eventually doing prospective clinical trial work (though out of scope for this piece, many new trial models envision more work being done out of the EHR).
As for specialty selection, oncology is clearly the most lucrative market and the one where the most research is being undertaken. As a result, it has attracted the most investment activity.
After Flatiron’s exit, many have sought to apply the Flatiron model to other specialties (including Verana in ophthalmology/neurology, Holmusk in behavioral health, Picnic Health and AllStripes in rare disease to name a few). There are a few characteristics that make a specialty particularly attractive for a Real World Evidence business:
- High drug spend in the therapy area: This increases the value of label expansions, accelerating trials and improving commercial team effectiveness.
- Strong pipeline of treatments: Ensures a continued supply of research questions and higher margins as the same datasets can be sold to multiple pharmaceutical companies operating in the space.
- Variance in response to treatments: Increases the value of studying sub-populations to better understand the underlying science driving outcomes and identify unmet needs.
- The extent of the patient journey seen by individual clinicians: Answering questions via RWE requires following patients through their full care journey, and conditions differ in whether any single clinician has that level of visibility. For example, a patient with lung cancer has a medical oncologist quarterbacking their care, and a patient with prostate cancer will mostly see a urologist, while a patient with diabetes may see a number of different types of doctors. The more types of doctors treating a patient in their own silos, the harder it is to see that patient’s full care journey.
RWE Application Software
As the RWE ecosystem continues to grow, an exciting next wave of companies has emerged to help expand who can effectively use RWE and how they use it.
The first generation of companies with proprietary datasets has generally had its datasets used by PhD biostatisticians. But companies like Komodo, Boston Health Economics, and Aetion are building business intelligence tools that make it easier for users throughout the healthcare ecosystem to combine different datasets and incorporate this data into their workflows.
Aetion in particular helps address the truism that you can “torture data to say whatever you want.” It has developed a methodology that lets it sit as a neutral third party between pharmaceutical companies and the payers who may want to use this data to reach an objective decision about a drug’s real-world efficacy and how it should be reimbursed.
Finally, some organizations are trying to increase the amount of usable data by making it easier to link data across sources. Datavant and HealthVerity have created data exchanges where providers, labs and others can link their data with other datasets.
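Linking datasets without sharing raw identifiers is often done with tokenization: each party derives the same opaque token from a patient’s normalized identifiers and shares only the token alongside clinical data. The sketch below illustrates that general idea with a keyed hash (HMAC-SHA256); it is not Datavant’s or HealthVerity’s actual scheme, and the shared key, identifier fields and normalization rules are all hypothetical assumptions.

```python
import hashlib
import hmac

# Hypothetical shared secret; real systems manage keys far more carefully.
SHARED_KEY = b"example-shared-secret"
ID_FIELDS = {"first", "last", "dob", "sex"}  # hypothetical identifier fields

def token(first: str, last: str, dob: str, sex: str) -> str:
    """Derive a linkage token from normalized identifiers via HMAC-SHA256."""
    normalized = "|".join(s.strip().upper() for s in (first, last, dob, sex))
    return hmac.new(SHARED_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def tokenize(records: list[dict]) -> dict[str, dict]:
    """Replace identifiers with a token, keeping only the non-identifying fields."""
    out = {}
    for r in records:
        t = token(r["first"], r["last"], r["dob"], r["sex"])
        out[t] = {k: v for k, v in r.items() if k not in ID_FIELDS}
    return out

def link(a: dict[str, dict], b: dict[str, dict]) -> list[dict]:
    """Join two tokenized datasets on the tokens they share."""
    return [{**a[t], **b[t]} for t in a.keys() & b.keys()]

hospital = tokenize([
    {"first": "Jane", "last": "Doe", "dob": "1961-04-02", "sex": "F", "diagnosis": "NSCLC"},
])
lab = tokenize([
    {"first": " jane", "last": "DOE", "dob": "1961-04-02", "sex": "f", "egfr_mutation": True},
    {"first": "John", "last": "Roe", "dob": "1955-01-09", "sex": "M", "egfr_mutation": False},
])
print(link(hospital, lab))  # [{'diagnosis': 'NSCLC', 'egfr_mutation': True}]
```

Normalization matters because tokens only match on exact input: the lab’s " jane" and "f" still link after trimming and upper-casing, but a misspelled name would not, which is why real linkage systems add further matching logic.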
Today it is expensive to hire a team of nurses to turn unstructured data into structured datasets, and RWE companies are largely building their own tools and processes to do this. As the space grows, horizontal software infrastructure should emerge with tools that organize data abstraction and make it more efficient. Carta Health and Roam Analytics (recently acquired by Parexel) are two companies building tools in this space. As AI/ML platforms enter the space, large training datasets of annotated unstructured documents will be a key differentiator.
Reducing both the cost of creating RWE and the difficulty of interpreting it should transform healthcare research. We should be able to quickly answer more and more questions that previously took years to answer, or that didn’t justify the cost of an expensive clinical trial. This should improve the healthcare we all receive.
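Abstraction means turning free-text notes into structured endpoints with enough provenance for a reviewer to check them. The toy sketch below pulls one hypothetical endpoint, ECOG performance status (0–5), out of a note with a regular expression and keeps the matched text as evidence. It only shows the shape of the task: real pipelines rely on trained abstractors and, increasingly, ML models, and a pattern this simple would miss many phrasings (e.g., "performance status of 1").

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for one endpoint: "ECOG 1", "ECOG PS 2", "ecog: 0", ...
ECOG_PATTERN = re.compile(r"\bECOG(?:\s+PS)?\s*[:=]?\s*([0-5])\b", re.IGNORECASE)

@dataclass
class Abstraction:
    endpoint: str
    value: int
    evidence: str  # the matched text, so a human reviewer can verify it

def abstract_ecog(note: str) -> list[Abstraction]:
    """Extract every ECOG mention in a free-text note as a structured value."""
    return [
        Abstraction("ecog", int(m.group(1)), m.group(0))
        for m in ECOG_PATTERN.finditer(note)
    ]

note = "Pt seen today. ECOG 1, tolerating chemo. Prior visit ecog: 2."
for a in abstract_ecog(note):
    print(a.endpoint, a.value, repr(a.evidence))
```

Keeping the evidence string alongside each value mirrors why human review remains central for regulatory use: every structured datapoint should trace back to its source text.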
I’d love to hear your thoughts about the future of the RWE space. Reach out to me at email@example.com. Always open to chatting!