Announcing Privacy Hub and the Future of Privacy-Preserving Technologies
by Travis May
Today, we are excited to announce the launch of the Datavant Privacy Hub and the acquisition of Mirador Analytics, the leading provider of HIPAA expert determination services.
This is a major milestone for Datavant, and a huge step forward on our mission to connect the world’s health data to improve patient outcomes. Moreover, we believe it will unlock a step change in how technology is applied to both preserve data utility and protect patient privacy.
This post will walk you through the problems we solve for customers and patients, what Privacy Hub is, why we acquired Mirador, and how we see the landscape of privacy-preserving technology evolving over time.
The State of Patient Privacy Protection in Health Data Exchange
Across the country, thousands of organizations are trying to bring together disparate de-identified health datasets for analytics. Whether sponsoring an oncology registry, building a model to find rare disease patients, or trying to understand the quality and cost of care, one of the major challenges that organizations run into is ensuring that the health datasets they are bringing together are adequately de-identified to protect patient privacy.
De-identification that preserves data utility is an extremely complex process. For example, imagine you want to answer a relatively simple question:
“What is the hospitalization rate of patients aged 60–70 who received the Moderna vaccine 6 months ago, stratified by comorbidities?”
Answering this question requires bringing together data from several different sources (vaccination data, hospitalization data, comorbidity data). Doing so without including information that can be used to identify a patient requires statistical analysis in order to understand the uniqueness of different elements included in the dataset, and how they might be used in combination to re-identify a patient. Complexity increases further when you add elements like genetic markers that might be used to predict COVID-19 risk.
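To make the idea of uniqueness concrete, here is a minimal sketch of one common way to measure it: a k-anonymity-style count of how many records share each combination of quasi-identifiers. Quasi-identifiers are fields such as age, 3-digit ZIP prefix, sex, and vaccine that do not identify anyone on their own but can in combination. The field names, the threshold `k`, and the helper functions are hypothetical illustrations, not the method used by Datavant, Mirador, or any particular expert.

```python
from collections import Counter

# Hypothetical quasi-identifiers chosen for illustration only.
QUASI_IDENTIFIERS = ("age", "zip3", "sex", "vaccine")


def equivalence_class_sizes(records, quasi_ids=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_ids) for r in records)


def uniqueness_report(records, k=5, quasi_ids=QUASI_IDENTIFIERS):
    """Summarize uniqueness with a k-anonymity-style check.

    k is an illustrative threshold, not a regulatory standard.
    """
    sizes = equivalence_class_sizes(records, quasi_ids)
    # Size of the group each record falls into; a group of 1 means a unique record.
    group_sizes = [sizes[tuple(r[q] for q in quasi_ids)] for r in records]
    return {
        "unique_records": sum(1 for s in group_sizes if s == 1),
        "records_below_k": sum(1 for s in group_sizes if s < k),
        # Worst-case risk: 1 / size of the smallest group.
        "max_reid_risk": 1 / min(sizes.values()) if sizes else 0.0,
    }


records = [
    {"age": 64, "zip3": "021", "sex": "F", "vaccine": "Moderna"},
    {"age": 64, "zip3": "021", "sex": "F", "vaccine": "Moderna"},
    {"age": 67, "zip3": "021", "sex": "M", "vaccine": "Moderna"},
]
print(uniqueness_report(records, k=2))
# {'unique_records': 1, 'records_below_k': 1, 'max_reid_risk': 1.0}
```

A record that sits alone in its group is unique, and each additional joined field (comorbidities, genetic markers) tends to split groups further. That is why combining datasets can raise re-identification risk even when each source looks safe on its own.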
When de-identifying patient-level data, there is a trade-off between data utility and privacy preservation. One way of visualizing this is an efficient frontier: as more data is removed or generalized, privacy risk falls but the data becomes less useful for analysis; conversely, as data utility increases, privacy risk goes up. The key thing to understand is that this direct trade-off holds only if we are actually on that efficient frontier.
In practice, I believe it is possible to improve patient privacy and data utility at the same time, because we are nowhere close to the efficient frontier. The way to get there is with advanced, expert-informed technologies that ensure the differing needs of privacy protection and research (faster, more seamless de-identification, more analytics-friendly data, and complete privacy preservation) are addressed together rather than competing with each other.
HIPAA and How De-Identification is Applied
Under HIPAA, most organizations today rely on what is called “expert determination” to de-identify a dataset: a dataset can be considered adequately de-identified if an expert assesses the data and determines that the risk of re-identification is “very small.” The benefit of this approach is that it is flexible and can support much of the valuable analytical work being performed across the healthcare system today. The downside is that it can be slow and prone to severe bottlenecks. Furthermore, experts are often not equipped with the right technology to provide any required remediation, validation, or ongoing monitoring. The result is delays, less of the analysis that could be used to improve patient outcomes, and inconsistent protection of patient privacy across the industry.
Privacy Hub and the Acquisition of Mirador Analytics
We believe there is a better way.
It is hard to imagine health data being exchanged and connected across organizations at scale if there is a six-month delay for expert determination every time two new datasets are brought together. But if you fast forward into the future, privacy protection and disclosure risk assessments should work like this:
- Re-identification risk is assessed in real time using widely adopted tests and standards as new datasets are created and joined, allowing data scientists to assess the necessary trade-offs that will support robust analysis while protecting patient privacy.
- The necessary identifiable elements are automatically removed or transformed to de-identify the datasets.
- For ongoing data feeds, automated monitoring is conducted on a continuous basis to ensure continued compliance with the original certification.
- Automated tests, standards, and technologies incorporate the learnings from every dataset previously assessed with the same methods and deemed to have a “very small” risk of re-identification.
- Privacy-preserving technologies (such as homomorphic encryption and differential privacy) go well beyond redaction and hashing.
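The first steps of this workflow (transform identifiable elements, re-check the risk, and repeat the check on every new batch of an ongoing feed) can be sketched in a few lines of Python. This is a hypothetical illustration: the transformation rules (10-year age bands, 3-digit ZIP prefixes, keyed-hash tokens for record numbers), the field names, the key, and the threshold `k` are all assumptions chosen for the example, not Privacy Hub's rules or a statement of what HIPAA requires.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key for illustration only; a real system would load it
# from a secrets manager rather than hard-coding it.
TOKEN_KEY = b"example-secret-key"


def tokenize(identifier: str) -> str:
    """Replace a direct identifier with a keyed-hash (HMAC-SHA256) token."""
    return hmac.new(TOKEN_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict) -> dict:
    """Apply illustrative removal/transformation rules to one record."""
    low = (record["age"] // 10) * 10
    return {
        "patient_token": tokenize(record["mrn"]),  # record number -> token
        "age_band": f"{low}-{low + 9}",            # exact age -> 10-year band
        "zip3": record["zip"][:3],                 # 5-digit ZIP -> 3-digit prefix
        "hospitalized": record["hospitalized"],    # analytic field kept as-is
        # "name" is not copied, so it is removed entirely.
    }


def passes_threshold(rows, quasi_ids=("age_band", "zip3"), k=2):
    """True if every combination of quasi-identifiers appears at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return not counts or min(counts.values()) >= k


def monitor_feed(batches, k=2):
    """De-identify and re-check each batch of a feed; return indices that fail."""
    return [
        i for i, batch in enumerate(batches)
        if not passes_threshold([deidentify(r) for r in batch], k=k)
    ]


batch_ok = [
    {"mrn": "A1", "name": "Jane Doe", "age": 64, "zip": "02139", "hospitalized": False},
    {"mrn": "A2", "name": "John Roe", "age": 61, "zip": "02115", "hospitalized": True},
]
batch_bad = [
    {"mrn": "B1", "name": "Ann Poe", "age": 64, "zip": "02139", "hospitalized": False},
    {"mrn": "B2", "name": "Sam Loe", "age": 71, "zip": "94110", "hospitalized": False},
]
print(monitor_feed([batch_ok, batch_bad]))  # [1]
```

A keyed hash (HMAC) is used rather than a plain hash because plain hashes of low-entropy identifiers can be reversed by brute force. Each batch is checked on its own here for simplicity; a real monitor would likely also check the cumulative dataset and the joins it participates in.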
Privacy Hub (a feature of the Datavant Switchboard) will begin by streamlining and automating major components of the expert determination process, creating more consistency in the standards applied to protect patient privacy, and shortening the time needed to obtain useful, connected data. Privacy Hub will be usable by any data source, data recipient, or independent expert.
Mirador Analytics, the leading HIPAA expert determination company, shares this belief. Their team of experts has consulted on hundreds of datasets for leading pharmaceutical, insurance and data analytics companies. Mirador is known for its rigorous approach to the protection of patient privacy, while maximizing data utility to allow for innovation, efficiency, and development in healthcare.
We have partnered closely for years and now, with the launch of Privacy Hub, have come together around the shared vision of expert-shaped privacy-preserving technologies for the industry.
What the Future Holds
One of the biggest challenges for the health industry is how to connect data across institutions safely to enable the use of data at scale. To accomplish this goal, organizations will need to use multiple privacy-preserving approaches: redaction and hashing today, but ultimately frontier technologies as well, such as differential privacy, secure multi-party computation, and homomorphic encryption. These technologies will be deployed as part of a holistic privacy toolkit, each fit for the analysis and workflow at hand.
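Of the frontier technologies named above, differential privacy is perhaps the simplest to illustrate. The sketch below applies the textbook Laplace mechanism to a single counting query, such as the number of patients aged 60 to 70 in a cohort; the function names and toy data are hypothetical. It is not how Privacy Hub or any Datavant partner implements differential privacy, and a production system would use a vetted library, a cryptographically secure source of randomness, and accounting for the total privacy budget spent across queries.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Epsilon-differentially private count of the values matching predicate.

    A count has sensitivity 1: adding or removing one patient changes it by at
    most 1, so Laplace noise with scale 1/epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


ages = [58, 61, 64, 67, 70, 73]  # toy data, not real patients
rng = random.Random(42)  # seeded for repeatability; not a secure RNG
noisy = dp_count(ages, lambda a: 60 <= a <= 70, epsilon=1.0, rng=rng)
print(f"noisy count of patients aged 60-70: {noisy:.2f}")
```

Smaller values of `epsilon` add more noise and give stronger privacy, so the utility-versus-privacy trade-off described earlier becomes a single tunable parameter.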
As these technologies continue to mature, they will offer new ways to protect patient privacy and enable the use of data across the healthcare system. Datavant’s Privacy Hub will work with best-of-breed partners to bring the best available technology to protect patient privacy in each use case, whether doing analytics on top of a large synthetic dataset to inform drug development or analyzing clinical data in fully encrypted form.
In this future, everyone wins: the healthcare system gets smarter, privacy protections are strengthened and, most importantly, patients have better outcomes.