The Privacy Myth of De-Identified Medical Data

Published in Health Wizz, Oct 2, 2017

You walk into the doctor’s office. After what seems like an eternity reading outdated magazines in the waiting room, you are led to a private room and the door is shut behind you. The nurse enters, writes on a chart (or maybe an iPad) and shuts the door. A doctor enters and shuts the door.

It seems like there is privacy all around — privacy you expect.

But what if you were to find out those medical records containing your private history, family history and medication history weren’t so private after all?

In 1997, Latanya Sweeney showed that demographics appearing in medical data stripped of patient names can be linked to registries of people (e.g., voter lists) to restore names and contact information to the medical data. A combination of just a few characteristics is often enough to uniquely, or nearly uniquely, identify an individual within a population. Clearly, released data containing such information about these individuals should not be considered anonymous. Yet health and other person-specific data are publicly available in this form.

Her earliest and most widely publicized example was identifying the medical information of William Weld, former governor of Massachusetts, using just his date of birth, gender, and ZIP code as they appeared in a voter list. Numerous experiments have been done since then. Members of the Data Privacy Lab linked demographics found in publicly available health and genomic profiles in the Personal Genome Project to voter lists and other public information to put names to the profiles; for the profiles where a name was predicted, the results were 84–97% accurate. It is fair to say that the cost of deriving anyone’s identified medical records from their de-identified records is little more than some CPU cycles and software, combined with free, publicly available census data.
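The linkage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the studies mentioned: every record and name is made up, and the field names (`dob`, `sex`, `zip`), the `QUASI_IDENTIFIERS` tuple and the `link` function are hypothetical. It joins "de-identified" records to a voter list on date of birth, gender and ZIP code, and attaches a name only when exactly one voter matches.

```python
from collections import defaultdict

# Made-up "de-identified" medical records: no names, but demographics remain.
deidentified_records = [
    {"dob": "1950-03-14", "sex": "M", "zip": "02139", "diagnosis": "hypertension"},
    {"dob": "1982-11-02", "sex": "F", "zip": "02139", "diagnosis": "asthma"},
    {"dob": "1975-06-21", "sex": "F", "zip": "02140", "diagnosis": "diabetes"},
]

# Made-up public registry (for example, a voter list) that does carry names.
voter_list = [
    {"name": "Alex Example", "dob": "1950-03-14", "sex": "M", "zip": "02139"},
    {"name": "Blair Sample", "dob": "1982-11-02", "sex": "F", "zip": "02139"},
    {"name": "Casey Placeholder", "dob": "1982-11-02", "sex": "F", "zip": "02139"},
    {"name": "Dana Demo", "dob": "1990-01-01", "sex": "F", "zip": "02140"},
]

# Hypothetical choice of linking fields: date of birth, gender, ZIP code.
QUASI_IDENTIFIERS = ("dob", "sex", "zip")


def key(row):
    """Build the tuple of demographic values used to match rows."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link(records, registry):
    """Return (diagnosis, name) pairs; name is None unless exactly one person matches."""
    names_by_key = defaultdict(list)
    for person in registry:
        names_by_key[key(person)].append(person["name"])

    results = []
    for record in records:
        candidates = names_by_key.get(key(record), [])
        name = candidates[0] if len(candidates) == 1 else None
        results.append((record["diagnosis"], name))
    return results


reidentified = link(deidentified_records, voter_list)
for diagnosis, name in reidentified:
    print(diagnosis, "->", name or "no unique match")
```

In this toy data only the first record is re-identified: the second matches two voters and the third matches none. The point of the research cited above is that in real populations the combination of these three fields is often unique, so the first case is far more common than intuition suggests.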

Demand for De-Identified Medical Data

Identifiable data cannot be sold or used in marketing or research without patient permission, but de-identified data can. De-identified data is not subject to HIPAA because it is not considered Protected Health Information (PHI). ACOs, health plans, and independent researchers use de-identified clinical and/or claims data in population health studies. Some hospitals use the information to target new and expensive treatments via direct mail, and also buy the data to gain a better picture of local residents. Research organizations combine clinical information with randomized clinical trials to conduct medical research. Drug companies use de-identified pharmacy data to target their marketing to individual physicians. Having spent years on research and development of a product to treat a particular condition, they have an interest in knowing who some of these patients are so that they can tailor their marketing and detailing to that patient demographic and sell more of their medication. Health insurance companies might not seem likely purchasers of medical data, because people assume they already have the information, yet many try to purchase an individual’s past health information to determine what premium to charge and whether to provide coverage at all. None of these uses requires patient permission.

Less well known is that electronic health record (EHR) vendors, which are increasingly getting into population health management, also use de-identified patient data. When a physician signs a license agreement with a vendor, there is almost always a clause about data or intellectual property ownership that gives the vendor the right to use that data. The clause typically has the physician give up rights to de-identified, aggregated data and gives the EHR vendor the right to commercialize it. In fact, the vendor is ultimately the one de-identifying and aggregating the data. In some cases, the vendor offers a package that gives its customers access to other providers’ de-identified data in the form of quality benchmarks. In return, the practice must agree to provide its own data.

Medical data has become a multibillion-dollar worldwide industry involving healthcare providers, drug companies, and a complex web of middlemen. Prescriptions, hospital records, insurance claims, blood test results and more are all part of an exploding medical data bazaar. It benefits almost every actor in the healthcare ecosystem except the patient to whom this data belongs.

Alternate Medical Data Market

What we need are regulations that prevent anyone from selling or sharing our medical data without our consent, regardless of whether the data is de-identified. Since it is so easy to derive identified medical records from de-identified information, there is no reason why de-identified data should not be treated as PHI and made subject to HIPAA. Only then will power shift from the healthcare system to consumers. Pharmaceutical companies, research organizations and hospitals will be forced to go directly to patients and request their data. Consumers will decide whether to share their medical records, and if so when, with whom and for how long. An alternate market will emerge in which supply and demand govern the price of medical data. Patients will start treating their medical records as digital assets whose value is determined by market forces. These digital assets can be traded directly between suppliers, who happen to be patients, and consumers, who happen to be pharmaceutical companies and research organizations. This alternate market will be more efficient than the existing medical data bazaar because the value chain is shorter, with several middlemen eliminated.

Written by Raj Sharma