Missing COVID-19 Diagnoses in EHR and Claims Data

Rajesh Dash MD PhD
{Data, Value} driven Medicine
Apr 12, 2021

Prior Studies Using EHR or Claims Data

By some estimates, over 200,000 COVID-19 research articles were published in 2020 [1]. Many of these relied on retrospective electronic health record (EHR) data, collected from inpatient or outpatient health records, or on claims data, which are created in a standardized format to facilitate insurance reimbursement. Both data sources have their merits, but EHR-based studies can suffer from small sample sizes [2–5], while studies using claims data suffer from incomplete clinical information and often must incorporate additional data sources [6–8].

HealthPals-Veradigm Collaboration on FDA Investigation Into Natural History of Coagulopathy in COVID-19 Patients

Working with the FDA as part of the Reagan-Udall Foundation Evidence Accelerator Program, HealthPals, in collaboration with Veradigm, analyzed a unique dataset of 35 million US patients with linked claims and EHR data to better understand the natural history of coagulopathy in COVID-19 patients. Patients were identified as having COVID-19 if their medical record included ICD-10-CM codes indicating a COVID-19 diagnosis or if there was a record of a positive COVID-19 PCR test. Using these criteria, over 600,000 patients with evidence of COVID-19 were found in this linked dataset.
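The identification rule described above (a qualifying ICD-10-CM code or a positive PCR test) can be sketched in a few lines of Python. This is an illustration only, not the study's actual pipeline: the `Record` layout, the function names, and the result-text matching are hypothetical, and the code list is an assumption (U07.1 is the ICD-10-CM code for confirmed COVID-19, but the article does not list the codes that were used).

```python
from dataclasses import dataclass

# Assumption: U07.1 (confirmed COVID-19) stands in for the study's code list,
# which is not given in the article.
COVID_ICD10_CODES = {"U07.1"}


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified record layout used only for this sketch."""
    patient_id: str
    source: str  # "ehr" or "claims"
    kind: str    # "diagnosis" or "pcr"
    value: str   # ICD-10-CM code, or PCR result text


def is_covid_evidence(rec: Record) -> bool:
    """Return True if a single record counts as evidence of COVID-19."""
    if rec.kind == "diagnosis":
        return rec.value.strip().upper() in COVID_ICD10_CODES
    if rec.kind == "pcr":
        # Assumption: results are stored as free text such as "Positive".
        return rec.value.strip().lower() == "positive"
    return False


def covid_patients(records, source=None):
    """IDs of patients with at least one qualifying record, optionally from one source."""
    return {
        r.patient_id
        for r in records
        if is_covid_evidence(r) and (source is None or r.source == source)
    }


records = [
    Record("p1", "claims", "diagnosis", "U07.1"),
    Record("p2", "ehr", "pcr", "Positive"),
    Record("p3", "ehr", "pcr", "negative"),
    Record("p4", "claims", "diagnosis", "J18.9"),
]
print(sorted(covid_patients(records)))  # ['p1', 'p2']
```

Passing `source="ehr"` or `source="claims"` yields the per-source patient sets needed to compare capture between the two data sources.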

As part of the investigation, we also looked at how many COVID-19 patients each data source captured. The results were surprising.

Intersection Between EHR and Claims

We expected that most of the more than 600,000 patients with a diagnosis or PCR test confirming COVID-19 in either EHR or claims data would have confirmation in both sources. That was not the case.

Figure 1. Number of patients for whom confirmation of COVID-19 was found in EHR, claims or both.

The figure above shows the number of patients whose COVID-19 diagnosis was confirmed in the EHR data set, in the claims data set, and in both. We found that 21% of all COVID-19 patients had diagnosis confirmation in EHR data alone, while 71% had confirmation only in insurance claims data. Surprisingly, only 7.3% of patients had evidence confirming a COVID-19 diagnosis in both claims data and EHR data.
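A breakdown like the one in Figure 1 comes down to three set operations on patient IDs. The sketch below assumes each source has already been reduced to a set of COVID-19-positive patient IDs. The function name and the toy IDs are hypothetical, and the toy sets were chosen only to give round numbers, not to reproduce the study's counts.

```python
def capture_breakdown(ehr_ids, claims_ids):
    """Split COVID-19 patients into EHR-only, claims-only, and both."""
    ehr_ids, claims_ids = set(ehr_ids), set(claims_ids)
    total = len(ehr_ids | claims_ids)  # every patient seen in either source
    groups = {
        "ehr_only": ehr_ids - claims_ids,
        "claims_only": claims_ids - ehr_ids,
        "both": ehr_ids & claims_ids,
    }
    return {
        name: {
            "n": len(ids),
            "pct": round(100 * len(ids) / total, 1) if total else 0.0,
        }
        for name, ids in groups.items()
    }


# Toy data: 10 patients in total, 3 seen in the EHR and 8 seen in claims.
ehr = {"a", "b", "c"}
claims = {"c", "d", "e", "f", "g", "h", "i", "j"}
breakdown = capture_breakdown(ehr, claims)
for name, stats in breakdown.items():
    print(name, stats)
```

Because the three groups are disjoint and together cover the union, their counts always sum to the total number of identified patients.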

Implications

The implications for retrospective COVID-19 studies are profound. A study that uses only EHR data would miss the 71% of COVID-19 patients whose diagnosis appears only in claims. A study that relies only on claims would miss the 21% whose diagnosis appears only in the EHR.
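The single-source gaps follow from simple arithmetic on the three reported shares. The sketch below is illustrative only: the names are hypothetical, and it uses the rounded percentages quoted in this article, which sum to 99.3% rather than 100% (presumably a rounding effect, which the article does not explain).

```python
# Shares of all identified COVID-19 patients, as quoted in the text (percent).
SHARES = {"ehr_only": 21.0, "claims_only": 71.0, "both": 7.3}


def visible_share(source, shares):
    """Percent of identified patients that a study using only `source` would see."""
    return round(shares[f"{source}_only"] + shares["both"], 1)


def missed_share(source, shares):
    """Percent of identified patients found only in the other source."""
    other = "claims" if source == "ehr" else "ehr"
    return shares[f"{other}_only"]


for source in ("ehr", "claims"):
    print(
        f"{source}: sees {visible_share(source, SHARES)}%, "
        f"misses {missed_share(source, SHARES)}%"
    )
```

On these rounded figures, an EHR-only study would see roughly 28% of identified patients and a claims-only study roughly 78%.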

In COVID-19 analyses, a dataset in which only a fraction of patients with COVID-19 are labeled as such, while the rest are labeled as healthy, can negatively impact statistical analyses and predictive modeling efforts. This is especially true for comparisons that require a non-COVID-19 control population, where it is essential that none of the control patients have COVID-19. Efforts to compare event rates, identify risk factors, and develop predictive models on COVID-19 and non-COVID-19 populations with improperly labeled patients may lead to biased conclusions and poor model performance.
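To make the effect of mislabeled controls concrete, the sketch below computes the risk ratio an analyst would expect to observe when some true COVID-19 patients are counted as controls. All numbers are hypothetical and chosen only for illustration. The sketch also assumes that whether a patient is captured is unrelated to whether they have the event, which may not hold in real data.

```python
def observed_risk_ratio(n_covid, n_non_covid, risk_covid, risk_non_covid, capture):
    """Expected COVID vs. control risk ratio when only a fraction `capture`
    of true COVID-19 patients are labeled as such.

    Unlabeled COVID-19 patients are assumed to land in the control group,
    and capture is assumed to be independent of the outcome.
    """
    missed = n_covid * (1 - capture)  # COVID-19 patients hiding among controls
    control_events = n_non_covid * risk_non_covid + missed * risk_covid
    control_risk = control_events / (n_non_covid + missed)
    # Labeled COVID-19 patients keep the true COVID-19 risk under the
    # independence assumption, so only the control risk is distorted.
    return risk_covid / control_risk


# Hypothetical cohort: 1,000 true COVID-19 patients and 9,000 others, with a
# 10% event risk in COVID-19 patients and 2% in everyone else.
for capture in (1.0, 0.783, 0.283):
    rr = observed_risk_ratio(1_000, 9_000, 0.10, 0.02, capture)
    print(f"capture={capture:.3f} -> observed risk ratio {rr:.2f}")
```

In this toy setup the observed risk ratio moves toward 1 as capture falls, because events among unlabeled COVID-19 patients inflate the apparent risk in the control group.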

Moreover, we do not believe this problem is unique to COVID-19. The breakdown between EHR capture and claims capture will differ for each disease state and requires further analysis.

Conclusion

It is well known that EHR and claims data have their own respective strengths and weaknesses; each source provides more accurate information on some concepts and less accurate information on others. Selecting and processing these disparate data sources requires a deep understanding of not only the research questions being asked, but also the relative merits and shortcomings of each dataset.

Based on these surprising findings, we believe that linked EHR and claims data provide a more accurate and complete picture of a patient’s medical state than either source alone. Further, we believe that retrospective COVID-19 studies that aim to authentically reflect real-world population health must use linked EHR and claims datasets.

HealthPals is a Silicon Valley-based company that developed CLINT™, a precision insights platform capable of understanding clinical RWD from the patient to the medical system. The platform was built with scale in mind, has been run on over one billion patient-life-years of data, and has been used by major life science companies and medical payers to profile user-defined patient cohorts and model clinical outcomes and disease progression. The CLINT™ Cohort Optimization and Synthetic Control Arm offerings support next-generation clinical trial design and can work in conjunction to enable significant time and cost savings for its partners.

If you are interested in learning more about HealthPals and our CLINT™ platform, contact us at hello@healthpalsinc.com

Veradigm is an integrated data platform and services business unit of Allscripts that combines data-driven clinical insights with actionable tools to help healthcare stakeholders improve the quality, efficiency, and value of healthcare delivery. The company’s Life Sciences organization has unique data assets supporting life sciences researchers, including the largest source of de-identified ambulatory EHR data, as well as PINNACLE Cardiovascular and Diabetes Collaborative registries, operated in collaboration with the American College of Cardiology.

Sources

  1. Else H. How a torrent of COVID science changed research publishing — in seven charts. Nature. 2020. pp. 553–553. doi:10.1038/d41586-020-03564-y
  2. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395: 1054–1062.
  3. Guaraldi G, Meschiari M, Cozzi-Lepri A, Milic J, Tonelli R, Menozzi M, et al. Tocilizumab in patients with severe COVID-19: a retrospective cohort study. Lancet Rheumatol. 2020;2: e474–e484.
  4. Wang Y, Jiang W, He Q, Wang C, Wang B, Zhou P, et al. A retrospective cohort study of methylprednisolone therapy in severe patients with COVID-19 pneumonia. Signal transduction and targeted therapy. 2020. p. 57.
  5. van Gerwen M, Alsen M, Little C, Barlow J, Genden E, Naymagon L, et al. Risk factors and outcomes of COVID-19 in New York City; a retrospective cohort study. J Med Virol. 2021;93: 907–915.
  6. Trifirò G, Massari M, Da Cas R, Menniti Ippolito F, Sultana J, Crisafulli S, et al. Renin-Angiotensin-Aldosterone System Inhibitors and Risk of Death in Patients Hospitalised with COVID-19: A Retrospective Italian Cohort Study of 43,000 Patients. Drug Saf. 2020;43: 1297–1308.
  7. Seiffert M, Brunner FJ, Remmel M, Thomalla G, Marschall U, L’Hoest H, et al. Temporal trends in the presentation of cardiovascular and cerebrovascular emergencies during the COVID-19 pandemic in Germany: an analysis of health insurance claims. Clin Res Cardiol. 2020;109: 1540–1548.
  8. Maneck M, Günster C, Meyer H-J, Heidecke C-D, Rolle U. Influence of COVID-19 confinement measures on appendectomies in Germany-a claims data analysis of 9797 patients. Langenbecks Arch Surg. 2020. doi:10.1007/s00423-020-02041-4

I’m a Stanford Cardiologist and Assoc Prof. I also co-founded HealthPals, a Precision Population Health company that reduces chronic disease burden globally.