Engineering Provider Knowledge at Castlight Health — Part 2

by Sree Iyer, Vinay Yadappanavar

Published in apree health (Castlight) Engineering, May 1, 2019

In the case of other types of objects, say people, material objects, things like that, has anyone given a set of necessary and sufficient conditions for identity across possible worlds?
— Saul Kripke, Naming and Necessity

In part 1 of this series, we introduced the substantial difficulties involved in assembling and maintaining an accurate, robust, multi-purpose provider directory. In this post, we outline the two projects that tackled these daunting challenges.

Kripke

In Castlight’s original Provider Directory (PD) implementation, the NPI (a single identification number issued by the federal government to health care providers) was chosen as the unique provider identifier for many of the payers. However, there are many pitfalls to this approach.

A single NPI is often shared across multiple providers, especially by facilities and medical groups. In certain circumstances, a facility or group can apply for more than one NPI. For example, a facility or group with multiple offices or a lab testing chain may choose to get a separate Type 2 NPI for each office or ancillary. In the case of a single NPI being shared across multiple facilities or groups, we discovered scenarios in which multiple providers and provider attributes were incorrectly merged into a single Castlight provider that was displayed in our application.

Example of multi-facility NPI data we receive from payers:

Even practitioners are impacted by this shared NPI issue. In some instances, multiple practitioners affiliated with a group were independently referenced by the group NPI.

Example of a shared NPI between practitioners and a group:

Because our team was not initially aware of these data issues, the original PD implementation picked one of the provider names and associated all three addresses with that provider. Besides losing providers, early versions of our application sometimes showed specialties that did not apply to the provider a user had looked up.

As we gained more understanding of real-world usage of NPI and the seemingly endless number of edge cases, we realized that NPI was not a reliable, unique provider identifier for our use case.
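
To make the failure mode concrete, here is a minimal Python sketch. It is not Castlight's code: the record fields (`npi`, `name`, `address`), the function names, and the sample values are all hypothetical. It shows how a directory keyed on NPI keeps one name and piles every address onto it, and how grouping by NPI exposes the collisions.

```python
from collections import defaultdict

# Hypothetical payer records: three distinct offices share one NPI.
records = [
    {"npi": "1111111111", "name": "Lakeside Imaging - North", "address": "10 Main St"},
    {"npi": "1111111111", "name": "Lakeside Imaging - South", "address": "25 Oak Ave"},
    {"npi": "1111111111", "name": "Lakeside Imaging - East", "address": "7 Elm Rd"},
    {"npi": "2222222222", "name": "Dr. A. Example", "address": "3 Pine Ct"},
]


def naive_directory(records):
    """Key on NPI: the first name seen wins and every address is attached to it."""
    directory = {}
    for rec in records:
        entry = directory.setdefault(rec["npi"], {"name": rec["name"], "addresses": []})
        entry["addresses"].append(rec["address"])
    return directory


def shared_npis(records):
    """Return the NPIs that refer to more than one distinct (name, address) pair."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec["npi"]].add((rec["name"], rec["address"]))
    return {npi: pairs for npi, pairs in seen.items() if len(pairs) > 1}


merged = naive_directory(records)    # two entries, although there are four providers
collisions = shared_npis(records)    # only the shared NPI is flagged
```

In this toy data, two of the three offices disappear from `merged`, which is the kind of provider loss described above.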

In addition to improving the quality of the Castlight PD, Kripke improved the accuracy with which we matched providers to claims. Our initial claims interpreters relied heavily on NPI to identify the provider and location. The NPI issues described above resulted in cases in which a claim from a practitioner was incorrectly matched to a facility, and vice versa.

Because of these NPI issues, the past care feature in our app sometimes associated the wrong provider with a claim. Beyond the effect on our users, this also affected internal teams that use claims to analyze pricing or make personalized recommendations.

In the first phase of this large project, we focused on provider splitting. Castlight’s Data Management team worked with payers to get them to send us a truly unique and accurate provider identifier, thus allowing us to treat NPI as an ancillary identifier.

In the second phase, we developed a new matching algorithm to associate the correct provider and location to claims data. We then reprocessed hundreds of millions of historical claims.
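
The two phases can be illustrated with a small Python sketch. This is an illustration under stated assumptions, not the Kripke algorithm itself, which this post does not detail: the `Provider` and `Claim` fields, the `payer_provider_id` name, and the ZIP-code tie-break are all hypothetical. Each provider carries a payer-supplied unique identifier, NPI is used only as a secondary index to narrow candidates, and a claim is matched only when exactly one candidate remains.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    payer_provider_id: str  # hypothetical: unique ID supplied by the payer
    npi: str                # ancillary identifier, may be shared
    kind: str               # "practitioner" or "facility"
    zip_code: str


@dataclass(frozen=True)
class Claim:
    npi: str
    kind: str
    zip_code: str


def build_npi_index(providers):
    """Secondary index: one NPI may map to several providers."""
    index = defaultdict(list)
    for provider in providers:
        index[provider.npi].append(provider)
    return index


def match_claim(claim: Claim, npi_index) -> Optional[Provider]:
    """Narrow by NPI, then by provider kind and location; refuse to guess."""
    candidates = [
        p for p in npi_index.get(claim.npi, [])
        if p.kind == claim.kind and p.zip_code == claim.zip_code
    ]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical data: a facility and two practitioners share one NPI.
providers = [
    Provider("P-1", "3333333333", "facility", "08053"),
    Provider("P-2", "3333333333", "practitioner", "08053"),
    Provider("P-3", "3333333333", "practitioner", "08002"),
]
npi_index = build_npi_index(providers)
matched = match_claim(Claim("3333333333", "practitioner", "08002"), npi_index)
unmatched = match_claim(Claim("3333333333", "facility", "99999"), npi_index)
```

Returning `None` instead of the first candidate is a deliberate choice in this sketch: an unmatched claim can be reviewed later, whereas a wrong match silently pollutes downstream pricing and recommendation analyses.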

Figure 1. Kripke Claims to Provider Matching

Matcherize

The Matcherize team is tasked with relating provider records within and across the provider directories we receive from payers, augmenting them with supplementary data, bolstering the reliability of the provider attributes surfaced in the application, and constructing concepts that better aid health care navigation for our users. These functions can be viewed as layers of engineered knowledge over the multi-payer provider directories that form the core of our care guidance solution.

The initial charter of the Matcherize project was to relate payer directories to publicly available datasets, such as clinical quality measures published by the Centers for Medicare and Medicaid Services, and to datasets procured through third-party vendors, such as affiliation data regarding practitioner relationships with practices and hospitals. Subsequent functions developed and owned by the team include:

construction of canonical Castlight reference provider representations, their interrelationships, and their attributes
synthesis of Castlight’s proprietary clinical quality scores for providers, which are used to enhance search and discovery as part of our personalized guidance of users to high-quality providers
identification and taxonomic classification of the various types of provider entities of interest to our business
improvement of confidence in key attributes of providers and practices, such as practice location addresses and phone numbers

Such work is commonly referred to as “Entity Resolution”, which is the art and science of cross-referencing datasets, matching records, deduplication, disambiguation, denoising, and canonicalization starting from disparate, messy datasets from different sources and of varied and often non-standard formats. Entity resolution problems are typically easily spotted and easily solvable by humans, even by non-experts. However, it is notoriously difficult to build software to solve such problems accurately and at scale.

Consider, for instance, the intuition that Dr. Jane Doe is unlikely to practice at multiple locations that are more than a reasonable commute distance apart, so most records will reference the person within that commute range. Or consider that Dr. John Smith, Emergency Medicine specialist at 999 Rte 73 North, 2nd Floor Ste. 201, is the same person as Dr. John Allen Smith, Emergency - Sports Medicine specialist at 999 St. Hwy 999 North, Floor 2. Building software that explicitly (that is, by symbolic programmatic logic) encodes all such human intuitions is a monumental, if not impossible, endeavor. Instead, to crack this problem, Matcherize employs a toolkit comprising statistical methods, rule-based inference engines, and machine-learning models, paired with Castlight’s deep cross-team expert knowledge of the world of providers.
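
Two of these intuitions can be approximated in a few lines of Python. This is a hedged sketch of the kind of hand-written heuristic a rule-based layer might contain, not Matcherize's implementation: the function names, the first-and-last-token name rule, the 50-mile threshold, and the coordinates are all assumptions chosen for illustration.

```python
import math
import re


def name_tokens(name: str) -> list:
    """Lowercase a name, drop punctuation, and drop the 'Dr' honorific."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return [t for t in tokens if t != "dr"]


def names_compatible(a: str, b: str) -> bool:
    """True if first and last tokens agree, so a middle name may be missing."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return False
    return ta[0] == tb[0] and ta[-1] == tb[-1]


def haversine_miles(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance between two (latitude, longitude) points in miles."""
    radius = 3958.8  # mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def within_commute(loc_a, loc_b, max_miles: float = 50.0) -> bool:
    """Assumed rule: two practice locations are plausible if within max_miles."""
    return haversine_miles(*loc_a, *loc_b) <= max_miles


same_person = names_compatible("Dr. John Smith", "Dr. John Allen Smith")
# Hypothetical coordinates for two nearby offices and one distant office.
nearby = within_commute((39.95, -75.16), (39.90, -75.00))
distant = within_commute((39.95, -75.16), (40.71, -74.00))
```

Rules like these break quickly on real data. For example, "St." can mean street, state, or saint, and two different physicians can share a first and last name, which is why hand-written rules alone are not enough.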

How they fit in

In a future post, we will dive into the unique challenges of working with provider data, the advancements made to date, and the specifics of the technical approaches employed. For now, we’ll leave you with Figure 2, which shows how the two projects fit into the Castlight data ecosystem.

Figure 2. Provider Directory Data Science and Engineering Systems

We would like to thank Teri Bradshaw, Senior Manager at Castlight Health, who manages the projects detailed in this post, and Anne Hunt for her product leadership on these projects during her time at Castlight Health.
