Engineering Provider Knowledge at Castlight Health — Part 1

by Sree Iyer, Vinay Yadappanavar

Published in

apree health (Castlight) Engineering

Apr 18, 2019

We’re drowning in information but starved for knowledge — John Naisbitt, Megatrends

Castlight integrates with thousands of health vendors, benefits resources, and plan designs to provide a healthcare navigation platform that gives our customers' employees a personalized care-guidance experience. The platform helps users make better healthcare decisions and stay actively engaged in managing their health and wellness. It does so by demystifying their benefits, presenting the most relevant options, from providers to programs and educational content, and highlighting the most salient factors in an informed choice, such as clinical quality, convenience, and out-of-pocket charges.

The Castlight Provider Directory (Castlight PD) is a knowledge base of healthcare providers, insurance networks, provider quality data, and other provider data, such as addresses, phone numbers, and medical specialties. It forms the basis of our core healthcare navigation offering. Castlight serves companies and people across the United States and globally, covering almost every possible insurance situation and healthcare need. Unlike the directory of any single insurance company or employer, Castlight's data must support users' needs in all of these contexts. The Castlight PD must also support all Castlight products, including intelligent solutions such as our clinical-quality-based care guidance. The data stored in our PD must be correct in every way: from whether a provider is in-network for a specific plan, to which specialties the provider has, down to every detail of a provider's demographics. The data must also be updated promptly, with the shortest possible lag between a change in the real world and the corresponding update to the data. Users of our product demand correct information about their healthcare providers.

The Castlight PD therefore needs to perform two often-conflicting functions simultaneously:

Accurately preserve and render core provider data loaded from payer directories, while honoring timeliness SLAs.
Derive knowledge about the world of providers, discover relationships among provider entities, augment payer directories with supplementary data from public or third-party sources, and build concepts useful for intelligent healthcare navigation.

Alternatively, one can think of the former as preserving the payer view, which presents reliable information about in-network providers, their specialties, and their contact information. The latter develops a Castlight view of the data, designed to guide users in comparing provider options: differences in clinical quality, affiliations with practices and hospitals, awards of excellence, medical board certifications, and so on.
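
To make the distinction between the two views concrete, here is a minimal sketch of how they might be kept apart in a data model. It is not Castlight's actual schema: the class names, field names, and sample values are all hypothetical, chosen only to show payer records kept immutable while derived knowledge lives in a separate entity that links to them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PayerRecord:
    """One provider row as supplied by a payer directory (hypothetical fields)."""
    payer: str
    source_id: str
    name: str
    specialty: str
    phone: str
    in_network_plans: tuple[str, ...]


@dataclass
class CastlightProvider:
    """Derived entity: links payer records and holds supplementary knowledge."""
    provider_key: str
    payer_records: list[PayerRecord] = field(default_factory=list)
    affiliations: set[str] = field(default_factory=set)
    board_certifications: set[str] = field(default_factory=set)

    def in_network_for(self, plan: str) -> bool:
        # Network status is answered only from the untouched payer records.
        return any(plan in rec.in_network_plans for rec in self.payer_records)


# Illustrative data only.
record = PayerRecord(
    payer="ExamplePayer",
    source_id="EP-001",
    name="Dr. A. Example",
    specialty="Cardiology",
    phone="555-0100",
    in_network_plans=("PPO-1",),
)
provider = CastlightProvider(provider_key="cl-1", payer_records=[record])
provider.affiliations.add("Example General Hospital")
```

Because `PayerRecord` is frozen, enrichment can only be added on the derived entity, so the payer view cannot be altered by accident while the Castlight view grows.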

Castlight's first-generation PD had evolved organically to take on both challenges at once within a single combined data architecture. As our customer base matured and expanded, we needed to excel at both functions: to be faster and nimbler, and to implement additional features that could take better advantage of our enhanced provider data. This was too much to ask of our original Provider Directory architecture. We therefore kicked off two projects, codenamed Kripke and Matcherize, to modularize, redesign, and specialize our architecture so that it could take on both challenges effectively and better position us to leverage our Provider Directory in additional innovative ways.

Provider data from payer and supplementary sources tends not to have reliable or shared identifiers, both because the source data itself is noisy and because many sources use reference identifiers that cannot reliably link records across sources. Some common reference directories, such as the National Provider Identifier (NPI) registry, do exist. However, other datasets that use these identifiers often lack fill rates high enough to make them consistently useful as global IDs, or they confound disambiguation logic through semantic inconsistencies. For example, a medical practice's NPI may appear in a record that references a practitioner who works at the practice.
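
The two identifier problems described above, low fill rates and semantic inconsistencies, can each be measured. The sketch below is illustrative rather than a description of our pipeline. It relies on two facts about NPIs: an NPI is a 10-digit number whose last digit is a Luhn check digit computed over the number prefixed with 80840, and the registry classifies each NPI as belonging to an individual or an organization. The record layout, the `record_kind` field, and the in-memory `registry_entity_type` lookup are hypothetical stand-ins.

```python
def is_valid_npi(npi: str) -> bool:
    """Check the NPI format and its Luhn check digit (prefix 80840)."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed("80840" + npi)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def npi_fill_rate(records: list[dict]) -> float:
    """Fraction of records carrying a well-formed NPI."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if is_valid_npi(r.get("npi") or ""))
    return filled / len(records)


def entity_type_mismatches(records: list[dict], registry_entity_type: dict) -> list[dict]:
    """Practitioner records whose NPI the registry lists as an organization."""
    return [
        r for r in records
        if r.get("record_kind") == "practitioner"
        and registry_entity_type.get(r.get("npi")) == "organization"
    ]


# Hypothetical sample: the third record carries the practice's NPI.
records = [
    {"name": "Dr. A", "record_kind": "practitioner", "npi": "1234567893"},
    {"name": "Dr. B", "record_kind": "practitioner", "npi": None},
    {"name": "Dr. C", "record_kind": "practitioner", "npi": "1987654328"},
]
registry_entity_type = {"1234567893": "individual", "1987654328": "organization"}

fill_rate = npi_fill_rate(records)
suspect = entity_type_mismatches(records, registry_entity_type)
```

A check like this does not resolve the ambiguity. It only flags records where the NPI should not be trusted as a practitioner's global identifier.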

The transformations that datasets undergo in our data pipelines, from file ingestion to a feature being surfaced in the app, need to be carefully designed to preserve fidelity. That is, we must minimize the loss of information contained in the raw data, from the beginning of the pipeline all the way through the derivation of the most useful higher-order concepts at the end. Project Kripke addressed information loss at the source of the payer data pipeline, while project Matcherize tackled cross-referencing payer and supplementary datasets and constructing the strategic provider data concepts that best support Castlight's intelligent care guidance applications.
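
One way to reason about fidelity is to ask, for a given transformation, which raw fields can no longer be recovered from its output. The sketch below is an assumption-laden illustration, not how Kripke works: the `FIELD_MAP` column names and both normalization functions are hypothetical. It contrasts a step that silently drops unmapped fields with one that carries them along.

```python
from typing import Any

# Hypothetical mapping from raw payer column names to normalized names.
FIELD_MAP = {"PROV_NM": "name", "SPEC_CD": "specialty_code", "PHONE": "phone"}


def normalize_lossy(raw: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields we know how to map; everything else is dropped."""
    return {FIELD_MAP[k]: v for k, v in raw.items() if k in FIELD_MAP}


def normalize_preserving(raw: dict[str, Any]) -> dict[str, Any]:
    """Map known fields and carry unknown ones along under '_unmapped'."""
    out = normalize_lossy(raw)
    out["_unmapped"] = {k: v for k, v in raw.items() if k not in FIELD_MAP}
    return out


def lost_fields(raw: dict[str, Any], normalized: dict[str, Any]) -> set[str]:
    """Raw field names whose values cannot be found in the normalized output."""
    unmapped = normalized.get("_unmapped", {})
    lost = set()
    for key, value in raw.items():
        target = FIELD_MAP.get(key)
        if target is not None and target in normalized and normalized[target] == value:
            continue
        if key in unmapped and unmapped[key] == value:
            continue
        lost.add(key)
    return lost


# Illustrative raw row with one column the mapping does not know about.
raw_row = {"PROV_NM": "Dr. A", "SPEC_CD": "207R", "PHONE": "555-0100", "ACCEPTING_NEW": "Y"}
dropped_by_lossy = lost_fields(raw_row, normalize_lossy(raw_row))
dropped_by_preserving = lost_fields(raw_row, normalize_preserving(raw_row))
```

Running a check like `lost_fields` after each pipeline stage would show where information disappears, which is the kind of loss the paragraph above says must be minimized.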

In part 2, we get into how Kripke and Matcherize tackled these challenges.
