Interoperability in the age of LLMs
I used to joke that phone calls and faxes are the APIs of healthcare. Even with the establishment of data standards, most notably FHIR, and increased adoption of EHRs (following the American Recovery and Reinvestment Act of 2009), the healthcare system’s ways of sharing information for and about patients remain brittle, unstructured, manual — in a word, quaint:
- CSV-formatted SFTP file drops, attachments to secure emails, CDA-style patient records, or HL7 messages instead of API calls
- Faxed or downloaded print-outs of medical charts
- RPA solutions to suck data out of SaaS tools, which break when even the smallest snippet of HTML changes or, worse, become fodder for lawsuits
- Large mail rooms to process reams of snail mail
- Phone calls. Lots of them. To ask for clinical notes. To schedule appointments. To see if a drug or medical equipment is in stock. To follow up with a patient after a visit. To check eligibility ahead of a visit.
In theory a lot of this information exists in a digitally shareable form — the E in EHR is for electronic! — and other industries have managed to solve similar problems. I can check a restaurant’s availability on Resy, how many seats on an airplane are still for sale on Expedia, or whether a toaster is still in stock on Walmart.com. For that matter, I can use my iPhone to send emojis to and see reactions from my Android-loving friends, and use Google Calendar to schedule a meeting with my accountant who’s on Outlook.
So far, policymakers’ interoperability priorities have largely focused on requiring payors to invest in APIs for others to tap into, setting up the governance for national data exchange, and enshrining FHIR as a technical standard. These are undoubtedly helpful, and the latest iteration, TEFCA, is poised to improve upon prior efforts. Yet the industry continues to face a uniquely challenging set of realities:
- The technical standards are interpreted differently, in part because the underlying data is so complex; the standards are also data models rather than ontologies — they allow for a lot of heterogeneity in what’s used to represent a given healthcare concept (thankfully Tuva Health is stepping in to close this gap).
- Traditional healthcare companies face enormous costs in rewriting their tech stacks to meet the new standards. They also rely heavily on vendors who themselves will require major under-the-hood updates.
- Policies promoting data openness tend to target payors (e.g., CMS-0057-F). However, while payors aggregate patient and provider data, they are not the source of truth for most of it (providers or EHRs are).
- Data gatekeepers, chiefly the EHRs, face competition and lost revenue if data flows too freely and will continue to contrive new hurdles even with the current set of Information Blocking disincentives.
- Healthcare companies are bound by federal as well as state-by-state data privacy regulations that make them reluctant to open the data floodgates even if technology were not a barrier.
- Standards inherently steer clear of tackling the currently unstructured datasets that many companies ultimately need access to.
- The existence of standards does little to alleviate the tangled web of integrations among payors, providers, PBMs, and vendors. Flexpa is making exciting progress here for certain types of structured data that patients can consent to share. In theory, what might solve it in the business-to-business setting (because, let’s face it, patients are generally not the ones requesting their data) is a national clearinghouse (a la CommonWell-Carequality or the consortium of QHINs set forth in TEFCA), but participation is nascent, in part due to the factors cited above. In addition, any time you pool data it can muddy data provenance, leading to inappropriate data disclosure (or the fear of it, and therefore skittishness), and the Change Healthcare outage reinforced the risks of concentrated data ownership (and, no, blockchain is not the answer).
Enter large language models, which, among other powerful capabilities, have proven adept at structuring text into a predefined syntax. Where faxes were once a way for healthcare companies to exchange paper print-outs, with lots of last-mile effort to make use of the data they contained, now you can put an LLM in the middle of that exchange, and each side can get whatever data it wants from the document in whatever style or format it needs. Timing-wise, LLMs arrived shortly after TEFCA was first published, which means we haven’t had time to fully reckon with the implications.
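To make that concrete, here is a minimal sketch of the “LLM in the middle” pattern: the sender transmits the same messy fax text it always has, and the receiver asks a model to return just the fields it cares about as JSON. The client, model name, sample fax text, and target shape are all illustrative assumptions, not any particular vendor’s pipeline.

```python
# A minimal sketch of the "LLM in the middle" pattern, assuming the OpenAI
# Python client; any model with a JSON output mode would work the same way.
# The fax text and target shape below are purely illustrative.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

fax_text = """
Patient: Jane Doe   DOB: 04/12/1986
Reason for visit: follow-up, type 2 diabetes
Medications: metformin 500 mg BID
A1c (10/02/2024): 7.2%
"""

# Each receiving side asks for whatever fields it cares about, in its own shape.
target_shape = {
    "patient_name": "string",
    "date_of_birth": "YYYY-MM-DD",
    "medications": [{"name": "string", "dose": "string", "frequency": "string"}],
    "latest_a1c": {"value": "number", "date": "YYYY-MM-DD"},
}

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # ask for well-formed JSON back
    messages=[
        {"role": "system", "content": "Extract the requested fields from the document. "
                                      f"Return JSON shaped like: {json.dumps(target_shape)}"},
        {"role": "user", "content": fax_text},
    ],
)

structured = json.loads(response.choices[0].message.content)
print(structured)  # e.g. {"patient_name": "Jane Doe", "date_of_birth": "1986-04-12", ...}
```

The receiving side controls the schema, so two organizations can consume the very same document in completely different shapes without the sender changing anything.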
In the past few months, AI voice agents have quietly emerged as a new puzzle piece that just might complete the picture. Whatever can’t be solved through a traditional API or, less elegantly, a raw-text-to-LLM pipeline can now be automated as a phone call to a doctor’s office, drugstore, or health plan. This data-gathering and data-sharing effort currently falls on individual patients (or office staff), who are unable to benefit from what thousands of other callers like them are redundantly uncovering in an uncoordinated set of interactions with the healthcare system.
On the receiving end of an AI voice agent’s inquiries, it won’t be long before a new generation of LLM-based computer-use models (such as Anthropic’s, OpenAI’s forthcoming “Operator,” or offerings from stand-alone vendors) is playing the role that RPA solutions do today — plugging into an EHR, pharmacy management system, or CRM interface to extract whatever info is needed to answer a question for or about a patient, only without all of the scraping-related brittleness.
In other words, the joke about “calls and faxes as healthcare’s APIs” is becoming a serious alternative (or at least a complement) to standards-driven interoperability. Why go through the effort of migrating thousands of slow-moving healthcare companies and vendors and tens of thousands of workflows to new standards when you can let technology standardize the data that flows through today’s existing pipes?
Of course, there is the cost of inference to worry about (as much as $1.50 for a 10-minute AI voice call). But it doesn’t have to be just one or the other. Standards can help simplify things for an LLM by narrowing the scope of the problem. Flexpa recently published a set of benchmarks for which LLMs do the best job of mapping raw clinical data to FHIR resources and validating schema adherence, among other interoperability tasks — illustrating how LLMs can actually help lower barriers to standards adoption.
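As a rough illustration of that second task, the sketch below takes a candidate FHIR Observation (imagined as LLM output for a single lab line) and checks it for schema adherence before accepting it. The schema shown is a drastically simplified stand-in for the official FHIR JSON Schema, and none of this reflects Flexpa’s actual benchmark harness.

```python
# Minimal sketch: an LLM maps a raw lab line into a candidate FHIR Observation,
# and we check schema adherence before accepting it. The schema below is a
# drastically simplified stand-in for the official FHIR JSON Schema, and the
# LLM call itself is stubbed out.
from jsonschema import validate, ValidationError

raw_text = "Hemoglobin A1c 7.2 % (collected 2024-10-02)"

# Pretend this dict is what the LLM returned when asked for a FHIR Observation.
candidate = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "4548-4"}]},
    "effectiveDateTime": "2024-10-02",
    "valueQuantity": {"value": 7.2, "unit": "%",
                      "system": "http://unitsofmeasure.org", "code": "%"},
}

OBSERVATION_SCHEMA = {
    "type": "object",
    "required": ["resourceType", "status", "code"],
    "properties": {
        "resourceType": {"const": "Observation"},
        "status": {"enum": ["registered", "preliminary", "final", "amended"]},
        "code": {
            "type": "object",
            "required": ["coding"],
            "properties": {"coding": {"type": "array", "minItems": 1}},
        },
        "valueQuantity": {
            "type": "object",
            "required": ["value", "unit"],
            "properties": {"value": {"type": "number"}, "unit": {"type": "string"}},
        },
    },
}

try:
    validate(instance=candidate, schema=OBSERVATION_SCHEMA)
    print("Candidate Observation passes the (simplified) schema check")
except ValidationError as err:
    print("Schema violation, send back to the LLM or a human:", err.message)
```

A validation gate like this is what lets model output flow into systems that expect strict FHIR, rather than trusting the model’s formatting on faith.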
Meanwhile, LLM agents can chip away at the long tail of data elements and data exchange use cases that are farther out on the FHIR roadmap. That is, the highest-volume use cases can still flow through formal APIs and information exchanges — though as inference costs continue to fall, the prospect of amortizing a gargantuan upfront investment over a stream of (steadily declining) per-token fees becomes more and more appealing.
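A quick back-of-envelope comparison shows why that trade-off shifts as inference gets cheaper. Only the roughly $1.50-per-call figure comes from above; the integration cost, call volume, and rate of price decline are purely illustrative assumptions.

```python
# Back-of-envelope sketch of the amortization argument. Only the ~$1.50-per-call
# figure is cited in the text above; every other number is a hypothetical
# assumption chosen for illustration.
upfront_integration_cost = 250_000  # hypothetical: building a formal API integration
calls_per_year = 20_000             # hypothetical volume for one data-exchange workflow
cost_per_ai_call = 1.50             # inference cost cited above for a 10-minute voice call
annual_cost_decline = 0.30          # hypothetical: inference prices falling ~30% per year
years = 5

# Total spend on the AI-agent path, with per-call costs decaying each year.
ai_path_cost = sum(
    calls_per_year * cost_per_ai_call * (1 - annual_cost_decline) ** year
    for year in range(years)
)

print(f"AI-agent path over {years} years: ${ai_path_cost:,.0f}")
print(f"Formal integration, upfront:      ${upfront_integration_cost:,.0f}")
# Which path wins depends entirely on volume and how fast inference prices fall.
```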
This kind of “interoperability bootstrapping” is not without its drawbacks (and, like anything, can lead to abuse). For example, if only one side of the interaction is using AI, the other side incurs the cost of a human agent responding to it (arguably, a cost it would bear anyway). Also, unstructured conversations have to be re-structured for internal use, which can be error-prone. But over time, a new paradigm of bot-to-bot voice interactions could take shape. Inefficient (and surreal) though it may seem to have two AI agents conversing, these interactions could eventually capitalize on data-over-sound techniques to reduce latency to near-API-like levels.
Through this lens, interoperability policy priorities should account not only for API implementation and standards adoption but also increasingly for guidelines that ensure data-holders play well with responsible data-seekers employing an expanded range of integration techniques. Rules should encourage resourcefulness that aligns with the broader policy goal of improving clinical access and outcomes. More than anything, recent AI activity underscores the need for policymakers to be nimble in an era of breakneck innovation. This means leaning into public-private partnership, and deciding when to double down on previous policy victories and when to adapt to new technology-enabled possibilities.
—
Duncan Greenberg was previously SVP of Product at Oscar Health, where he led product management, product design, and UX research. During his 8 years at Oscar, he played a leading role in many parts of the business, from spearheading the company’s AI initiatives to launching Oscar’s virtual primary care offering, and more. Before Oscar, he held a series of product leadership roles, and in a previous life was a journalist at Forbes. He is currently an angel investor, start-up advisor, and proud husband + father of two living in New York City.
—
If you’re a builder or operator in digital health or health tech, I’d love to hear from you — DM me here.
—
Thanks to Otto Sipe, Bill Williams, and Amanda DeBrule for their helpful feedback on this piece.