Knowledge Graphs for Clinical Management — Part I

Omar Alonso
Curai Health Tech
Jul 23, 2020


Omar Alonso, Bill Andersen, Geoff Tso

Introduction

Curai customers come to us with health concerns and enter into a chat-based dialog with physicians who give advice, diagnose, order lab tests, prescribe medications, and refer them to specialists. We view the challenge of providing high-quality virtual care not as a series of one-off encounters with patients we know little about, but rather as providing quality longitudinal care: in addition to servicing the patient's needs today, we have to keep track of their health and care over time.

The current healthcare system has left many patients wanting healthcare that is accessible regardless of their socioeconomic status, that knows them, and that provides care that is effective and efficient. For example, a patient with asthma may come to us after having lost their job and their health insurance. When they then develop a cough and fever, they can sign up with our clinic, allow us to download their health records from their previous doctors, and start a chat immediately. Our providers use the information from their historical health record, together with their story about their symptoms, to craft a personalized diagnosis and treatment plan that takes their asthma into account. Curai stores and remembers everything in the chat, so that when the patient comes back in four weeks with a lingering cough, we immediately know how to use all of the longitudinal information we have about them (asthma => cough and fever => persistent cough) to identify the next steps in their care. Figure 1 shows an example of a chat session.

Figure 1. A patient is describing a situation in plain text. Curai understands the input and asks questions accordingly based on healthcare knowledge.

Maintaining electronic health records (EHRs), prescribing and tracking medications and tests, making specialist referrals, and driving our machine learning systems all require the fusion and coordination of various forms of knowledge captured in data exchange and ontology standards, with content ranging over events in the clinical setting, disorders, findings, anatomic sites, medications, lab tests, measurements, and many others.

Because our application is chat based, interactions between patients and providers are in the form of natural language. So, in addition to organizing the above forms of conceptual knowledge about health and health care, we must capture a broad range of natural language phenomena pertaining to the way patients and providers talk about medical concepts — the different names by which they’re referred to, the questions one can ask about them, among others.

To get a better sense of the patient's clinical need, we make heavy use of machine learning techniques for named entity recognition, diagnosis, and history-taking recommendations to assist both patient and provider. Because those ML models draw heavily on natural language and on classifications grounded in conceptual content, maintaining this knowledge becomes an essential part of developing and maintaining our ML data pipelines.

To support all of these activities, and more, we have developed a knowledge graph for clinical management.

What is a Knowledge Graph?

The definition of a “knowledge graph” remains contentious as different explanations and interpretations have emerged from different areas [1, 3]. We take a practical approach and define a Knowledge Graph (KG) as a graph that describes entities (objects of interest) and relationships. That is, nodes represent entities and edges represent relations between these entities.

A knowledge base (KB) is a collection of records, usually stored as triplets, in a database that refer to some kind of knowledge about a domain. Traditionally, the term knowledge base is used in the context of knowledge-based systems that store encoded knowledge coupled with an inference engine that can derive new facts or answer queries over such a knowledge base.

We prefer the term KG as an umbrella phrase for techniques that store and integrate data from heterogeneous and dynamic sources in a way that is less restrictive than a traditional knowledge base. Modeling data as a graph offers more flexibility for integrating new data sources.
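As a minimal illustration of this data model (a sketch of the general idea, not Curai's actual schema; the entity and relation names are invented), a small KG can be stored as a set of subject-relation-object triples and indexed for traversal:

```python
# A minimal, self-contained sketch of a knowledge graph as a set of
# subject-relation-object triples. Names and relations are illustrative
# only and do not reflect Curai's actual schema.
from collections import defaultdict

triples = [
    ("asthma", "is_a", "chronic_respiratory_disorder"),
    ("asthma", "has_finding", "cough"),
    ("asthma", "has_finding", "wheezing"),
    ("albuterol", "treats", "asthma"),
]

# Index edges by subject so the graph can be walked from any node.
graph = defaultdict(list)
for subject, relation, obj in triples:
    graph[subject].append((relation, obj))

def neighbors(entity, relation=None):
    """Return entities related to `entity`, optionally filtered by relation."""
    return [o for r, o in graph[entity] if relation is None or r == relation]

print(neighbors("asthma"))                 # all related entities
print(neighbors("asthma", "has_finding"))  # ['cough', 'wheezing']
```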

KGs have gained a lot of attention lately as core assets for a wide range of applications. We mention two well-known use cases. In Web search, Microsoft's Satori and the Google Knowledge Graph power answers for a segment of user queries. In e-commerce, the Amazon Product Graph powers many purchase recommendations [2, 3].

Commercial KGs use behavioral data (e.g., user clicks, queries, sharing activity) as an important signal for detecting interest and, in conjunction with large-scale data sets (e.g., the Web or a product catalog), perform large-scale information extraction, selection, and matching.

Open-domain KGs like YAGO and DBpedia are usually based on available sources like Wikipedia and WordNet. There are benefits to this: the raw source is easy to read and parse, and it is (for head entities) usually up to date. However, for less popular entities the information may be sparse, outdated, inconsistent, or ambiguous.

Healthcare is in a different category: information tends to be more precise, with medical data based on standards and curated by experts. Human-in-the-loop patterns are needed to achieve both high precision and high recall, and the medical domain is hard enough that knowledge must be captured directly from experts. In contrast to KGs for Web search, the stakes in healthcare are much higher: regulation and law, not to mention patients' health, play a significant role.

Knowledge Graphs for Clinical Health Care

Clinical management at Curai needs KGs to support a number of functions of our platform:

  • Charting encounters in Curai Health's electronic health record (EHR).
  • Interoperability with information systems outside Curai.
  • Named-entity recognition (NER): given a patient's initial complaint, tag medical and clinical entities.
  • Classification: e.g., classify an initial patient chat as an informational session or one that requires a visit to a hospital.
  • Diagnosis: provide a diagnosis (with a certain probability) given a chat session, as described in a previous AI for diagnosis post.
  • Question answering: automatically ask follow-up questions or provide answers to common situations.
  • Summarization: synthesize the most useful information in a patient-doctor conversation.
  • Search: search by keywords, structure, or natural language.

Correctness and Traceability

In healthcare applications, errors carry serious implications, ranging from incorrect predictions by ML systems to legal and regulatory issues. For this reason, we feel that any KG adequate for clinical management must be constructed with the following properties in mind:

  • Integrity. Taxonomic and other types of relations carry intended meanings that can be formally specified and verified.
  • Provenance. Every piece of information in our KG is tagged with the sources from which it was obtained.
  • Traceability. Some KG content must necessarily be inferred, rather than explicitly entered or imported. This inference process must be transparent to users.
  • Conflict detection and resolution. We rely on the integration of multiple, independently developed content sources, and it is the norm, rather than the exception, for these sources to conflict. Such conflicts must be detected automatically and resolved either automatically or by a human in the loop (a small sketch of provenance-tagged edges and conflict detection follows this list).
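To make provenance and conflict detection concrete, here is a small sketch, assuming a single-valued relation, of how edges might carry their source so that disagreements between vocabularies can be flagged for review. The Edge class, relation name, and the second source are hypothetical, not Curai's implementation:

```python
# Illustrative sketch only: edges carry the source they came from, so that
# conflicting assertions from different vocabularies can be surfaced for
# review rather than silently overwritten.
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    obj: str
    source: str  # provenance: which vocabulary/ontology asserted this edge

edges = [
    Edge("amoxicillin 500 mg capsule", "dose_form", "oral capsule", source="RxNorm"),
    Edge("amoxicillin 500 mg capsule", "dose_form", "oral tablet", source="SourceX"),  # hypothetical conflicting source
]

def find_conflicts(edges):
    """Group assertions by (subject, relation); for a relation expected to be
    single-valued, more than one distinct object is flagged as a conflict."""
    by_key = {}
    for e in edges:
        by_key.setdefault((e.subject, e.relation), set()).add((e.obj, e.source))
    return {k: v for k, v in by_key.items() if len({obj for obj, _ in v}) > 1}

for (subj, rel), claims in find_conflicts(edges).items():
    print(f"Conflict on ({subj}, {rel}): {claims}")  # hand off to a human reviewer
```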

Heterogeneous Source Integration

The Curai KG for clinical management integrates content from multiple heterogeneous ontologies and reference vocabularies, including SNOMED-CT, UMLS, RxNorm, LOINC, and DXplain, each of which provides content we use. SNOMED-CT, distributed in the US by the National Library of Medicine (NLM) of the National Institutes of Health (NIH), is an official US government standard ontology for the exchange of medical information. The Unified Medical Language System (UMLS), likewise published by NLM, is a compendium of controlled vocabularies (including SNOMED) along with lexical information. RxNorm, also from NLM, is a database of drug products and their constituent chemical ingredients. LOINC, created and maintained by the Regenstrief Institute, is a database and universal standard for representing medical laboratory observations. Finally, we make use of the database of diagnostic concepts (disorders and clinical findings) from DXplain, a clinical decision support system from Massachusetts General Hospital. All of these standards are utilized by commercial EHR solutions.

Each of these sources has considerable complexity, containing both taxonomic and "lateral" relationships (e.g., anatomic, functional, and causal relations). There is considerable overlap between the sources: drugs in RxNorm and disorders in DXplain overlap with concepts described in SNOMED, which further overlaps with LOINC. UMLS was designed as a lingua franca to tie these systems together, but it itself contains considerable gaps and data quality issues. The same can be said of the other systems.

While each of these systems has its problems, curating and integrating their contents is far more economical than recreating that content from scratch. Figure 2 describes the KG generation process, which includes a data fusion component over the different medical sources in conjunction with a human-in-the-loop framework that enables medical professionals to perform data quality checks.

Figure 2. Data fusion from different input sources including a human-in-the-loop component for data quality checks.
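As a simplified sketch of the fusion step (the actual pipeline handles far more structure), concepts from different vocabularies can be aligned through a shared identifier such as a UMLS CUI. The DXplain-style identifier below is invented for illustration:

```python
# Simplified illustration of aligning concepts from two vocabularies through
# a shared UMLS concept identifier (CUI). The DXplain-style record is made up
# for the example; a real pipeline would read codes from the source releases.
snomed = {"386661006": {"name": "Fever", "cui": "C0015967"}}
dxplain_findings = {"fever": {"cui": "C0015967", "dx_id": "F-0042"}}  # hypothetical record

# Build one fused node per CUI, recording where each attribute came from (provenance).
fused = {}
for code, rec in snomed.items():
    node = fused.setdefault(rec["cui"], {"names": set(), "sources": {}})
    node["names"].add(rec["name"])
    node["sources"]["SNOMED-CT"] = code

for name, rec in dxplain_findings.items():
    node = fused.setdefault(rec["cui"], {"names": set(), "sources": {}})
    node["names"].add(name)
    node["sources"]["DXplain"] = rec["dx_id"]

print(fused["C0015967"])
# e.g. {'names': {'Fever', 'fever'}, 'sources': {'SNOMED-CT': '386661006', 'DXplain': 'F-0042'}}
```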

Our strategy

The design approach we have adopted rests on directly addressing the challenges mentioned in the previous section. In particular, we focus on the following properties we feel are essential for a KG for clinical management:

Data graph

A flexible data model that fuses different data sources into nodes and edges. Modeling data as a graph provides more flexibility for integrating new sources and allows us to represent incomplete information.

Scalability

We do not face the problems of scale faced by Web-scale KGs, which contain millions of concepts and relationships and must serve any type of query. What we do need is the capability to quickly evolve KG content and deliver that content to our applications with a minimum of human intervention and effort.

Access

We believe that a KG as a service is an interesting architecture for our setting, as it allows applications and features to query and extract specific information. We call these query-focused datasets (QFDs): queries against the KG whose results can be pre-computed or materialized for efficiency based on specific application needs.
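A minimal sketch of the QFD idea, assuming a simple in-memory graph; the function names and output format are hypothetical, not our production API:

```python
# Illustrative sketch of a query-focused dataset (QFD): a named query over the
# KG whose result is materialized once and then served to an application.
import json

def run_query(graph, relation):
    """Collect all (subject, object) pairs for a given relation."""
    return [(s, o) for s, edges in graph.items() for r, o in edges if r == relation]

def materialize_qfd(name, graph, relation, path):
    """Pre-compute a QFD and write it to disk so applications can load it cheaply."""
    rows = run_query(graph, relation)
    with open(path, "w") as f:
        json.dump({"qfd": name, "relation": relation, "rows": rows}, f)
    return rows

# Example: a QFD of disorder -> finding pairs for the NER / diagnosis services.
graph = {"asthma": [("has_finding", "cough"), ("has_finding", "wheezing")]}
materialize_qfd("disorder_findings_v1", graph, "has_finding", "disorder_findings_v1.json")
```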

Human-in-the-loop framework

None of the medical standards mentioned above is complete. Physicians on our clinical informatics team constantly modify and augment our systems' ability to handle new clinical and linguistic knowledge, and assist in integrating this knowledge into our ML systems and other applications. At Curai we have our own implementation of human-in-the-loop strategies for curating and evaluating the different data aspects that are then ingested into our KG. We'll describe our specific implementation in a future blog post.

We developed an in-house tool called X-Ray for searching and curating our KG. Figures 3 and 4 show examples of search results and how healthcare experts curate specific data items using a web interface.

Figure 3. Searching the Curai KG using X-Ray, a search and browse tool.
Figure 4. Ability to modify specific entries, labs in this case, using a simple user interface.

Augmented Clinical Care

Our knowledge graph has been fully integrated into Curai's tech stack. Our patients and providers interact with the knowledge graph, both explicitly and sometimes unknowingly, throughout a patient's journey in receiving care. As described above, the data generated during patient-provider interactions is fed back into a learning loop that improves the knowledge graph. This, in turn, improves our platform's ability to augment the clinical care delivered on our system.

For example, when a new patient signs on to our platform, they are taken through a dynamic intake flow driven by the knowledge graph. The flow guides the patient in entering their medical history, such as chronic conditions, medications, allergies, and demographic information. This information provides the structured background data that drives the technology. Figure 5 shows an example for medical conditions.

Figure 5. Patient entry of their past medical history, supported by the knowledge graph.

When the patient begins a chat with a provider, they describe the reason for the encounter in free text, and our NER extracts additional data (symptoms) that serves as a starting point for our clinical decision support algorithms. Figure 6 shows an example of entities extracted from a patient who writes "I have a fever and wet cough with a runny nose".

Figure 6. Medical entities extracted from the patient’s chat.
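As a simplified stand-in for this step (the production NER is a learned model, not a dictionary matcher), extracted mentions can be linked to KG concepts roughly as follows; the lexicon and concept identifiers below are used purely for illustration:

```python
# Simplified stand-in for the NER step: dictionary matching of symptom mentions
# against KG concept names. The real system uses a learned NER model; the
# UMLS-style concept identifiers below are illustrative.
kg_lexicon = {
    "fever": "C0015967",
    "wet cough": "C0239134",   # productive cough
    "runny nose": "C1260880",  # rhinorrhea
}

def extract_entities(text):
    """Return (mention, concept_id) pairs found in the patient's message."""
    text = text.lower()
    return [(m, cui) for m, cui in kg_lexicon.items() if m in text]

print(extract_entities("I have a fever and wet cough with a runny nose"))
# [('fever', 'C0015967'), ('wet cough', 'C0239134'), ('runny nose', 'C1260880')]
```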

With a starting set of information on the patient's present and past medical history, the encounter can proceed with the patient chatting with a Curai healthcare provider whose work is augmented and guided by our algorithms. This assistive, human-in-the-loop technology, driven by the knowledge graph, improves the quality of the clinical conversation by predicting what a provider might want to say and do, while also ensuring that they do not miss asking any critical questions or considering important diagnoses. This augmentation, along with knowledge graph-based technology that reduces administrative responsibilities, frees the provider to deliver high-quality patient care. Figure 7 shows relevant questions based on the patient's description from earlier.

Figure 7. Clinical questions related to a reason for encounter.
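To sketch how the KG can drive such question suggestions (an illustrative simplification, not the actual decision-support logic; the question text and mapping are invented), findings extracted from the chat can be mapped to associated clarifying questions:

```python
# Illustrative simplification: findings in the KG are linked to clarifying
# questions, and the union of questions for the extracted findings is
# suggested to the provider. Question text here is invented for the example.
follow_up_questions = {
    "C0015967": ["How high has the fever been?", "How many days has it lasted?"],
    "C0239134": ["Is the cough worse at night?", "Any shortness of breath?"],
}

def suggest_questions(extracted_cuis):
    """Gather candidate follow-up questions for the extracted findings."""
    questions = []
    for cui in extracted_cuis:
        questions.extend(follow_up_questions.get(cui, []))
    return questions

print(suggest_questions(["C0015967", "C0239134"]))
```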

Equally important, it gives the provider the time to explain their medical decision making in a feedback loop that improves the knowledge graph.

Conclusion

We've presented an overview of Curai's approach to building a clinical management knowledge graph. There are many technical details in how we process the input sources and how we combine that data into a graph; we'll cover the specifics in Part II.

References

[1] Hannah Bast, Björn Buchhold, Elmar Haussmann. “Semantic Search on Text and Knowledge Bases”. Found. Trends Inf. Retr. 10(2–3): 119–271 (2016).

[2] Dieter Fensel, Umutcan Simsek, Kevin Angele, Elwin Huaman, Elias Kärle, Oleksandra Panasiuk, Ioan Toma, Jürgen Umbrich, Alexander Wahler. Knowledge Graphs — Methodology, Tools and Selected Use Cases. Springer, 2020.

[3] Natalya Fridman Noy, Yuqing Gao, Anshu Jain, Anant Narayanan, Alan Patterson, Jamie Taylor. "Industry-scale knowledge graphs: lessons and challenges". Commun. ACM 62(8): 36–43 (2019).
