The Science of Assisting Medical Diagnosis: From Expert Systems to Machine-Learned Models
Curai’s mission is to scale the world’s best healthcare for every human being. We are building an augmented intelligence system to help scale physicians’ abilities and to lower the barrier to care for users. There are many components to such a system, but disease prevention, diagnosis, and treatment are central to providing the best clinical service.
The medical diagnostic process, defined as “a mapping from a patient’s data (normal and abnormal history, physical examination, and laboratory data) to a nosology of disease states” [1], starts with a differential diagnosis (DDx): a ranked list of possible diagnoses that is used to guide further assessments and possible treatments. In 2015, the Institute of Medicine (IOM) called for a “significant re-envisioning of the diagnostic process” and reported that “nearly every person will experience a diagnostic error in their lifetime” [2]. In the US alone, outpatient diagnostic errors account for 5.08% of medical errors, affecting 12 million US adults every year, with more than half of these errors being potentially harmful [3]. A recent study demonstrated that diagnostic accuracy is only around 60% when a single medical provider is involved in the decision-making process.
The need for diagnostic decision support systems to assist medical providers has been well understood for a long time. Medical experts, including Bruce G. Buchanan, Edward H. Shortliffe, and Jack Myers, pioneered early AI efforts (Mycin and Internist-1) directed towards this goal. At Curai, we are researching how to use these curated expert systems as a prior for modern machine-learning-based approaches. This represents a real step forward in AI-based diagnostic systems, combining the best of both old and new approaches: the causal relationships encoded in such expert systems are probabilistically refined and continually improved by learning from new data.
Why is medical diagnosis hard?
Before discussing the role of AI systems in medical diagnosis, we need to understand why diagnosis is such a hard problem. In a perfect world, medical diagnosis can be viewed as a complete information game in which the medical provider has all the information needed for diagnosis, and the entirety of this information is used to draw conclusions. However, there are many factors that prevent this from happening in the real world:
- Patients holding back pertinent information that they perceive as unimportant or irrelevant.
- Cost of acquiring information, whether that be the cost of a test or the time it takes to talk further with the patient, given that a typical medical provider’s visit lasts only 15 minutes.
- Reliance on recency and availability biases, due to the sheer number of possible diseases and the need to quickly arrive at a diagnosis [4].
- Lack of appropriate diagnostic tools to gather complete information about the patient for certain conditions, such as mental health disorders, chronic fatigue, and Lyme disease.
- Patients accessing different, disconnected areas of healthcare, resulting in an incomplete view of their medical record.
In addition to all these factors, we also want to draw attention to the fact that diagnosis is an evolving task that is temporal in nature — as a medical provider acquires more information, not only does the context in which they are diagnosing change, but the actual diagnosis for a patient may change as well. For example, a simple cough seen in isolation could often be thought of as the result of a cold, bronchitis, or acid reflux. When that cough has been worsening slowly over a long period of time, other diagnoses must be considered.
At Curai, we believe that AI diagnostic systems can help medical providers solve the complete information game: they can play an integral role in gathering all the required information from patients. With the task being hard enough as is, we believe software can assist medical providers not only in solving patients’ problems but also in enumerating possibilities and edge cases they may have otherwise missed.
AI for medical diagnosis
The figure below provides an overview of the evolution of AI models that assist doctors with differential disease diagnosis.
Medical Expert Systems
Diagnostic expert systems are computer systems that seek to emulate the diagnostic decision-making ability of human experts. Some notable systems include Mycin for infectious diseases, and Internist-1, QMR, and DXplain for general internal medicine.
Medical expert systems generally include two components: (1) a knowledge base (KB), which encapsulates the evidence-based medical knowledge that is curated by experts, and (2) a rule-based inference engine devised by the expert, which operates on the knowledge base to generate a differential diagnosis.
Diagnostic knowledge bases generally consist of diseases, findings (i.e. symptoms, signs, history, or lab results), and their relationships. In many cases, they explicitly lay out the relationships between a set of findings and the things that cause them (diseases). For example, a KB might include influenza and show its relationships with fever, coughing, and congestion. A common approach to modeling the relationship between a disease and a finding is to use two variables: the evoking strength (positive predictive value), which captures how strongly one should consider the disease if the finding is observed, and the frequency (sensitivity), which models how likely it is that a patient with the disease manifests that finding.
The rule-based inference engine outputs a ranked differential diagnosis by scoring the diseases in the knowledge base as a function of their relationship strengths over all of the input findings. In other words, given a set of findings from a patient, the inference engine examines the strength of the relationships those findings have with each disease in the KB and sorts the diseases according to a defined scoring function. See [1] and [4] for additional details.
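As a rough illustration of the mechanics, consider the toy sketch below. The knowledge base, disease names, and weights are all hypothetical, and real systems such as Internist-1 use more elaborate scales and scoring functions; the point is only the shape of the computation: look up relationship strengths for the observed findings, then rank diseases.

```python
# Hypothetical toy knowledge base: each disease maps findings to
# (evoking_strength, frequency) pairs on invented small-integer scales,
# loosely in the spirit of Internist-1-style weights.
KB = {
    "influenza": {"fever": (3, 4), "cough": (2, 4), "congestion": (2, 3)},
    "acid_reflux": {"cough": (1, 2), "heartburn": (4, 4)},
}

def score_disease(disease, findings):
    """Sum relationship strengths over the observed findings (a naive scoring rule)."""
    links = KB[disease]
    return sum(ev + fr for f, (ev, fr) in links.items() if f in findings)

def differential(findings):
    """Rank all diseases in the KB by their score for the given findings."""
    return sorted(KB, key=lambda d: score_disease(d, findings), reverse=True)

print(differential({"fever", "cough"}))  # influenza outranks acid_reflux here
```

With `{"fever", "cough"}` as input, influenza accumulates strength from both findings while acid reflux matches only the cough, so it ranks first — a crude stand-in for what a real scoring function does over hundreds of diseases.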
Unfortunately, the practical use of these systems has been constrained by several factors [5]. First, maintenance of expert systems requires a dedicated team of experts who continually monitor and reconcile new research. As a practical example, most studies regarding diabetes in the South Asian population have occurred fairly recently, and the knowledge from these new studies can be incorporated into the knowledge base only with laborious and time-consuming manual intervention. Second, it may take several years for a newly identified disease to become part of the expert system, as medical studies may be costly to run. Consider, for example, the newly identified Zika virus, which is not yet well understood and thus cannot be incorporated into a medical knowledge base with any confidence. Third, an inference engine based on an underlying knowledge base is devised with the assumption that it will be used in a noise-free environment with all the information available. For example, a patient might answer no to a question from the system about having a hematoma simply because they don’t understand that the bruise on their leg is called a hematoma.
Learning to Diagnose using Expert Systems as a Prior
There has been prior research on interpreting expert systems as probabilistic graphical models. These models typically have a bipartite causal graph structure modeling dependencies between diseases and findings (see figure below). The prior probabilities of the diseases and the conditional probability distributions of the findings are fixed and derived from the variables in the expert system. When some findings are observed, probabilistic inference can be used to obtain the posterior distribution (differential diagnosis) over the diseases. Thanks to explaining away and uncertainty modeling, the model can be resilient to noise. However, exact inference is intractable in this model family, and approximate inference methods such as Markov chain Monte Carlo, variational inference, and recognition networks are either too computationally expensive or not accurate enough to be deployed in practical settings.
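To make the bipartite model concrete, here is a minimal sketch of a noisy-OR network with exact posterior inference by enumeration. The disease priors, edge weights, and leak probability are invented for illustration; enumeration is exponential in the number of diseases, which is precisely why approximate inference becomes necessary at realistic scale.

```python
import itertools

# Toy bipartite noisy-OR model with hypothetical parameters.
priors = {"flu": 0.10, "cold": 0.20}          # P(disease present)
edges = {                                      # P(disease activates finding)
    "fever": {"flu": 0.9, "cold": 0.2},
    "cough": {"flu": 0.6, "cold": 0.7},
}
leak = 0.01  # baseline probability a finding appears with no disease

def p_finding(finding, present):
    """Noisy-OR: P(finding = 1 | set of present diseases)."""
    q = 1.0 - leak
    for disease, w in edges[finding].items():
        if disease in present:
            q *= 1.0 - w
    return 1.0 - q

def posterior(observed):
    """Exact marginal posteriors P(disease | findings) by enumerating all disease states."""
    diseases = list(priors)
    marginals = {d: 0.0 for d in diseases}
    total = 0.0
    for state in itertools.product([0, 1], repeat=len(diseases)):
        present = {d for d, s in zip(diseases, state) if s}
        p = 1.0
        for d, s in zip(diseases, state):           # prior over disease states
            p *= priors[d] if s else 1.0 - priors[d]
        for f, v in observed.items():               # likelihood of observed findings
            pf = p_finding(f, present)
            p *= pf if v else 1.0 - pf
        total += p
        for d in present:
            marginals[d] += p
    return {d: marginals[d] / total for d in diseases}

post = posterior({"fever": 1, "cough": 1})
```

Observing fever and cough raises both diseases above their priors, and because the model enumerates joint disease states, a strong explanation for one finding can "explain away" the need for another disease — the resilience property noted above.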
Last year, we set out to tackle some of the discussed challenges, and asked the following questions: How could we improve the resiliency of expert systems in the presence of partial or noisy inputs? How could we leverage the curated medical knowledge encoded in expert systems as we learn models from other data sources, such as electronic health records? We answered these questions by letting the knowledge encoded in expert systems serve as the prior for learning a new diagnosis model from scratch. The central idea behind our work is that an expert system can be used as a data generator: the synthetic medical cases it generates serve as labeled data for training a model. The figure below gives a quick overview of our approach. Note that the models provide the optional ability to incorporate clinical cases from other data sources. We presented detailed results in this paper, showcasing the resiliency of the approach to noisy inputs and its efficacy on additional cases from electronic health records.
Clinical case simulation: We used a simulation algorithm described in [6] to generate clinical vignettes conditioned on different diseases. The simulator iteratively sampled findings based on their relationship to the pertinent disease for which the clinical vignette was generated. The figure below shows simulated examples corresponding to two diseases: acute viral hepatitis and acute septic arthritis.
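A stripped-down version of such a simulator might look like the sketch below, with hypothetical finding frequencies standing in for the knowledge-base parameters; the real algorithm in [6] samples iteratively and conditions on more structure than this.

```python
import random

# Hypothetical P(finding | disease) frequencies; a real simulator would
# derive these from the curated knowledge base.
FREQ = {
    "acute viral hepatitis": {"jaundice": 0.8, "fatigue": 0.7, "nausea": 0.5},
    "acute septic arthritis": {"joint pain": 0.9, "fever": 0.6, "swelling": 0.7},
}

def simulate_vignette(disease, rng=random):
    """Sample a synthetic clinical case: include each finding with its frequency."""
    findings = {f for f, p in FREQ[disease].items() if rng.random() < p}
    return {"disease": disease, "findings": findings}

case = simulate_vignette("acute viral hepatitis")
```

Each call yields a labeled (findings, disease) pair, so running the simulator many times over all diseases produces a supervised training set for free.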
Modeling choices and results: We posed diagnosis as a classification task and experimented with different modeling approaches. In the figure below, you can see that training a model drastically improved accuracy over other approaches, including probabilistic inference on a graph derived from the expert system and invoking the expert system’s inference engine directly.
Similarly, our deep neural network performed much better than linear models such as logistic regression. This is not altogether surprising, since a deeper network allows for learning interdependencies between findings to explain away diseases. You can also see from the figure that our final learned model is much more resilient to noise at the time of diagnosis. This is an important property, especially when the model is deployed to assist real medical providers; in such a setting, inputs to the model can (and likely will) be noisy. As further evidence of how this method can generalize and expand the capabilities of expert systems, we demonstrated that by adding additional data sources, we can expand disease coverage and model existing diseases more precisely.
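To illustrate the classification framing end to end, here is a minimal sketch that simulates labeled cases from hypothetical finding frequencies and trains a softmax (multinomial logistic) classifier. Our actual models are deeper and trained on far richer vignettes, but the train-on-simulated-cases loop has the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 2 diseases x 4 findings, with invented P(finding | disease).
FINDINGS = ["jaundice", "fatigue", "joint pain", "fever"]
FREQ = np.array([
    [0.80, 0.70, 0.05, 0.30],  # acute viral hepatitis
    [0.05, 0.40, 0.90, 0.60],  # acute septic arthritis
])

def simulate(n):
    """Generate labeled cases: sample a disease, then sample its findings."""
    y = rng.integers(0, 2, size=n)
    X = (rng.random((n, len(FINDINGS))) < FREQ[y]).astype(float)
    return X, y

# Train a linear softmax classifier by gradient descent on cross-entropy.
X, y = simulate(5000)
W = np.zeros((len(FINDINGS), 2))
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.1 * X.T @ (p - np.eye(2)[y]) / len(X)

# Held-out accuracy on freshly simulated cases.
Xt, yt = simulate(1000)
acc = ((Xt @ W).argmax(axis=1) == yt).mean()
```

Even this linear model separates the two toy diseases well because their finding profiles barely overlap; the gap between linear and deep models only shows up once findings interact, which is exactly the regime the post describes.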
Conclusions
So what does this all mean? The work described here represents, to us, an exciting and concrete step forward in self-improving diagnosis systems, moving them away from the static and somewhat brittle world they have lived in for a long time. It is worth adding that we feel these types of methods represent an exciting area of AI research, as they present opportunities for mixing causal forms of AI with the massive step functions deep learning has introduced in the past decade to achieve better accuracy and resiliency to noise. We also found that this approach can be a valuable tool in settings where you can cobble together a number of diverse but related data sets.
There is very exciting research ahead of us that should vastly improve the quality of medical diagnosis and patients’ outcomes, eventually making this a world in which “no person needs to experience a diagnostic error in their lifetime.”
If the mission of scaling the world’s best healthcare for every human being excites you and you want the opportunity to contribute to these exciting advances in AI research, please check out our career page or reach out directly.
Acknowledgements
I would like to thank my collaborators, Murali Ravuri, Geoffrey Tso, and Xavier Amatriain, who were integral to the research described in this post. Thanks to Jack Craddock, Neal Khosla and Vignesh Venkataraman for their detailed inputs to this blog post.