From Symptoms and Mutations to Diagnoses: Doctor.ai as a Diagnostic Tool
Get quick diagnoses using natural language via AWS Lex and a Neo4j knowledge graph
This article shows how to:
1. Develop the diagnostic functions in Doctor.ai based on symptoms and mutated genes.
2. Retrieve disease descriptions from KEGG.
3. Configure AWS Lex so that we can use natural language to interact with this new Doctor.ai.
Disclaimer: The software solution in this article is not a substitute for professional medical advice, diagnosis, or treatment. It is intended for informational purposes only.
Imagine that one day during the COVID-19 lockdown, a blind patient at home complained of dyspepsia (discomfort or pain in the upper abdomen, often after eating or drinking), weight loss and fever. He wanted to know the possible medical causes, but he could not go to the hospital immediately. Wouldn’t it be great if a virtual healthcare expert could give him a quick rundown of all the possible causes? Wouldn’t it be nice if this chatbot could even explain the diseases in detail? This article tries to present one possible answer.
In my previous articles (here, here and here), I described Doctor.ai, a chatbot for healthcare. At first, it was designed to manage a large number of patients’ medical records. Later, I plugged it into three knowledge graphs: Hetionet, STRING and KEGG. As a result, Doctor.ai connects patients’ medical histories with over 2,600 diseases, 21,000 genes, 343 pathogens and 2,582 medical compounds. So it can not only retrieve patients’ records with ease but also provide users with additional information about their ailments. And thanks to AWS Lex, we can interact with Doctor.ai using natural language.
However, we can exploit these knowledge graphs further for a whole new purpose: diagnosis. On the one hand, a user can dictate a list of symptoms and expect Doctor.ai to answer with a list of possible diseases. On the other hand, as more and more people have their genomes sequenced, a doctor can ask Doctor.ai to list all possible genetic disorders of a patient based on the patient’s gene mutations. Finally, Doctor.ai can even give users a detailed description of the diseases.
This is possible because the Hetionet knowledge graph connects symptoms to diseases, and both Hetionet and KEGG connect genes to genetic disorders. So we can use multiple MATCH statements in Neo4j to narrow down the list of candidate diseases. Furthermore, KEGG contains detailed descriptions of the diseases. There is one small drawback, though: the descriptions are quite academic.
In this article, I am going to show you how these diagnostic functions were made. Because this is an upgrade of my last project, the code for this project is hosted in the same GitHub repository as before. The dump file is here:
https://1drv.ms/u/s!Apl037WLngZ8hhj_0aRswHOOKm0p?e=7kuWsS
1. From symptoms to diagnoses
Imagine that a patient complains about dyspepsia, hiccup and edema (swelling) and wants to know the possible medical causes. In the knowledge graph of Doctor.ai, we can run the following Cypher query and get the answer:
With three MATCH statements, this query tries to find diseases that lead to all three symptoms. In this hypothetical example, the query result hints at stomach cancer, which was the third leading cause of cancer deaths worldwide according to the GLOBOCAN 2018 data.
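The same idea generalizes to any number of symptoms: one MATCH clause per symptom, all sharing the disease variable, so that each clause filters the candidates further. A minimal sketch is below. It assumes the Hetionet schema as imported into Neo4j, with Disease nodes linked to Symptom nodes by PRESENTS_DpS relationships and matched on a name property; Doctor.ai’s actual labels and relationship types may differ, and `build_symptom_query` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def build_symptom_query(symptoms: List[str]) -> Tuple[str, Dict[str, str]]:
    """Build a Cypher query with one MATCH clause per symptom.

    Assumes the Hetionet schema (:Disease)-[:PRESENTS_DpS]->(:Symptom {name}).
    Every MATCH reuses the variable d, so each extra clause keeps only the
    diseases that are also linked to that symptom.
    """
    if not symptoms:
        raise ValueError("at least one symptom is required")
    clauses: List[str] = []
    params: Dict[str, str] = {}
    for i, symptom in enumerate(symptoms):
        key = f"s{i}"
        # Query parameters keep user input out of the Cypher text itself.
        clauses.append(f"MATCH (d:Disease)-[:PRESENTS_DpS]->(:Symptom {{name: ${key}}})")
        params[key] = symptom
    # DISTINCT guards against duplicate rows if a disease-symptom edge repeats.
    clauses.append("RETURN DISTINCT d.name AS disease")
    return "\n".join(clauses), params


query, params = build_symptom_query(["Dyspepsia", "Hiccup", "Edema"])
print(query)
print(params)
```

With the official neo4j Python driver, the pair can be passed directly to `session.run(query, params)`. Generating parameterized clauses rather than concatenating symptom names into the query string avoids Cypher injection from free-text user input.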
AWS Lex is Doctor.ai’s mouth and ears, while AWS Lambda is its hands that carry out the database actions. To hold diagnostic dialogs based on symptoms, Doctor.ai needs an upgrade to its Lex and Lambda definitions. I need to implement a Lex intent called AskForDiagnosesFromSymptom that captures the list of symptoms and passes them to a Lambda function. The Lambda function constructs a Cypher query like the one above, runs it against the Neo4j knowledge graph and returns the answer to Lex.
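The Lambda function’s reply has to follow the response format that Lex expects. Below is a minimal sketch, assuming Lex V2 (the version whose console offers multi-valued slots) and a plain-text answer that closes the dialog. `close_dialog` is a hypothetical helper name, not part of any AWS SDK.

```python
from typing import Any, Dict


def close_dialog(event: Dict[str, Any], message: str) -> Dict[str, Any]:
    """Build a Lex V2 response that fulfills the intent and ends the turn."""
    session_state = event["sessionState"]
    # Copy the intent instead of mutating the incoming event.
    intent = dict(session_state["intent"], state="Fulfilled")
    return {
        "sessionState": {
            # Pass session attributes back so later turns can still use them.
            "sessionAttributes": session_state.get("sessionAttributes", {}),
            "dialogAction": {"type": "Close"},
            "intent": intent,
        },
        "messages": [{"contentType": "PlainText", "content": message}],
    }


event = {"sessionState": {"intent": {"name": "AskForDiagnosesFromSymptom", "slots": {}}}}
print(close_dialog(event, "Possible diseases: stomach cancer"))
```

A Close action tells Lex that the intent is done and that it should speak the message; an ElicitSlot action would be the alternative if a required slot were still missing.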
The new AskForDiagnosesFromSymptom intent looks similar to Doctor.ai’s other intents, with the usual sample utterances and fulfillment definitions, except that its required slot symptom is a multi-valued slot (Figure 1).
A multi-valued slot can parse a list of items separated by spaces, commas or the word “and”. So if a user says:
I have Dyspepsia, Hiccup and Edema. So what kinds of diseases can it be?
Lex will be able to return the list of symptoms with the value [Dyspepsia, Hiccup, Edema]. In Lambda, we can construct and run the Cypher query with the following code:
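Before any query can run, the symptoms have to be pulled out of the Lex event. In Lex V2, a multi-valued slot arrives in Lambda with shape "List" and one entry per recognized item under values. A minimal sketch follows; `get_multi_slot_values` is a hypothetical helper, and the sample event is trimmed to the fields the function reads.

```python
from typing import Any, Dict, List


def get_multi_slot_values(event: Dict[str, Any], slot_name: str) -> List[str]:
    """Return the interpreted values of a Lex V2 slot, one per recognized item."""
    slots = event["sessionState"]["intent"].get("slots") or {}
    slot = slots.get(slot_name)
    if not slot:
        # An unfilled slot arrives as null (None in Python).
        return []
    # Multi-valued slots have shape "List" with one sub-slot per item under "values";
    # a scalar slot carries its single entry directly under "value".
    items = slot.get("values", []) if slot.get("shape") == "List" else [slot]
    values: List[str] = []
    for item in items:
        interpreted = (item.get("value") or {}).get("interpretedValue")
        if interpreted:
            values.append(interpreted)
    return values


sample_event = {
    "sessionState": {
        "intent": {
            "name": "AskForDiagnosesFromSymptom",
            "slots": {
                "symptom": {
                    "shape": "List",
                    "value": {"interpretedValue": "Dyspepsia, Hiccup and Edema"},
                    "values": [
                        {"shape": "Scalar", "value": {"interpretedValue": "Dyspepsia"}},
                        {"shape": "Scalar", "value": {"interpretedValue": "Hiccup"}},
                        {"shape": "Scalar", "value": {"interpretedValue": "Edema"}},
                    ],
                }
            },
        }
    }
}
print(get_multi_slot_values(sample_event, "symptom"))  # ['Dyspepsia', 'Hiccup', 'Edema']
```

The resulting list is what the Lambda function turns into one MATCH statement per symptom.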
With this upgrade, Doctor.ai can hold the following conversation (Figure 2).
Doctor.ai lists not only the possible ailments but also all of their associated symptoms. This can help patients perform further self-examinations.
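One way to return the associated symptoms is a second query over the candidate diseases, followed by a small formatter for the chat reply. The sketch below assumes the same Hetionet relationship type PRESENTS_DpS; the query text and `format_diagnoses` are illustrative, not Doctor.ai’s actual code.

```python
from typing import Any, Iterable, Mapping

# Hypothetical follow-up query, assuming (:Disease)-[:PRESENTS_DpS]->(:Symptom):
# fetch every symptom of each candidate disease.
ALL_SYMPTOMS_QUERY = """
MATCH (d:Disease)-[:PRESENTS_DpS]->(s:Symptom)
WHERE d.name IN $diseases
RETURN d.name AS disease, collect(s.name) AS symptoms
"""


def format_diagnoses(records: Iterable[Mapping[str, Any]]) -> str:
    """Turn (disease, symptoms) rows into one chat-friendly message."""
    lines = []
    for record in records:
        # Sort symptoms so the reply is stable from one request to the next.
        symptoms = ", ".join(sorted(record["symptoms"]))
        lines.append(f"{record['disease']}: {symptoms}")
    if not lines:
        return "I could not find a disease that matches all of these symptoms."
    return "Possible diseases and their symptoms:\n" + "\n".join(lines)


rows = [{"disease": "stomach cancer", "symptoms": ["Hiccup", "Dyspepsia", "Edema"]}]
print(format_diagnoses(rows))
```

Records returned by the neo4j driver support lookup by key, so they can be passed to the formatter as they are.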
2. From mutated genes to diagnoses
More and more people have their own genomes sequenced. With advances in Genome-Wide Association Studies (GWAS), we are fast approaching the era of personalized medicine. Doctor.ai should be ready to ingest all this data. We can scan each individual genome for disease-associated variants and save the results in Doctor.ai. Afterwards, Doctor.ai can query the knowledge graph for possible genetic disorders. Let’s say that we have a patient called Hans with a defective CAT gene and a defective SLC9A6 gene (Figure 3).
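Saving such variant calls could be a single batched Cypher statement. The sketch below assumes a hypothetical HAS_MUTATION relationship from Patient nodes to the existing Gene nodes, matched by gene symbol in the name property as in Hetionet; the article does not show how Doctor.ai actually stores mutations.

```python
from typing import Dict, List

# Hypothetical schema: (:Patient {name})-[:HAS_MUTATION]->(:Gene {name: <symbol>}).
# MATCH on Gene so that only genes already in the knowledge graph are linked;
# MERGE the patient and the relationship so re-running an import adds no duplicates.
SAVE_MUTATIONS_QUERY = """
UNWIND $rows AS row
MERGE (p:Patient {name: row.patient})
WITH p, row
MATCH (g:Gene {name: row.gene})
MERGE (p)-[:HAS_MUTATION]->(g)
"""


def mutation_rows(patient: str, genes: List[str]) -> List[Dict[str, str]]:
    """Build the $rows parameter, normalizing symbols and dropping duplicates."""
    seen = set()
    rows = []
    for gene in genes:
        # Human gene symbols are upper case, e.g. CAT and SLC9A6.
        symbol = gene.strip().upper()
        if symbol and symbol not in seen:
            seen.add(symbol)
            rows.append({"patient": patient, "gene": symbol})
    return rows


print(mutation_rows("Hans", ["CAT", "SLC9A6"]))
```

Sending all rows through one UNWIND keeps the import to a single round trip, which matters once whole genomes, rather than two genes, are being loaded.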
In contrast to the AskForDiagnosesFromSymptom intent in the previous section, the AskForGeneticDisease intent captures the patient’s name in a single-valued slot. Thanks to contexts, Lex can even resolve pronouns by taking the patient’s name from a previous dialog.
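On the Lambda side, the name arrives as an ordinary slot. The sketch below reads it and, as a fallback, looks in an active Lex V2 context. The context name PatientContext, the attribute name and the pronoun check are all hypothetical choices; the pronoun check targets the case where Lex captures a pronoun as a literal name.

```python
from typing import Any, Dict, Optional

# Words that should never be accepted as a patient's name.
PRONOUNS = {"he", "she", "they", "him", "her", "them"}


def get_patient_name(
    event: Dict[str, Any],
    slot_name: str = "name",
    context_name: str = "PatientContext",
    attribute: str = "name",
) -> Optional[str]:
    """Read the patient's name from a slot, else from an active Lex V2 context."""
    state = event["sessionState"]
    slots = state["intent"].get("slots") or {}
    slot = slots.get(slot_name)
    if slot and slot.get("value"):
        name = slot["value"].get("interpretedValue")
        # Ignore a pronoun captured as a literal name and fall back to the context.
        if name and name.lower() not in PRONOUNS:
            return name
    for context in state.get("activeContexts") or []:
        if context.get("name") == context_name:
            value = (context.get("contextAttributes") or {}).get(attribute)
            if value:
                return value
    return None
```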
In Lambda, I construct and run the Cypher query like this:
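The query walks from the patient to the mutated genes and on to the diseases associated with them. A sketch under stated assumptions: HAS_MUTATION is a hypothetical patient-gene relationship, ASSOCIATES_DaG is Hetionet’s disease-gene edge, and HAS_GENE is a placeholder for whatever type the KEGG import in Doctor.ai uses. The function accepts a neo4j Session or any object with a compatible run() method, which keeps it testable without a database.

```python
from typing import Any, List

# The | in the relationship pattern matches either edge type, so diseases
# from both Hetionet and the (assumed) KEGG import are found in one pass.
GENETIC_DISEASE_QUERY = """
MATCH (p:Patient {name: $name})-[:HAS_MUTATION]->(g:Gene)<-[:ASSOCIATES_DaG|HAS_GENE]-(d:Disease)
RETURN d.name AS disease, collect(DISTINCT g.name) AS genes
ORDER BY disease
"""


def find_genetic_diseases(session: Any, name: str) -> List[str]:
    """Return lines such as 'acatalasemia (CAT)' for one patient."""
    result = session.run(GENETIC_DISEASE_QUERY, name=name)
    return [f"{record['disease']} ({', '.join(record['genes'])})" for record in result]
```

Returning the responsible genes next to each disease lets the reply explain why a disorder was suggested, not only which one.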
With these simple upgrades, Doctor.ai can now also tell Hans the bad news: he could have Christianson syndrome and acatalasemia.
It is worth noting that Christianson syndrome has an X-linked recessive inheritance pattern, while acatalasemia has an autosomal recessive pattern. That means that for a male patient like Hans, who has only one X chromosome, a single mutated copy of SLC9A6 is sufficient to cause Christianson syndrome. In contrast, Hans has two copies of CAT. If only one of them is defective, his catalase activity is reduced by approximately half. But if both are defective, his catalase level drops to less than 10 percent of normal. Currently, Doctor.ai cannot capture these nuances.
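To make the reasoning above concrete, here is a simplified Mendelian rule that combines the inheritance mode, the patient’s sex and the number of mutated copies. It is only an illustration: it assumes all three facts are available, which the current graph does not store (zygosity would have to come from variant calling), and every name in it is hypothetical.

```python
from enum import Enum


class Inheritance(Enum):
    AUTOSOMAL_RECESSIVE = "autosomal recessive"
    X_LINKED_RECESSIVE = "X-linked recessive"


def expected_status(mode: Inheritance, sex: str, mutated_copies: int) -> str:
    """Classify a patient as 'unaffected', 'carrier' or 'affected'.

    Textbook Mendelian rule only: it ignores penetrance, X-inactivation
    and how damaging each variant actually is.
    """
    if mutated_copies <= 0:
        return "unaffected"
    if mode is Inheritance.X_LINKED_RECESSIVE and sex == "male":
        # Males have a single X chromosome, so one mutated copy is enough.
        return "affected"
    # Otherwise there are two copies of the gene and both must be mutated.
    # For CAT, one mutated copy still roughly halves catalase activity.
    return "affected" if mutated_copies >= 2 else "carrier"


# Assuming Hans carries one mutated copy each of SLC9A6 and CAT:
print(expected_status(Inheritance.X_LINKED_RECESSIVE, "male", 1))   # affected
print(expected_status(Inheritance.AUTOSOMAL_RECESSIVE, "male", 1))  # carrier
```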
3. KEGG disease descriptions
When a doctor wants to know more about a certain disease, Doctor.ai should be able to help. A simple approach is to return the KEGG disease description to the user.
The AskForDiseaseDescription intent just needs to get the name of the disease.
In Lambda, the Neo4j Python code looks like this:
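A minimal sketch of such a lookup is below, assuming the KEGG text is stored on Disease nodes in a property called description (a hypothetical name) and that disease names are matched case-insensitively. It relies on the neo4j driver’s `Result.single()`, which returns None when no record matches; the session argument is duck-typed so the function can be exercised without a database.

```python
from typing import Any, Optional

# Hypothetical property name `description` for the KEGG disease text.
DESCRIPTION_QUERY = """
MATCH (d:Disease)
WHERE toLower(d.name) = toLower($name) AND d.description IS NOT NULL
RETURN d.name AS disease, d.description AS description
LIMIT 1
"""


def describe_disease(session: Any, name: str) -> Optional[str]:
    """Return '<disease>: <description>', or None if nothing matched."""
    # strip() removes stray whitespace that speech-to-text can leave around the slot value.
    record = session.run(DESCRIPTION_QUERY, name=name.strip()).single()
    if record is None:
        return None
    return f"{record['disease']}: {record['description']}"
```

Lower-casing both sides makes "Acatalasemia" and "acatalasemia" hit the same node; for large graphs a full-text index would be the more scalable choice.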
Afterwards, Doctor.ai should be able to describe the diseases.
As you can see, the KEGG description is not exactly an ELI5 (“explain like I’m five”). It is intended more for doctors and academic experts.
Conclusion
In my last article, I showed that adding three knowledge graphs made Doctor.ai more knowledgeable about infectious diseases, pharmaceutical compounds and genetics. This article ventures into the domain of diagnostics. With some simple code, I added both symptom-based and gene-based diagnostic capabilities to Doctor.ai. I am sure that if we keep digging and exploring, we can find even more new uses. The result will be a more intelligent Doctor.ai that can benefit many people.
Meanwhile, I also see room for improvement. First and foremost, medicine is a matter of life and death, so we need to rigorously review the diagnoses from Doctor.ai. Since patients’ symptoms are normally included in their medical records and Doctor.ai is primarily the guardian of medical records, we can use machine learning to improve its diagnostic accuracy. On the genomic side, Doctor.ai cannot capture the nuances of alleles, which can potentially lead to overdiagnosis. Furthermore, the genetic variant calling is currently carried out outside of Doctor.ai. Many disease-associated variants are known, but more are unknown, and not all mutations lead to diseases. So the challenge is how to identify the harmful mutations. Finally, the KEGG disease descriptions are hard for laypeople to read. We can add a plain-language version that is more accessible to the general public.
On the AWS side, Lex sometimes interpreted the pronoun “he” as the name “he”. There is also a risk that as more intents are added, Lex will get confused and fail to classify users’ intents correctly. In terms of speed, the current AWS setup at times seems slow, and it will get worse as more users join the system. Perhaps a switch from Neo4j Community Edition to Enterprise Edition can solve the scaling issue.
As Annie Murphy Paul wrote in her book The Extended Mind, by offloading data from our minds onto the more stable and reliable stuff of the world, we can greatly increase our mental power. The knowledge graph is definitely one such piece of reliable stuff. Its structure closely resembles that of our minds, so we can intuitively extend our thinking into a knowledge graph. With knowledge graphs, we make fewer mental mistakes, see the full picture and may even find new inspiration. What is more, thanks to natural language understanding from AWS Lex, users can navigate this ocean of knowledge without writing a single line of code. As a result, patients will be better informed, while doctors can give patients more face time and more accurate diagnoses and treatments. That is, better healthcare for all.
Licenses
Hetionet is released under CC0. STRING is freely available under a ‘Creative Commons BY 4.0’ license, while academic users may freely use the KEGG website.