Develop with Synthetic Patient Data from Synthea

Discover patient patterns with GPT-3 and Neo4j

Sixing Huang


While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant.

— Sherlock Holmes in “The Sign of the Four”

Photo by British Library on Unsplash

In my previous posts (1, 2, 3, 4, 5, 6, 7 and 8), I have described our medical voice chatbot. Its database manages a large number of medical records, which represent the medical journeys of many individual patients. As Sherlock Holmes said in The Sign of the Four, the aggregate is predictable even though the individual is not. A seminal example is John Snow, who used a dot map and statistics to identify a public water pump as the source of the 1854 cholera outbreak in London. So with careful data mining, it is possible to infer patient patterns from these medical data, such as the breakdown of infections and deaths by race and ethnicity during the current COVID-19 pandemic (9).

The chatbot was originally developed with the eICU dataset. That dataset is large, but it has three drawbacks. Firstly, the dataset is not easily accessible…



Sixing Huang

A Neo4j Ninja, German bioinformatician at Gemini Data. I like to try things: Cloud, ML, satellite imagery, Japanese, plants, and traveling the world.