Data is at the Core of Health Progress — And The Best is Yet to Come

Data Science has been a Force Behind Medical Progress

Over 250 years ago Edward Jenner, the “father of immunology”, was born in Berkeley (Gloucestershire, not California). He spent years studying smallpox, one of the greatest killers of the period. Over that time he made a crucial observation: those who had been infected with cowpox (a less severe, non-life-threatening virus) were very unlikely to get smallpox. He tested his intervention and published An Inquiry Into the Causes and Effects of the Variolae Vaccinae. The work was based on his study of twenty-three cases.

A picture of page 36 of An Inquiry Into the Causes and Effects of the Variolae Vaccinae (1798) by Edward Jenner, describing the case of John Baker (Case 18 of 23)

This is an early example of data aggregation creating new medical knowledge and improving human lives. And it wasn’t the first time. Many hundreds of years before Jenner, Indian physicians identified diabetes by observing a group of people with similar manifestations, recognisable by the flies and ants that gathered around their urine. They also went on to separate the fat ones (Type 2) from the little ones (Type 1). This was precision medicine 1.0.

Many Medical Pioneers Were Data Scientists:

Some sixty years after Jenner’s Inquiry, another leading light (with a lamp), Florence Nightingale, played a very important role in data aggregation when she studied the effect of sanitation on the death rates of soldiers. She demonstrated in her book, Mortality of the British Army (1858), that there was a strong correlation between the mortality of soldiers and the level of sanitation around them. This is a solid work of public health epidemiology.

A data table from Florence Nightingale’s book, Mortality of the British Army, 1858

Her study of sanitation in rural India convinced her of the value of good sanitation, and she persuaded the government to set up a Royal Commission into the ‘Indian situation’. The return on investment in sanitation was later reported as follows: “After 10 years of sanitary reform, in 1873, Nightingale reported that mortality among the soldiers in India had declined from 69 to 18 per 1,000”. Her work on sanitation, made popular by her exceptional data visualisation, was able to convince policy makers to invest in it. This was pretty remarkable, given that the germ theory of disease was not established at the time, and that Florence herself was known not to have been entirely convinced of it. For a medical reformer, it is remarkable how much of her work was essentially data analytics. She was elected a member of the Royal Statistical Society (not the Royal Society of Medicine) and also of the American Statistical Association, both of which were predominantly male at the time.

If Florence Nightingale were alive today, I am pretty sure she would be called a data scientist and would be in high demand. (LinkedIn recruiters, you would have been all over her.)

It seems that the mid-19th century was when data science started to have a major influence on public policy. John Snow (1813–1858) tracked the source of a cholera epidemic in Soho to a water pump. Just like Nightingale, he used clever visualisation, plotting cases on a map:

Map showing the clusters of cholera cases in the London epidemic of 1854, John Snow

Nightingale, Jenner and Snow gained fame in their lifetimes, went on to win accolades and are household names. Florence’s picture adorned a £10 note a few years ago.

Edward Jenner did not get a bank note but had a crater on the moon named after him (no, seriously). John Snow also became famous. He is considered the father of epidemiology but, more importantly, he has a pub named after him in Soho.

John Snow Pub, Broad Street, London

Not every medical data scientist was treated as a hero, especially when their data contradicted established medical practice.

Take Ignaz Semmelweis, for example. Unless you went to medical school in Vienna, where he is a household name, you may not have heard of him. He was in fact a brilliant data scientist, who discovered that hand-washing could reduce transmission of puerperal fever. He noticed a big difference in the number of women developing puerperal fever between two hospital wards. He observed the wards carefully and surmised that the midwives’ ward, where hand-washing was common, had a low incidence of fever compared to the doctors’ ward, where hand-washing was less common. He carefully collected data and presented it to show the difference in the number of cases.

Etiology, Concept and Prophylaxis of Childbed Fever, Ignaz Semmelweis 1861

You would think that someone who made such a remarkable discovery, demonstrating that a chlorine hand-wash could reduce suffering and death, would be celebrated, but sadly no. Semmelweis faced severe headwinds when he tried to popularise his work. The medical establishment did not like his conclusions one bit. The more strongly he pushed his idea, the more resistant his colleagues became; they refused even to try it, leading to more deaths. A violent reaction to new knowledge that contradicts existing dogma is now known as the Semmelweis reflex.

A couple of decades later, Louis Pasteur’s germ theory of disease started to gather momentum, and the concept of germs passing through contact and causing disease became evident. But this was too late for Semmelweis. He was deeply affected by the overwhelming rejection of his ideas, became mentally ill, and was admitted to an asylum a few years later. He is now recognised as a pioneer in his field and an important data scientist of his time.

The Future of Medical Discoveries is Dependent on Us and How We Engage with Data:

Every medical discovery, every clinical trial, every epidemiology study, is data science.

Luckily, we now live in the era of DNA sequencing, electronic medical records, patient-reported outcomes, smartphones and wearables. These devices may be new, but data as an engine of medical progress is not. Large data sets and connectivity bring new challenges around how we share this data and who benefits from it, but the potential to improve people’s lives is too great not to address these challenges.

The history of medical progress is the history of health data. The golden era of medical progress is only just beginning.