A Better Way to Predict Ambulance Calls from Two CDS Moore-Sloan Data Science Fellows

Noulas and Gonçalves work with collaborators to understand how ambulance calls are spatially distributed

In 1854, John Snow made an early stride in the field of spatial epidemiology by mapping London cholera cases to determine the source of the outbreak. Modern spatial epidemiology has, of course, become more effective, especially by using data to predict medical incidents. Most current techniques related to non-infectious disease, however, focus on predicting the incidence of a particular disease without incorporating analysis of demographic or environmental factors.

Anastasios Noulas and Bruno Gonçalves, both Moore-Sloan Data Science Fellows, address this gap in new research, which has been peer-reviewed and accepted to the 8th International Digital Health Conference in France.

Broadly, their work aims to “estimate the volume of ambulance calls at the level of individual Lower Super Output Areas (LSOA) in the North West of England.” LSOAs are small local areas with a median population of 1,520 people.

To achieve their aim, they pose an important question: “What geographic and demographic features influence the number of calls to an ambulance service?”

Answering this question required the researchers to examine geographic and demographic data, which they sourced from the UK government, online media sources, and the mobile web. Three key data sources for their analysis were England’s North West Ambulance Service, Foursquare user check-ins, and the UK government’s Index of Multiple Deprivation (IMD), which identifies areas affected by lower quality of life.

The researchers also improved their predictive capabilities by creating a new variable, daytime population, to account for daily fluctuation in the number of people in an area. By estimating daytime population (workplace population plus residents younger than 16 and older than 74), the researchers predicted ambulance calls more accurately than they could with residential population statistics alone.
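To make the daytime population definition concrete, here is a minimal Python sketch of the sum described above. The class name, field names, and the example counts are hypothetical and chosen only for illustration; the article does not say how the researchers structured their data or computed the variable in practice.

```python
from dataclasses import dataclass


@dataclass
class AreaCounts:
    """Population counts for one local area (hypothetical field names)."""
    workplace_population: int
    residents_under_16: int
    residents_over_74: int


def daytime_population(area: AreaCounts) -> int:
    """Workplace population plus residents younger than 16 and older than 74."""
    return (
        area.workplace_population
        + area.residents_under_16
        + area.residents_over_74
    )


# Made-up counts for a single area, for illustration only.
example = AreaCounts(
    workplace_population=900,
    residents_under_16=310,
    residents_over_74=120,
)
example_daytime = daytime_population(example)  # 900 + 310 + 120
```

The idea behind the definition is that working-age residents are counted where they work rather than where they live, while the youngest and oldest residents are assumed to stay in their home area during the day.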

Through correlation-driven data analysis, the researchers sought to determine whether certain types of ambulance calls are associated with different demographic and socio-economic conditions. They found that daytime population correlates with unconscious/fainting incidents, seizures, and falls; IMD correlates with breathing problems, chest pain, and psychiatric/suicide incidents; and dense urban centers correlate with unconscious/fainting incidents.
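As a rough illustration of what a correlation-driven analysis of this kind might look like, the sketch below computes a correlation between one area-level feature and the call counts for each incident category. Pearson's coefficient is an assumption made here for simplicity; the article does not name the measure the researchers used. All numbers, category names, and function names are invented for the example and are not results from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one value per area, same area order in every list.
daytime_pop = [1200, 3400, 800, 5100, 2300]
calls_by_type = {
    "falls": [14, 37, 9, 52, 25],
    "seizures": [3, 8, 2, 11, 6],
}

# Correlate the feature with each call category separately.
correlations = {
    call_type: pearson(daytime_pop, counts)
    for call_type, counts in calls_by_type.items()
}
```

Repeating the same loop for other features, such as an area's IMD score, would give one coefficient per feature and call type, which can then be compared across categories.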

With an understanding of these correlations, the researchers were able to generate a model for predicting both the number and type of ambulance calls in a local area. Though they are encouraged by their results, they recognize the limitations of data from social media due to potential biases. For future research, Noulas and Gonçalves suggest using “real time digital datasets from location-based services to model medical incident activity not only across geographies, but also over time.” They also hope that this type of research can soon begin to inform relevant public health policies.

By Paul Oliver