Chronic Disease Prediction Using Urban Data & Machine Learning

NYU Center for Data Science
Feb 25, 2019

Urban mobility data helps researchers link visits to certain locales with chronic diseases

Consider a day in your life. Where do you go? Do you go to the gym, watch Netflix, cook, or frequent restaurants? Lifestyle choices such as these gradually affect overall health. In fact, lifestyle is estimated to account for 70–90% of chronic diseases.

In a recent study, Anastasios Noulas, Moore-Sloan Data Science Fellow, and researchers from the University of Science and Technology of China (Xing Xie, Enhong Chen, Yingzi Wang) and the University of Cambridge (Xiao Zhou, Cecilia Mascolo, Yingzi Wang) used patterns of human movement to estimate the likelihood of developing chronic diseases. Put simply, the researchers studied where people go (recorded as “check-ins”) and how that relates to their health status. Mobility data can provide clues to a lifestyle. For example, college students may regularly visit lecture halls, gyms, and libraries, whereas white-collar employees may be more likely to dine out and spend time in offices. The researchers treat each such “lifestyle” as a category with a distinct check-in pattern.
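Here is a minimal sketch of the “lifestyle as check-in pattern” idea. It assumes each check-in is recorded as a (user, region, venue category) tuple, counts check-ins per venue category in each region, and normalizes those counts into shares. The record layout, ward names, categories, and function names are hypothetical illustrations, not the study's data or code.

```python
from collections import Counter, defaultdict

# Hypothetical check-in records: (user_id, region, venue_category).
# Made-up examples for illustration, not data from the study.
CHECKINS = [
    ("u1", "Ward A", "University"),
    ("u1", "Ward A", "Gym"),
    ("u2", "Ward A", "Library"),
    ("u3", "Ward B", "Office"),
    ("u3", "Ward B", "Restaurant"),
    ("u4", "Ward B", "Restaurant"),
]


def region_category_counts(checkins):
    """Count check-ins per venue category within each region."""
    counts = defaultdict(Counter)
    for _user, region, category in checkins:
        counts[region][category] += 1
    return counts


def category_shares(counts):
    """Turn each region's counts into the share of its total check-ins."""
    shares = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        shares[region] = {cat: n / total for cat, n in counter.items()}
    return shares


if __name__ == "__main__":
    counts = region_category_counts(CHECKINS)
    for region, share in sorted(category_shares(counts).items()):
        print(region, {cat: round(p, 2) for cat, p in share.items()})
```

A region dominated by lecture halls and libraries then looks different from one dominated by offices and restaurants. That difference is the kind of signal a lifestyle category is meant to capture.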

The researchers used large-scale human mobility data from the location technology platform Foursquare, datasets from UK government websites describing the characteristics of London's 630 wards, and a disease dataset, also from a UK government website. From these, they derived a set of lifestyle categories and explored their correlations with 20 chronic diseases. Figure 1 (a) visualizes particular correlations between points of interest (POIs) and disease, and (b) shows where POIs cluster in the London metropolitan area.
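The correlation step can be illustrated with a plain Pearson coefficient computed across wards. This is a hedged sketch. The fast-food share and diabetes prevalence figures are invented for illustration, and the study's exact correlation measure is not described here, so Pearson's r is an assumption.

```python
import math

# Hypothetical ward-level figures, for illustration only (not from the study):
# share of check-ins at fast-food venues and diabetes prevalence (%) per ward.
FAST_FOOD_SHARE = [0.05, 0.10, 0.12, 0.20, 0.25]
DIABETES_PREVALENCE = [4.1, 5.0, 5.2, 6.3, 6.8]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    r = pearson(FAST_FOOD_SHARE, DIABETES_PREVALENCE)
    print(f"r = {r:.3f}")
```

Because the unit of analysis is the ward, a correlation like this describes areas rather than individual people.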

To group related POIs, they adopted the Gaussian Mixture Model (GMM), a clustering method. The researchers also considered topic models common in natural language processing, such as Latent Dirichlet Allocation (LDA), for grouping POIs. In this view, regions are treated as “documents” and check-in patterns as “words.” However, because the model had to extrapolate from sparse check-in data, they adopted a collaborative topic model (CTM). The researchers found that combining CTM and GMM improved performance on prediction tasks compared with baseline models.
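The “regions as documents, check-ins as words” view can be sketched with a small collapsed Gibbs sampler for plain LDA, the topic model the researchers considered before adopting CTM. This sketch does not implement CTM or the GMM step. The vocabulary, region contents, hyperparameters (alpha, beta, iteration count), and the function name `lda_gibbs` are illustrative assumptions.

```python
import random

# Hypothetical vocabulary of venue categories and region "documents".
VOCAB = ["University", "Library", "Gym", "Office", "Restaurant", "Bar"]
REGIONS = {
    "Ward A": ["University", "Library", "Gym", "University", "Library"],
    "Ward B": ["Office", "Restaurant", "Bar", "Office", "Restaurant"],
    "Ward C": ["University", "Gym", "Office", "Restaurant"],
}


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic mixtures and per-topic
    word distributions, each row summing to 1.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_{k,w}
    topic_total = [0] * n_topics                              # n_k
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(n_topics)
    ]
    return theta, phi


if __name__ == "__main__":
    word_id = {w: i for i, w in enumerate(VOCAB)}
    docs = [[word_id[w] for w in words] for words in REGIONS.values()]
    theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=len(VOCAB))
    for region, mix in zip(REGIONS, theta):
        print(region, [round(p, 2) for p in mix])
```

Each row of `theta` is a region's mixture over latent topics, which plays the role of a lifestyle profile. Each row of `phi` is a topic's distribution over venue categories. With only a handful of check-ins per region these estimates are noisy, which is the kind of sparsity the researchers cite as their reason for moving beyond plain LDA.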

The results revealed correlations consistent with existing medical research and uncovered additional correlations that require further study. The table below lists check-in lifestyles and their correlated chronic diseases.

By Sabrina de Silva
