Where are you going next? A new population movement predictor is on the case

NYU students work with Rumi Chunara, a CDS affiliate and Assistant Professor of Computer Science and Engineering and Global Public Health, to devise a new method for predicting individual human mobility timelines from Twitter data

Predictions of human movement patterns can inform transportation management, urban forecasting and development, efforts to reduce traffic congestion, and our understanding of how disease spreads. So far, many data science projects have sought either to predict the movement patterns of a population as a whole or to predict a person's next location from their current one, often using GPS or call data records.

But Rumi Chunara, a CDS Affiliated Professor, is taking a different approach. Along with two NYU doctoral students, Nabeel Abdur Rehman and Kunal Relia, Chunara has devised a new algorithm that uses publicly available geo-located Twitter data to predict individual mobility patterns.

While other researchers, such as Moore-Sloan Fellow Anastasios Noulas, have used social media data before, Chunara and her collaborators have proposed a new type of mobility prediction called Intermediate Location Computing, which can predict an individual user's entire mobility timeline with a high degree of accuracy from sparse social media data.

For their dataset, the researchers used six months of publicly available geo-located Tweets collected through the Twitter API. The data span January 1 to June 30, 2014, and come from users in New York, Washington, D.C., and San Francisco. The raw collection contained over twenty million Tweets, which the researchers filtered to exclude accounts with physically infeasible location movements, non-personal accounts, accounts using location-spoofing software, and accounts that tweeted fewer than four times in the six-month period. After filtering, the dataset contained tens of thousands of Tweets, whose timestamps were adjusted for time zones and daylight saving time.
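The filtering steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the researchers' pipeline: the `Tweet` record, the helper names, and the 600 mph speed cutoff used to flag infeasible movement are all hypothetical, while the four-tweet minimum comes from the article. Checks for non-personal accounts and location spoofing are omitted because the article does not say how they were done. Converting each UTC timestamp with Python's `zoneinfo` module applies the correct offset for each date, so daylight saving time is handled automatically.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from math import asin, cos, radians, sin, sqrt
from zoneinfo import ZoneInfo

MIN_TWEETS = 4           # from the article: accounts with fewer than four tweets are dropped
MAX_SPEED_MPH = 600.0    # hypothetical cutoff for "infeasible" movement between tweets
EARTH_RADIUS_MI = 3958.8


@dataclass
class Tweet:
    """Hypothetical minimal record for one geo-located tweet."""
    user: str
    ts: datetime  # must be timezone-aware; normalized to UTC
    lat: float
    lon: float

    def __post_init__(self):
        if self.ts.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")
        self.ts = self.ts.astimezone(timezone.utc)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = p2 - p1, radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MI * asin(min(1.0, sqrt(a)))


def has_infeasible_jump(tweets):
    """True if any two consecutive tweets imply travel faster than MAX_SPEED_MPH."""
    ordered = sorted(tweets, key=lambda t: t.ts)
    for a, b in zip(ordered, ordered[1:]):
        # Treat near-simultaneous tweets as one minute apart to avoid dividing by zero.
        hours = max((b.ts - a.ts).total_seconds() / 3600, 1 / 60)
        if haversine_miles(a.lat, a.lon, b.lat, b.lon) / hours > MAX_SPEED_MPH:
            return True
    return False


def filter_and_localize(tweets, tz_name):
    """Drop sparse or implausible accounts, then convert timestamps to local time."""
    by_user = {}
    for t in tweets:
        by_user.setdefault(t.user, []).append(t)
    tz = ZoneInfo(tz_name)
    kept = []
    for user, user_tweets in by_user.items():
        if len(user_tweets) < MIN_TWEETS or has_infeasible_jump(user_tweets):
            continue
        kept.extend((user, t.ts.astimezone(tz), t.lat, t.lon) for t in user_tweets)
    return kept
```

For example, a tweet at 17:00 UTC on January 15, 2014 and one at 16:00 UTC on June 15, 2014 both come out as 12:00 local time in `America/New_York`, because the offset switches from UTC-5 to UTC-4 in between.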

The researchers take a distinctive approach to location and time, the two fundamental components of predicting mobility. To define location, they divided the area of each city into cells of one square mile. They also tested finer cells of 0.5 and 0.1 square miles, but, as one might expect with sparse data, the algorithm performed best with the larger cells. For time, again because the data are sparse, the researchers chose one- and two-hour intervals rather than the shorter fifteen- or thirty-minute intervals used in other research.
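A minimal sketch of this kind of discretization follows, assuming a simple equirectangular approximation (about 69 miles per degree of latitude, scaled by the cosine of latitude for longitude), which is reasonable at city scale. The `Grid` class, its origin point, and the choice to index time slots by hour of the week are illustrative assumptions; the article specifies only the cell areas and interval lengths. A cell of area A square miles has a side of sqrt(A) miles, so the 0.5 and 0.1 square-mile variants correspond to sides of roughly 0.71 and 0.32 miles.

```python
from dataclasses import dataclass
from datetime import datetime
from math import cos, floor, radians, sqrt

MILES_PER_DEG_LAT = 69.0  # approximation; assumes a flat, city-scale projection


@dataclass(frozen=True)
class Grid:
    """Square cells of a fixed area laid out from a hypothetical origin point."""
    origin_lat: float
    origin_lon: float
    cell_area_sq_mi: float = 1.0  # the article also tried 0.5 and 0.1

    @property
    def side_mi(self) -> float:
        return sqrt(self.cell_area_sq_mi)

    def cell(self, lat: float, lon: float) -> tuple[int, int]:
        """Map a coordinate to an (x, y) cell index."""
        miles_per_deg_lon = MILES_PER_DEG_LAT * cos(radians(self.origin_lat))
        dx = (lon - self.origin_lon) * miles_per_deg_lon
        dy = (lat - self.origin_lat) * MILES_PER_DEG_LAT
        # floor (not int) keeps indices correct for points west/south of the origin.
        return floor(dx / self.side_mi), floor(dy / self.side_mi)


def time_slot(local_dt: datetime, slot_hours: int = 1) -> int:
    """Index of the slot within the week (Monday 00:00 local time is slot 0)."""
    hour_of_week = local_dt.weekday() * 24 + local_dt.hour
    return hour_of_week // slot_hours
```

With one-hour slots a week has 168 slots; with two-hour slots it has 84, and each slot still starts on a day boundary because 24 is divisible by both lengths.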

Using these definitions of location and time, the algorithm can predict individual user location with up to 86% accuracy, almost 20% better than the methods it was compared against. To make its predictions, the algorithm combines personal behavior, which identifies a user's home and work locations from their previous, next, and historical locations, with community behavior, which especially improves accuracy on weekends, when users are less likely to be at home or work.
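The article does not give the model's details, so the following is only a toy illustration of blending personal and community behavior, not Intermediate Location Computing itself. It scores candidate grid cells for an unobserved time slot using a weighted mix of the user's own history in that slot, the community's history in that slot, and the user's previous and next observed cells, shifting weight toward the community on weekends. The weights, function names, and data layout are hypothetical; `accuracy` simply reports the fraction of held-out slots predicted correctly.

```python
from collections import Counter, defaultdict

# Hypothetical weights, not taken from the paper: on weekends, lean more
# heavily on community behavior than on the user's own history.
WEIGHTS = {
    False: {"personal": 0.6, "community": 0.2, "neighbors": 0.2},  # weekday
    True: {"personal": 0.3, "community": 0.5, "neighbors": 0.2},   # weekend
}


def build_counts(observations):
    """Count visits per (user, slot) and per slot across all users.

    `observations` is an iterable of (user, slot, cell) tuples.
    """
    personal = defaultdict(Counter)
    community = defaultdict(Counter)
    for user, slot, cell in observations:
        personal[(user, slot)][cell] += 1
        community[slot][cell] += 1
    return personal, community


def _normalize(counter):
    """Turn counts into relative frequencies (empty dict if no counts)."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}


def predict_cell(user, slot, prev_cell, next_cell, personal, community, weekend):
    """Return the highest-scoring cell for an unobserved slot, or None."""
    w = WEIGHTS[weekend]
    p_user = _normalize(personal.get((user, slot), Counter()))
    p_comm = _normalize(community.get(slot, Counter()))
    # The surrounding observed cells (either may be missing) act as a third signal.
    p_nbr = _normalize(Counter(c for c in (prev_cell, next_cell) if c is not None))
    candidates = set(p_user) | set(p_comm) | set(p_nbr)
    if not candidates:
        return None

    def score(c):
        return (w["personal"] * p_user.get(c, 0.0)
                + w["community"] * p_comm.get(c, 0.0)
                + w["neighbors"] * p_nbr.get(c, 0.0))

    # Sorting first makes tie-breaking deterministic.
    return max(sorted(candidates), key=score)


def accuracy(predictions, truths):
    """Fraction of held-out slots whose predicted cell matches the true cell."""
    pairs = list(zip(predictions, truths))
    return sum(p == t for p, t in pairs) / len(pairs) if pairs else 0.0
```

With a single personal observation at cell (0, 0) and nine community observations at (9, 9) in the same slot, this scorer predicts (0, 0) on a weekday but (9, 9) on a weekend, showing how heavier weekend weighting lets community behavior take over.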

Chunara and collaborators envision many real-world applications for their research. In particular, they hope that mobility patterns derived using Intermediate Location Computing, coupled with additional information about individuals from social media data, will enhance disease transmission modeling at local levels.

By Paul Oliver