why (geolocation) mobility data?

Jun 8, 2018

I’m definitely not the first to say that geographic data coming from mobile phones is an amazing resource for many purposes. Some even suggest that this kind of data will, in the near future, replace census-style data collection, such as land use and origin-destination (OD) surveys, as well as other types of demographic data. Data collected with sensors and wearable devices isn’t only cheaper to collect and acquire; it is also expected to be more accurate and representative. This is in part because such data (i.e. geolocation) is generated regularly and doesn’t depend on self-reporting. Much of the work in this field relies on Call Detail Records (CDR), which are basically data generated by phones whenever some interaction is initiated. CDR data has already been applied in fields like health and transportation to detect patterns and change over time. And while CDR data usually comes in immense amounts, it is not available to everyone, as it is proprietary. Additionally, many individuals are not aware that their data is being used, since phone companies own and trade it. It is also rather biased by nature: it is recorded only when a person interacts with their phone, and different people interact with their devices differently.

A free, more granular, and less biased type of data is geolocation data captured by smartphones. Google Location data, which comes from the Google Takeout service, is one of the easiest to acquire. It has been available for several years, and yet it has only recently started to gain traction among researchers. Google “allows” users to access their historic data and download all their location history in seconds (alternatively: we allow Google to access our data? But let’s keep this topic for another blog post :) ). One of the least-known facts about this data is that for many smartphone users it is collected by default and doesn’t require any setting changes. In fact, many Android users are not aware that their Google account might contain years of their recorded data. Researchers can also only obtain it when a person downloads and shares it themselves.

I first started working with Google Takeout a little more than a year ago, and I have been tracking myself constantly ever since (excluding some days when I turn it off to reduce battery consumption).

For a few months, before I dived into data conversion and cleaning, I interacted with my collected locations through the Google Timeline interface, available via the desktop Google Maps UI. This interface displays modes of transport, as well as places visited, pretty accurately. When I finally downloaded my data and started wrangling it, I learned that the data Google provides to users is a much rawer version of what is displayed through their interface. Basically, it is a list of coordinates, timestamps and accelerometer recordings. Sometime late in the winter of 2017, I started writing a reproducible and accessible script to do some cleaning, analysis, and visualization. This process has revealed (and still does) so many layers and opportunities to look for meaning and make sense of people, places and time (in this case, of myself!).
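
To give a feel for that raw list, here is a minimal sketch of turning it into something usable. It is not the script mentioned above. It assumes the export is a JSON file with a top-level `locations` array whose records carry `timestampMs`, `latitudeE7` and `longitudeE7` fields, which is how exports looked around the time of writing; newer exports may be laid out differently. The function name `parse_locations` and the sample record are made up for illustration.

```python
import json
from datetime import datetime, timezone


def parse_locations(raw_json):
    """Turn a Takeout-style JSON string into a list of simple dicts.

    Assumed layout: each record has `timestampMs` (milliseconds since the
    epoch, stored as a string) and `latitudeE7` / `longitudeE7`
    (degrees multiplied by 1e7).
    """
    records = json.loads(raw_json).get("locations", [])
    points = []
    for rec in records:
        # Skip records that lack any of the fields we need.
        if not all(k in rec for k in ("timestampMs", "latitudeE7", "longitudeE7")):
            continue
        points.append({
            "time": datetime.fromtimestamp(int(rec["timestampMs"]) / 1000, tz=timezone.utc),
            "lat": rec["latitudeE7"] / 1e7,
            "lon": rec["longitudeE7"] / 1e7,
            "accuracy": rec.get("accuracy"),  # may be missing, hence .get()
        })
    # Put the points in chronological order.
    points.sort(key=lambda p: p["time"])
    return points


# Hypothetical sample: one usable record and one incomplete record.
sample = json.dumps({"locations": [
    {"timestampMs": "1500000000000", "latitudeE7": 406725000,
     "longitudeE7": -739800000, "accuracy": 20},
    {"accuracy": 50},
]})
points = parse_locations(sample)
print(len(points), points[0]["time"].isoformat(), points[0]["lat"], points[0]["lon"])
```

With a real export you would read the file contents and pass them to the same function; the timestamps here are kept in UTC, so converting them to local time is a separate step.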

Here is one of the first visualizations I produced using 8 months of my own data. Before dumping it into CARTO, I stratified the days of the week (white for weekends and black for weekdays) in a Jupyter (IPython) Notebook. My data points are plotted as a time series (you can see the lower bar indicating time) on top of NYC’s Subway lines.
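
The weekday/weekend split can be sketched in a few lines of Python. This is only an illustration of the idea, not the notebook code used for the map: the helper `day_type`, the `COLOURS` mapping and the two sample timestamps are all hypothetical.

```python
from datetime import datetime, timezone


def day_type(ts):
    """Return 'weekend' for Saturday/Sunday and 'weekday' otherwise."""
    # datetime.weekday() numbers Monday as 0 and Sunday as 6.
    return "weekend" if ts.weekday() >= 5 else "weekday"


# Hypothetical colour mapping matching the description above.
COLOURS = {"weekend": "white", "weekday": "black"}

# Hypothetical sample timestamps.
timestamps = [
    datetime(2017, 7, 14, 8, 30, tzinfo=timezone.utc),  # a Friday
    datetime(2017, 7, 15, 11, 0, tzinfo=timezone.utc),  # a Saturday
]
labels = [day_type(ts) for ts in timestamps]
for ts, label in zip(timestamps, labels):
    print(ts.date(), label, COLOURS[label])
```

One caveat: the label depends on the timezone of the timestamp, so points recorded late at night should be converted to local time before being assigned a day.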

Although it is only a “simple” visualization, not a statistical analysis or a model, it reveals how important this data can be for understanding how people move in the city. I envision this data helping us learn about people’s activities and modes of transport. It can also be used to evaluate how areas of the city are used and to compare neighborhoods and cities.

What I like best about this animation is that it makes it so easy to detect the specific Subway lines I used, especially when commuting from my home (Park Slope) to work (Washington Sq). After staring at the Manhattan Bridge for a few seconds, it becomes clear that I mostly commute using the orange and yellow routes, with the B appearing as the most dominant Subway line (gotta LOVE Broadway Lafayette’s morning commute crowds!).

Written by avigailvantu