Where do people go during a pandemic?

Wayne Winski
Aug 14 · 4 min read

Identifying non-commercial visitation

Visitation to National Parks and Forests

The What and Why

For an advertising company, it is extremely valuable to understand where people spend their time, in order to categorize them or map individual interests to products or brands. During the COVID-19 pandemic, however, governments began shutting down commercial venues such as shops, restaurants, and theaters. Since then, people have shifted their movement and visitation from commercial locations to outdoor lifestyle and hobby activities, such as beach-going, hiking, and camping. To adapt to these shifts in behavior, my intern project this summer at Xandr has focused on identifying these non-commercial visitations through consumer insights available on mobile devices.

Implementation and Techniques

The volume of GPS data being gathered is massive, billions of records per day, and a substantial subset of this jumble of data is erroneous or irrelevant to our problem. The goal of the data cleaning process is to remove this unnecessary information by identifying, for example, devices with too few data points, pings with low accuracy, or movement at excessive speed (driving or flying).
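As a rough illustration of this kind of cleaning, here is a minimal single-machine sketch in Python. The `Ping` fields, the function names, and all three thresholds are assumptions made for the example; the post does not state the actual schema or cut-off values, and the real pipeline ran at a much larger scale.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Ping:
    """One GPS record (hypothetical schema for illustration)."""
    device_id: str
    ts: float          # Unix time in seconds
    lat: float         # degrees
    lon: float         # degrees
    accuracy_m: float  # reported horizontal accuracy in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    r = 6_371_000.0  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def clean_pings(pings, max_accuracy_m=100.0, max_speed_mps=10.0, min_points=5):
    """Drop inaccurate pings, fast-moving pings, and sparse devices.

    All thresholds are illustrative assumptions, not values from the post.
    """
    # 1. Keep only pings whose reported accuracy is good enough.
    by_device = defaultdict(list)
    for p in pings:
        if p.accuracy_m <= max_accuracy_m:
            by_device[p.device_id].append(p)

    cleaned = []
    for device_pings in by_device.values():
        device_pings.sort(key=lambda p: p.ts)
        kept = []
        prev = None
        for p in device_pings:
            if prev is None:
                too_fast = False
            else:
                dt = p.ts - prev.ts
                dist = haversine_m(prev.lat, prev.lon, p.lat, p.lon)
                # 2. Speed relative to the previous ping; pings with a
                #    non-increasing timestamp are dropped as well.
                too_fast = dt <= 0 or dist / dt > max_speed_mps
            if not too_fast:
                kept.append(p)
            prev = p
        # 3. Drop devices left with too few data points.
        if len(kept) >= min_points:
            cleaned.extend(kept)
    return cleaned
```

At billions of records per day the same three filters would more plausibly be expressed as distributed DataFrame operations, but the logic is the same.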

With the data cleaned and in the proper format, the next step is to identify areas where a device has multiple GPS pings densely packed, which represents an activity with some amount of dwell time. To accomplish this, I settled on the DBSCAN algorithm, which supports geospatial clustering, does not require the number of clusters as an input, and can identify non-circular clusters. This allows us to identify geographic locations where people are (relatively) stationary and filter out noise, or non-clustered data (Figure 1).

Figure 1 — DBSCAN clustering of GPS pings
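To make the clustering step concrete, the sketch below implements a small, from-scratch DBSCAN over latitude/longitude pairs using great-circle distance. It is only meant to show the mechanics (core points, border points, noise); the function names and the `eps_m` and `min_samples` values used with it are assumptions, and it is not the implementation used in the project.

```python
from math import asin, cos, radians, sin, sqrt

NOISE = -1  # label for pings that belong to no cluster


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs."""
    r = 6_371_000.0
    dlat = radians(b[0] - a[0])
    dlon = radians(b[1] - a[1])
    h = sin(dlat / 2) ** 2 + cos(radians(a[0])) * cos(radians(b[0])) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(h))


def dbscan(points, eps_m, min_samples):
    """Label each (lat, lon) point with a cluster id, or NOISE.

    A point is a core point if at least `min_samples` points
    (itself included) lie within `eps_m` metres of it.
    Brute-force O(n^2) neighbour search: fine for a sketch only.
    """
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue  # already assigned
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is a core point, so keep expanding
    return labels
```

Note that the number of clusters is never passed in: it falls out of the density parameters. In practice a library implementation with a spatial index, for example scikit-learn's `DBSCAN` with a haversine metric, would likely be a better fit than this brute-force version.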

Given spatial clusters, we can identify where people are spending their time — but not for how long or how often. Sequencing is the process of separating spatial clusters temporally. See Figure 2 below for an illustration.

Figure 2 — Separating clusters temporally through sequencing
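One simple way to sequence a spatial cluster is to sort its pings by time and start a new visit whenever the gap between consecutive pings exceeds a threshold. The post does not describe the exact rule that was used, so the gap-based split, the `sequence_clusters` name, and the one-hour default below are all assumptions for illustration.

```python
from collections import defaultdict


def sequence_clusters(timestamps, labels, max_gap_s=3600.0):
    """Split each spatial cluster into separate visits in time.

    timestamps: Unix seconds, one per ping.
    labels:     spatial cluster id per ping, with -1 meaning noise.
    Returns {cluster_id: [(visit_start, visit_end), ...]}.
    The gap threshold is an illustrative assumption.
    """
    by_cluster = defaultdict(list)
    for ts, label in zip(timestamps, labels):
        if label != -1:  # ignore noise pings
            by_cluster[label].append(ts)

    visits = {}
    for label, times in by_cluster.items():
        times.sort()
        spans = [[times[0], times[0]]]
        for ts in times[1:]:
            if ts - spans[-1][1] > max_gap_s:
                spans.append([ts, ts])  # long gap: a new visit begins
            else:
                spans[-1][1] = ts       # same visit: extend its end time
        visits[label] = [tuple(span) for span in spans]
    return visits
```

With visits separated this way, dwell time is the end of a visit minus its start, and visit frequency is the number of spans per cluster.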

Once spatial-temporal clusters are identified, we need to determine whether or not a device actually visited a non-commercial location. Here I identify the cluster(s) within the polygon of interest and derive a score to represent our confidence that the given cluster(s) represent a visit. To build this confidence score, I consider how far into the polygon each cluster lies and, for clusters that are close to the border, calculate the local centrality (Figure 3) of each cluster.

Figure 3 — Local centrality measurement using intersections in cardinal directions
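The post does not give a formula for local centrality, so the following is only one possible reading of "intersections in cardinal directions". From a cluster's centre, it measures the distance to the nearest polygon edge due north, south, east, and west, then scores how balanced those distances are on each axis. The function name, the balance formula, and the use of planar coordinates (for example, metres after projection) are all assumptions.

```python
from math import inf


def _balance(a, b):
    """Ratio of the shorter to the longer distance, in [0, 1]."""
    longer = max(a, b)
    if longer == 0 or longer == inf:
        return 0.0
    return min(a, b) / longer


def local_centrality(point, polygon):
    """Hypothetical centrality score in [0, 1] for a point in a polygon.

    point:   (x, y) in planar coordinates.
    polygon: list of (x, y) vertices of a simple polygon.
    Returns 1.0 when the point is equally far from the boundary on
    both axes, values near 0 close to an edge, and 0.0 outside.
    """
    px, py = point
    d = {"N": inf, "S": inf, "E": inf, "W": inf}
    east_crossings = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            x = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x > px:
                east_crossings += 1
                d["E"] = min(d["E"], x - px)
            else:
                d["W"] = min(d["W"], px - x)
        # Does this edge cross the vertical line through the point?
        if (x1 > px) != (x2 > px):
            y = y1 + (px - x1) * (y2 - y1) / (x2 - x1)
            if y > py:
                d["N"] = min(d["N"], y - py)
            else:
                d["S"] = min(d["S"], py - y)
    # Ray-casting test: an even number of eastward crossings means outside.
    if east_crossings % 2 == 0:
        return 0.0
    return (_balance(d["E"], d["W"]) + _balance(d["N"], d["S"])) / 2
```

For a 10 by 10 square, the centre scores 1.0, while a point one unit from the western edge at mid-height scores lower because its east and west distances are unbalanced. A score like this could then be combined with the cluster's depth inside the polygon to form the overall confidence.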

Personal value / growth

My internship this summer has taught me more than I could have hoped for. Leading up to the program I was concerned about working from home, but I’ve come to enjoy the setup. While socializing is more difficult in a virtual setting, everyone I have spoken with has adapted positively, and the benefits of not commuting can’t be overstated.

In terms of career growth, this program has surpassed my expectations and left my previous experiences in the dust. I’ve been able to brush up on existing technical skills and learn some new ones. I spent a lot of time doing data analysis and manipulation to better understand the dataset and its relationship to the problem I was solving. I learned how to handle data at scale using distributed computing tools, such as PySpark and EKS nodes. I had to expand my knowledge of the theory of clustering and design my own solutions rather than just applying off-the-shelf models.

Aside from the technical skills, I’ve also had many opportunities to learn about Xandr and the advertising industry as a whole and to network with people from various domains and backgrounds. I have improved my public speaking and presentation skills, especially in a business setting. The breadth of experiences you have access to as a Xandr intern is really up to you: how much do you want to get out of the program?

About the Author

Wayne is a graduate student at The University of Texas at Dallas focusing on Data Science and Intelligent Systems. Outside of work he enjoys spending time with friends playing board games, playing soccer, and camping.

Wayne Winski

Written by

Graduate student and aspiring data scientist

Xandr-Tech

Our latest thoughts, challenges, triumphs, try-again’s, most snarky and profound commit messages. Our proudest achievements, deepest darkest technical debt regrets (just kidding, maybe). All the humbling yet informative things you learn when you try to do things with computers.
