SafeGraph Partners With AWS and Databricks To Launch Industry’s First Full-Stack Location Solution

SafeGraph
Nov 14, 2019

SafeGraph is thrilled to announce an exciting partnership with AWS and Databricks to make insights about the physical world easier than ever.

Today Amazon launches AWS Data Exchange, a new platform for sharing data. SafeGraph is honored to be a founding data partner for AWS Data Exchange, launching today with over 20 powerful datasets available for free or for purchase (you need to sign in to your AWS account to see the listings).


To show the power of SafeGraph data from AWS Data Exchange inside Databricks, we’ve created this Databricks demonstration notebook (.dbc download here). For ready-to-run code, please see the complementary Databricks notebook.

Also, if you want to learn more about using SafeGraph data in Databricks, register for our upcoming webinar.

SafeGraph is the source of truth for points-of-interest (POI) data, business listings, and visitor foot-traffic insights

SafeGraph is just a data company; that’s all we do.

SafeGraph has three primary datasets:

  • Core Places: Base information about a point of interest (POI), such as location name, address, and brand association for the top ~5,500 national brands. Available for ~6.1MM POI.
  • Geometry: Geometry information for commercial POIs that includes the polygon of the POI and spatial hierarchy metadata defining whether the polygon is contained within another POI. Available for ~6.1MM POI.
  • Patterns: Place traffic and demographic aggregations that answer: how often people visit, where they came from, where else they go, and more. Available for ~3.6MM POI.

AWS is one of the most important cloud services companies in the world. Making SafeGraph data available in the AWS Data Exchange is 100% aligned with the SafeGraph mission to democratize access to data.

The AWS Data Exchange is now hosting over 20 datasets from SafeGraph, including:

  1. SafeGraph Core Places — Entire USA (5.3MM records)
  2. SafeGraph Foot Traffic Patterns (2019) — Entire USA (3.6MM records)
  3. SafeGraph Foot Traffic Patterns (2019) — USA Restaurants (671k records)
  4. SafeGraph Core Places — USA Gas Stations and Convenience Stores (135k records)
  5. Free Data Sample: SafeGraph Foot Traffic Patterns — USA Starbucks (11k records)

and many more.

How do I work with SafeGraph data from AWS? Answer: Databricks.

Databricks is a unified analytics platform that lets data science, data engineering, and business analytics teams collaborate and derive value from data at scale, with ease.

At its core, the Databricks platform is powered by Apache Spark and Delta Lake in a cloud-native architecture, which gives users virtually unlimited horsepower to acquire, clean, transform, combine, and analyze datasets within minutes from a notebook interface, in their language of choice (Python, Scala, SQL, or R).

Because Databricks is a managed platform, customers do not have to become big data DevOps gurus to power their analytical needs, which reduces the administrative burden, costs, and risks of their data-driven projects.

Learn how to analyze SafeGraph data in Databricks

We’ve created this Databricks notebook (.dbc download here), and published this blog, so that you can hit the ground running using SafeGraph data from AWS Data Exchange in Databricks. For ready-to-run code, please see the complementary Databricks notebook. For detailed instructions on setting up Databricks and loading SafeGraph data, see the Databricks sister blog post.

How do we load SafeGraph Patterns from AWS Data Exchange into a Databricks data lake?

To demonstrate the power of SafeGraph data inside Databricks, we are highlighting three datasets from SafeGraph currently available for free inside AWS Data Exchange.

  1. SafeGraph Patterns — Starbucks in the USA (Free)
  2. SafeGraph Core Places — Starbucks in the USA (Free)
  3. SafeGraph Open Census Data (Free)

Getting your data running in Databricks is just a few clicks away.

We’ve published full step-by-step instructions for loading SafeGraph data into Databricks from AWS Data Exchange on the Databricks blog.
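The full loading steps live in the Databricks post. As a small illustration of what happens once the files land, here is a minimal sketch that reads a Patterns extract with Python's standard csv module, decodes the columns stored as JSON-encoded text, and joins Patterns rows to Core Places rows on safegraph_place_id. The function names are hypothetical. The column names (raw_visit_counts, popularity_by_hour, popularity_by_day, visitor_home_cbgs, related_same_day_brand, region) and the JSON encoding are assumptions based on the 2019 schema, so check them against your own delivery. In Databricks you would normally do the same thing with Spark DataFrames.

```python
import csv
import io
import json

# Columns assumed to hold JSON-encoded strings in the Patterns CSV (assumption).
JSON_COLUMNS = ("popularity_by_hour", "popularity_by_day",
                "visitor_home_cbgs", "related_same_day_brand")


def load_patterns(csv_text):
    """Parse Patterns rows, decoding JSON columns and the visit count."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in JSON_COLUMNS:
            if row.get(col):
                row[col] = json.loads(row[col])
        if row.get("raw_visit_counts"):
            row["raw_visit_counts"] = int(row["raw_visit_counts"])
        rows.append(row)
    return rows


def join_on_place_id(patterns, core_places):
    """Attach Core Places attributes (e.g. region) to each Patterns row."""
    places = {p["safegraph_place_id"]: p for p in core_places}
    joined = []
    for row in patterns:
        place = places.get(row["safegraph_place_id"])
        if place is not None:  # inner join: skip POI missing from Core Places
            joined.append({**place, **row})
    return joined
```

An inner join keeps only POI present in both files, which is usually what you want when a later step needs Core Places fields such as the state.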

What can I learn about consumer behavior using SafeGraph data in Databricks?

Once you have SafeGraph data loaded into Databricks, many exciting answers about consumer behavior are at your fingertips.

To see these implemented in code, check out the accompanying Databricks demonstration notebook.

What time of day do people visit Starbucks?

With a few lines of code, you can examine the relative popularity of individual Starbucks locations, as well as the average popularity by hour across Starbucks locations nationwide.

Figure 1. The x-axis shows each hour of the day (local time), from midnight (0) to 11pm (23). The y-axis shows the visits occurring in each hour, summed across all days of the month, as a percentage of the month's total visits. (Note: visits that cross hour boundaries are counted in every hour they span, so the percentages across all hours may add up to more than 100%.) Each safegraph_place_id is a unique Starbucks location.
Figure 2. Same as Figure 1, but averaged across all Starbucks locations nationwide.

The data shows that traffic ramps up during the morning and peaks around 12pm and 1pm.
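The normalization behind Figures 1 and 2 can be sketched in a few lines. This is a minimal sketch, assuming each POI provides a 24-element popularity_by_hour list (visits per local hour, summed over the month) and a raw_visit_counts total. The helper names are hypothetical, and averaging each location equally for the national curve is our assumption about how Figure 2 was built.

```python
def hourly_share(popularity_by_hour, raw_visit_counts):
    """Visits in each local hour (0-23) as a % of the month's total visits.

    A visit spanning several hours is counted in each of those hours, so the
    24 values can sum to more than 100%.
    """
    if raw_visit_counts <= 0:
        return [0.0] * 24
    return [100.0 * v / raw_visit_counts for v in popularity_by_hour]


def national_average(curves):
    """Average per-POI hourly curves hour by hour, weighting POI equally."""
    n = len(curves)
    return [sum(curve[h] for curve in curves) / n for h in range(24)]


def peak_hours(curve, k=2):
    """Return the k busiest hours in chronological order (ties: earlier hour)."""
    busiest = sorted(range(24), key=lambda h: (-curve[h], h))[:k]
    return sorted(busiest)
```

Because overlapping visits are double counted, the shares are best compared across hours or locations rather than read as a partition of the month's visits.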

What days of the week do people visit Starbucks?

We can ask the same question but about what days of the week are popular.

Looking at 20 randomly selected Starbucks locations, we see that on average no day is strongly preferred over the others. However, some POI do show interesting weekday vs. weekend differences.

Figure 3. Comparing foot-traffic by day for 20 random Starbucks locations.

We can examine one of these POI and compare it to the national average.

Figure 4. Comparing foot-traffic by day for a particular Starbucks vs National Average.

This data shows that, nationally, the busiest days of the week at Starbucks are Wednesdays and Thursdays, although this is a mild preference. In contrast, safegraph_place_id sg:68513387500e48eb87d719207d058309 shows a very different pattern and is significantly less popular during the weekends compared to weekdays.
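Comparing one location with the national average works best after normalizing each POI's visits to shares of its weekly total, so that large and small stores are comparable. Below is a minimal sketch, assuming popularity_by_day is a dict keyed by English day names. The function names and the weekend-to-weekday ratio are illustrative choices, not SafeGraph fields.

```python
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")


def day_shares(popularity_by_day):
    """Convert visits per day into fractions of the POI's weekly total."""
    total = sum(popularity_by_day.get(d, 0) for d in DAYS)
    if total == 0:
        return {d: 0.0 for d in DAYS}
    return {d: popularity_by_day.get(d, 0) / total for d in DAYS}


def average_shares(all_shares):
    """Unweighted mean of per-POI day shares (a 'national average')."""
    n = len(all_shares)
    return {d: sum(s[d] for s in all_shares) / n for d in DAYS}


def weekend_to_weekday_ratio(shares):
    """Mean weekend-day share / mean weekday share (1.0 means no preference)."""
    weekday = sum(shares[d] for d in DAYS[:5]) / 5
    weekend = sum(shares[d] for d in DAYS[5:]) / 2
    return weekend / weekday if weekday else float("inf")
```

A location like the campus store described above would show a ratio well below 1.0, while a location with no weekly preference sits near 1.0.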

To visualize where this POI is located, you can read the (latitude, longitude) from the SafeGraph dataset and search for it in Google Maps. It turns out that this particular Starbucks is located on the campus of the Boston University School of Law. Presumably the fact that classes are not held during weekends is causing this very large weekday vs weekend difference.
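Looking up a location by hand is easy to script. The sketch below, assuming Core Places rows carry latitude and longitude fields as text, finds a POI by safegraph_place_id and builds a link in the Google Maps URLs search format (https://www.google.com/maps/search/?api=1&query=lat,lng). Both helper names are hypothetical.

```python
from urllib.parse import urlencode


def place_location(core_places, place_id):
    """Return (latitude, longitude) for a safegraph_place_id."""
    for place in core_places:
        if place["safegraph_place_id"] == place_id:
            return float(place["latitude"]), float(place["longitude"])
    raise KeyError(place_id)


def maps_search_url(latitude, longitude):
    """Build a Google Maps search link; urlencode escapes the comma as %2C."""
    query = f"{latitude:.6f},{longitude:.6f}"
    return "https://www.google.com/maps/search/?" + urlencode({"api": 1, "query": query})
```

Six decimal places is roughly ten-centimeter precision, which is more than enough to pin a storefront.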

How far do people travel to visit Starbucks?

SafeGraph reports the median distance travelled (from the visitor's home census block group) for each POI. Using this, we can construct a histogram across Starbucks locations showing how far people travel to visit Starbucks.

Figure 5. Histogram of median distance travelled from home for ~11,000 Starbucks locations.

This data shows that most Starbucks locations draw visitors who live less than 10 kilometers away. However, there is a long, thin tail of Starbucks locations whose median distance from home is hundreds of kilometers. These locations are likely in high-tourist or high-commute areas (such as airports), where most visitors do not live nearby.
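Building the histogram is a simple binning step. Here is a minimal sketch, assuming the per-POI median distance (the distance_from_home column) is reported in meters. The function name, the 5 km bin width, and the single overflow bin that collects the long tail are all illustrative choices.

```python
def distance_histogram(distances_m, bin_km=5, max_km=50):
    """Count POI by median distance from home, in bin_km-wide bins.

    max_km should be a multiple of bin_km. Distances at or beyond max_km
    fall into one overflow bin, where tail locations such as airports land.
    """
    n_bins = max_km // bin_km
    counts = [0] * (n_bins + 1)  # last slot is the overflow bin
    for d in distances_m:
        km = d / 1000.0
        counts[min(int(km // bin_km), n_bins)] += 1
    labels = [f"{i * bin_km}-{(i + 1) * bin_km} km" for i in range(n_bins)]
    labels.append(f">= {max_km} km")
    return list(zip(labels, counts))
```

An overflow bin keeps a handful of extreme locations from stretching the x-axis; for a closer look at the tail, a log-scaled axis is a common alternative.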

What are the cross-shopping preferences of Starbucks customers?

The columns related_same_month_brand and related_same_day_brand report an index of how frequently visitors to a POI also visit other brands (relative to the average visitor rate to that brand).

Here we look at what other brands are frequently visited by customers of Starbucks. The larger the index, the more frequently Starbucks customers visit that brand.

Figure 6. Top 5 cross-shopping brands for Starbucks Customers in California (CA), New York (NY), and Texas (TX).

Although Starbucks is a national chain, cross-brand shopping is highly influenced by local geography. Here we show the top 5 cross-shopping brands for Starbucks customers in California, New York, and Texas. Only McDonald’s is in the top 5 in all 3 states.
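A per-state ranking like the one in Figure 6 can be sketched by pooling the related-brand index over each state's locations. This assumes each row has already been joined to Core Places so it carries a region (state) field, and that related_same_day_brand is a dict of brand to index. How the post aggregated indices across POI is not stated, so the mean used here, counting 0 where a brand is absent, is an assumption. The function name is hypothetical.

```python
from collections import defaultdict


def top_cross_shopping(rows, k=5, exclude=("Starbucks",)):
    """Top-k related brands per state from related_same_day_brand indices.

    A brand's state score is its mean index over all of that state's POI,
    counting 0 where the brand is absent (assumed aggregation rule).
    """
    totals = defaultdict(lambda: defaultdict(float))
    poi_counts = defaultdict(int)
    for row in rows:
        state = row["region"]
        poi_counts[state] += 1
        for brand, index in row["related_same_day_brand"].items():
            if brand not in exclude:  # skip the brand being analyzed
                totals[state][brand] += index
    result = {}
    for state, brands in totals.items():
        means = {b: s / poi_counts[state] for b, s in brands.items()}
        ranked = sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))
        result[state] = [brand for brand, _ in ranked[:k]]
    return result
```

Counting absent brands as 0 favors brands that co-occur across many locations in a state, rather than brands with a very high index at just a few.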

Analyzing a Brand’s Customer Demographics

You can use SafeGraph data from AWS Data Exchange in Databricks to analyze the customer demographics of individual POI or brands. For a deep dive into the methodology, along with a more complete statistical analysis, see this workbook.

Here we analyze Starbucks customer demographics along the race dimension, using data available from SafeGraph in AWS Data Exchange.

This analysis could be repeated for any demographic information tracked by the Census, and reported at the census block group level. That includes Ethnicity, Educational Attainment, Household Income, and much, much more.

To do this analysis we will use:

  • Census data (from Open Census Data)
  • SafeGraph Patterns data, specifically the visitor_home_cbgs column
  • SafeGraph Panel Overview data

Figure 7. The y-axis shows the % of total visitors to Starbucks (left) or of the USA population (right) in each demographic segment.

The baseline demographics of the United States are shown as a reference. SafeGraph Patterns shows interesting differences between the census-area demographics of Starbucks customers and those of the overall USA population.

  • SafeGraph Patterns data shows that on average, the home census block groups (CBGs) of Starbucks customers are 78.4% White, whereas the USA population is only 73.3% White. In other words, the home census areas of Starbucks customers are a larger fraction White than the US population.
  • The home CBGs of Starbucks customers are a larger fraction Asian, compared to the USA population.
  • The home CBGs of Starbucks customers are a smaller fraction Black or African American compared to the overall USA average.
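The core of this comparison is a visitor-weighted average: each home CBG's census demographic fractions are weighted by how many Starbucks visitors came from that CBG. Below is a minimal sketch, assuming visitor_home_cbgs is a dict of CBG ID to visitor count for each POI and that census fractions per CBG have already been loaded from Open Census Data. The function names are hypothetical, and this version omits the sampling-bias correction described in the next paragraph.

```python
from collections import Counter


def combine_home_cbgs(rows):
    """Sum visitor_home_cbgs counts across all of a brand's POI."""
    total = Counter()
    for row in rows:
        total.update(row["visitor_home_cbgs"])  # adds counts per CBG
    return total


def visitor_weighted_profile(visitor_home_cbgs, cbg_demographics):
    """Percent of visitors' home-CBG population in each demographic segment.

    cbg_demographics maps cbg_id -> {segment: fraction of CBG population}.
    CBGs missing from the census table are skipped.
    """
    weighted = {}
    total = 0
    for cbg, visitors in visitor_home_cbgs.items():
        fractions = cbg_demographics.get(cbg)
        if fractions is None:
            continue
        total += visitors
        for segment, frac in fractions.items():
            weighted[segment] = weighted.get(segment, 0.0) + visitors * frac
    if total == 0:
        return {}
    return {seg: 100.0 * v / total for seg, v in weighted.items()}
```

Running the same function with every CBG weighted by its census population instead of its visitor count yields the USA baseline shown on the right of Figure 7.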

Importantly, these differences are not due to geographic sampling bias in the SafeGraph dataset. It is true that the SafeGraph dataset has some small geographic biases. For a full report see “What about bias in the SafeGraph dataset?”. However, we are able to measure and correct the small effects of sampling bias in the SafeGraph dataset. For details on this calculation, see the Databricks demonstration notebook. For a thorough discussion on this methodology, see A Workbook to Analyze Demographic Profiles from SafeGraph Patterns Data.
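The notebook and workbook contain the actual correction. As a rough illustration of the idea only, one common approach is to reweight each home CBG by the ratio of its census population to the number of panel devices living there, so under-sampled CBGs count for more and over-sampled ones for less. This sketch assumes the panel overview supplies a device count per CBG (for example, a number_devices_residing field). That field name and the function name are assumptions, not a confirmed description of SafeGraph's method.

```python
def bias_adjusted_visitors(visitor_home_cbgs, cbg_population, devices_residing):
    """Scale each CBG's visitor count by census population / panel devices.

    CBGs with few panel devices per resident are up-weighted, and
    over-represented CBGs are down-weighted. CBGs missing either input
    (or with a zero count) are dropped.
    """
    adjusted = {}
    for cbg, visitors in visitor_home_cbgs.items():
        population = cbg_population.get(cbg)
        devices = devices_residing.get(cbg)
        if not population or not devices:
            continue
        adjusted[cbg] = visitors * population / devices
    return adjusted
```

The adjusted counts can stand in for raw visitor counts in any visitor-weighted demographic average. Because only relative weights matter, there is no need to rescale them back to the original visitor total.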

SafeGraph + AWS + Databricks

  • Reading SafeGraph data from AWS Data Exchange into Databricks is quick and easy.
  • Combining these technologies and datasets enables you to answer powerful and precise questions about consumer behavior.

Thanks for reading! If you found this useful, please upvote or share with a friend.

Want to get more SafeGraph data?

  • There are over 20 datasets available for free or for purchase in AWS Data Exchange. Check them out!
  • And you can download CSVs for data on over 6MM points-of-interest at the SafeGraph Data Bar. Use coupon code SafeGraphAWSDatabricksNotebook for $200 of free data.

Special thanks to Andrew Hutchinson and Prasad Kona from Databricks and Ryan Fox Squire from SafeGraph for help developing the demonstration notebook and content of this blog post.
