Patterns of Denver Car Thefts and Nearby Population

Matt Dawidowicz
Published in CodeX
6 min read · Jul 29, 2021
Photo by Colin Lloyd on Unsplash

Car thefts are common all over the world, but identifying reliable patterns in them is difficult without extensive study. One intuitive assumption is that car thefts happen in areas with less foot traffic at the time of the event. It makes sense: most people planning a theft would look for a location where the number of witnesses, and of people able to stop them, is at a minimum. Stealing a car on a crowded thoroughfare on a busy night would seem absurdly foolish.

For our analysis, we use SafeGraph hybrid POI-Patterns data as well as crime data from the City of Denver. SafeGraph is a company that provides point-of-interest (POI) and foot traffic data for thousands of businesses and categories; its data is free for anyone who signs up as an academic. The schema for the Patterns data is documented on SafeGraph’s website.

We are focusing on Denver for two main reasons:

  • It is a major, car-centric urban area in the United States.
  • Denver publishes all reported crime data, including the complete time of each crime and its geographic coordinates, on its government website.

The two Colab notebooks involved (one for file conversion, the other for the analysis) are linked here.

Setup

Importing the packages and reading the data are both handled in the file conversion notebook mentioned above. Refer to the comments and descriptions there for details.

Below is a sample of the crime data from Denver’s government, filtered down to auto thefts and the columns of interest; several of the original columns were edited to produce the ones shown:

Column descriptions:

  • INCIDENT_ID: A unique key denoting the specific incident.
  • INCIDENT_DATE: Day of the incident.
  • INCIDENT_TIME: Time of the incident.
  • INCIDENT_HOUR: Hour of day of the incident (important for determining population of area at that time).
  • INCIDENT_ADDRESS: Address location of the incident.
  • GEO_LAT: Latitude coordinate of the incident.
  • GEO_LON: Longitude coordinate of the incident.
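As a rough sketch of how a table like this could be produced with pandas: the raw column names `OFFENSE_CATEGORY_ID` and `FIRST_OCCURRENCE_DATE` and the category value `"auto-theft"` are assumptions about Denver's crime export (verify them against the file you download), and `prepare_auto_thefts` is a hypothetical name. The actual conversion lives in the file conversion notebook and may differ.

```python
import pandas as pd

# Assumed raw Denver column names; check them against your download.
OFFENSE_COL = "OFFENSE_CATEGORY_ID"
OCCURRED_COL = "FIRST_OCCURRENCE_DATE"
KEEP = ["INCIDENT_ID", "INCIDENT_DATE", "INCIDENT_TIME", "INCIDENT_HOUR",
        "INCIDENT_ADDRESS", "GEO_LAT", "GEO_LON"]


def prepare_auto_thefts(crime: pd.DataFrame) -> pd.DataFrame:
    """Keep only auto thefts and derive the date, time and hour columns."""
    thefts = crime[crime[OFFENSE_COL] == "auto-theft"].copy()
    occurred = pd.to_datetime(thefts[OCCURRED_COL])
    thefts["INCIDENT_DATE"] = occurred.dt.date
    thefts["INCIDENT_TIME"] = occurred.dt.time
    # The hour (0-23) is what later selects a slot in popularity_by_hour.
    thefts["INCIDENT_HOUR"] = occurred.dt.hour
    return thefts[KEEP].reset_index(drop=True)


# Tiny made-up sample, for illustration only.
raw = pd.DataFrame({
    "INCIDENT_ID": [1, 2],
    OFFENSE_COL: ["auto-theft", "burglary"],
    OCCURRED_COL: ["2021-03-05 22:15:00", "2021-03-06 09:00:00"],
    "INCIDENT_ADDRESS": ["100 Main St", "200 Oak St"],
    "GEO_LAT": [39.74, 39.75],
    "GEO_LON": [-104.99, -104.98],
})
print(prepare_auto_thefts(raw))
```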

Below is a sample of the Patterns Data:

Column descriptions:

  • placekey: Placekey ID of the POI. For more information, see the Placekey website.
  • location_name: Name of the POI.
  • latitude: Latitude coordinate of the POI.
  • longitude: Longitude coordinate of the POI.
  • street_address: Address of the POI.
  • city: City of the POI.
  • region: Region of the POI (in this situation, state).
  • postal_code: Postal ZIP Code of the POI.
  • iso_country_code: Country code of the POI.
  • date_range_start: Beginning date for the pattern data.
  • date_range_end: Ending date for the pattern data.
  • raw_visitor_counts: Total visitor count for the POI for the date range.
  • popularity_by_hour: List consisting of average number of visitors at each hour of the day.
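Because `popularity_by_hour` holds one value per hour, the figure for an incident's hour is a simple list lookup. This minimal sketch assumes the column arrives as a JSON-style string of 24 numbers (as in SafeGraph's CSV files) and that index 0 is the midnight to 1 am slot; `visitors_at_hour` is a hypothetical helper name.

```python
import json


def visitors_at_hour(popularity_by_hour, hour: int) -> int:
    """Return the visitor figure recorded for one hour of the day.

    Accepts either the raw JSON string from the CSV (e.g. "[0,1,...]")
    or an already-parsed list. Index 0 is assumed to be midnight-1 am.
    """
    counts = (json.loads(popularity_by_hour)
              if isinstance(popularity_by_hour, str) else popularity_by_hour)
    if len(counts) != 24:
        raise ValueError(f"expected 24 hourly values, got {len(counts)}")
    return counts[hour]


# Made-up hourly profile where each hour's value is twice the hour.
sample = json.dumps([h * 2 for h in range(24)])
print(visitors_at_hour(sample, 22))  # → 44
```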

Placekey Generation

The analysis hinges on deriving a Placekey for every car theft, which can be done with the Placekey API.

The API requires a unique ID for each query, which makes the Incident ID above perfect for this purpose.
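A sketch of building the API input, assuming one query per incident made from coordinates alone: the `query_id`, `latitude`, and `longitude` keys appear to match the Placekey API's query fields, but whether the original input also included addresses is not stated. The `placekey` Python package provides a client class (`PlacekeyAPI`) for sending such lists in bulk; check its documentation for the exact call. `to_placekey_queries` is a hypothetical name.

```python
import pandas as pd


def to_placekey_queries(thefts: pd.DataFrame) -> list:
    """Build one Placekey API query dict per incident.

    INCIDENT_ID doubles as the query_id so that each returned Placekey can
    be matched back to its incident; it therefore has to be unique.
    """
    if thefts["INCIDENT_ID"].duplicated().any():
        raise ValueError("INCIDENT_ID must be unique to serve as query_id")
    return [
        {
            "query_id": str(row.INCIDENT_ID),
            "latitude": float(row.GEO_LAT),
            "longitude": float(row.GEO_LON),
        }
        for row in thefts.itertuples(index=False)
    ]


# Made-up incidents for illustration.
thefts = pd.DataFrame({
    "INCIDENT_ID": [10, 11],
    "GEO_LAT": [39.7392, 39.7510],
    "GEO_LON": [-104.9903, -104.9800],
})
queries = to_placekey_queries(thefts)
print(queries[0])
```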

Below is a sample of the input for the API to create the Placekeys:

After the API creates the Placekeys, here is a sample of the output:

Note: these Placekeys are “where-only”, meaning that they correspond not to a POI but to an area; more specifically, an H3 hexagon with a diameter of approximately 130 meters. For more information, check out the Placekey FAQ.
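A full POI Placekey has the form `what@where`, while a where-only Placekey is just `@where`. Deciding whether a POI falls in an incident's hexagon therefore comes down to comparing the `@where` portions. A minimal sketch of that comparison; the helper names are hypothetical and the example strings are illustrative, not real locations.

```python
def where_part(placekey: str) -> str:
    """Return the '@...' location portion of a Placekey.

    A full POI Placekey has the form 'what@where'; a where-only Placekey is
    just '@where'. The where part identifies the hexagon.
    """
    if "@" not in placekey:
        raise ValueError(f"not a Placekey: {placekey!r}")
    return "@" + placekey.split("@", 1)[1]


def same_hexagon(pk_a: str, pk_b: str) -> bool:
    """True when two Placekeys (full or where-only) share a hexagon."""
    return where_part(pk_a) == where_part(pk_b)


# Illustrative strings in Placekey format, not real locations.
incident_pk = "@5vg-7gq-tvz"
poi_pk = "222-223@5vg-7gq-tvz"
print(same_hexagon(incident_pk, poi_pk))  # → True
```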

Using those query IDs, we can inner join the new Placekeys to the crime data.
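One way to do this join in pandas. It assumes the API results were collected into a DataFrame with `query_id` and `placekey` columns; those column names and `attach_placekeys` are assumptions for illustration.

```python
import pandas as pd


def attach_placekeys(thefts: pd.DataFrame, api_results: pd.DataFrame) -> pd.DataFrame:
    """Inner join API results back to incidents via query_id == INCIDENT_ID.

    Incidents the API could not resolve are dropped by the inner join.
    """
    results = api_results.rename(columns={"query_id": "INCIDENT_ID"})
    # Query IDs were sent as strings; cast back so the key dtypes match.
    results["INCIDENT_ID"] = results["INCIDENT_ID"].astype(thefts["INCIDENT_ID"].dtype)
    return thefts.merge(results[["INCIDENT_ID", "placekey"]], on="INCIDENT_ID", how="inner")


# Made-up inputs: incident 11 received no Placekey.
thefts = pd.DataFrame({"INCIDENT_ID": [10, 11, 12], "INCIDENT_HOUR": [22, 3, 14]})
api_results = pd.DataFrame({
    "query_id": ["10", "12"],
    "placekey": ["@5vg-7gq-tvz", "@5vg-82n-kzz"],
})
print(attach_placekeys(thefts, api_results))
```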

As you can see, every incident now has a Placekey, and with those Placekeys we can join THIS table to the POI Patterns table.
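A sketch of that join. It assumes the incident table's `placekey` column holds where-only Placekeys and the Patterns table's `placekey` column holds full `what@where` Placekeys, so stripping the `what` part puts both on the same hexagon key. The `hexagon` column and `join_pois` are hypothetical names, and this is one plausible approach rather than necessarily the one in the notebook.

```python
import pandas as pd


def join_pois(incidents: pd.DataFrame, patterns: pd.DataFrame) -> pd.DataFrame:
    """Pair each incident with every POI whose Placekey lies in its hexagon."""
    pois = patterns.copy()
    # Keep only the '@where' part of each POI Placekey.
    pois["hexagon"] = "@" + pois["placekey"].str.split("@", n=1).str[1]
    inc = incidents.rename(columns={"placekey": "hexagon"})
    return inc.merge(pois.drop(columns="placekey"), on="hexagon", how="inner")


# Made-up data: two POIs share the incident's hexagon, one does not.
incidents = pd.DataFrame({"INCIDENT_ID": [10], "placekey": ["@5vg-7gq-tvz"]})
patterns = pd.DataFrame({
    "placekey": ["222-223@5vg-7gq-tvz", "224-225@5vg-7gq-tvz", "226-227@5vg-82n-kzz"],
    "location_name": ["Cafe", "Gym", "Library"],
})
print(join_pois(incidents, patterns))
```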

Note: this picture does not include all the columns.

We now have a list of all the POIs within each incident’s Placekey hexagon. However, two more steps are needed:

  1. Because of the join method, each incident is also matched with POI pattern data from months other than the one in which it occurred, which creates duplicates and incorrect results. Those rows are identified and removed.
  2. We must isolate the number of visitors at each POI at the given hour of the incident, and then add them together to estimate the total POI occupancy within the Placekey hexagon at the time of the incident.
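The two steps could look roughly like this in pandas. The sketch assumes monthly pattern rows whose `date_range_end` is exclusive, compares only the local calendar date at the start of each timestamp, and reads `popularity_by_hour` as a JSON string; `occupancy_at_incident`, `hexagon`, `visitors`, and `occupancy` are hypothetical names.

```python
import json

import pandas as pd


def occupancy_at_incident(joined: pd.DataFrame) -> pd.DataFrame:
    """Drop out-of-month POI rows, then sum visitors at the incident hour."""
    incident_day = pd.to_datetime(joined["INCIDENT_DATE"])
    # Compare local calendar dates only, ignoring the UTC offset suffix.
    start = pd.to_datetime(joined["date_range_start"].str[:10])
    end = pd.to_datetime(joined["date_range_end"].str[:10])
    # Step 1: keep rows whose pattern range contains the incident (end exclusive).
    df = joined[(incident_day >= start) & (incident_day < end)].copy()
    # Step 2: visitors at each POI during the incident's hour ...
    df["visitors"] = [
        json.loads(hourly)[hour]
        for hourly, hour in zip(df["popularity_by_hour"], df["INCIDENT_HOUR"])
    ]
    # ... summed over all POIs in the incident's hexagon.
    return (df.groupby(["INCIDENT_ID", "hexagon"], as_index=False)["visitors"]
              .sum()
              .rename(columns={"visitors": "occupancy"}))


def profile_with_10pm(value: int) -> str:
    """Made-up popularity_by_hour string with one non-zero slot (10 pm)."""
    return json.dumps([0] * 22 + [value, 0])


# One incident joined to two March POI rows and one April row.
joined = pd.DataFrame({
    "INCIDENT_ID": [1, 1, 1],
    "hexagon": ["@5vg-7gq-tvz"] * 3,
    "INCIDENT_DATE": ["2021-03-05"] * 3,
    "INCIDENT_HOUR": [22, 22, 22],
    "date_range_start": ["2021-03-01T00:00:00-07:00"] * 2 + ["2021-04-01T00:00:00-06:00"],
    "date_range_end": ["2021-04-01T00:00:00-06:00"] * 2 + ["2021-05-01T00:00:00-06:00"],
    "popularity_by_hour": [profile_with_10pm(5), profile_with_10pm(7), profile_with_10pm(100)],
})
print(occupancy_at_incident(joined))
```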

Two full tables were created, each used separately to test the hypothesis:

  1. The table described above, covering only the Placekey hexagons with at least one car theft.
  2. The same table with every Placekey hexagon in the POI Patterns dataset added. For hexagons where no car theft took place, the incident columns are empty.
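A sketch of building both tables from per-incident occupancy, assuming one row per incident and hexagon. Summing occupancy over a hexagon's incidents is one possible aggregation, and filling the empty hexagons with zeros mirrors the zeros mentioned in the analysis below; the function, column names, and hexagon labels are hypothetical.

```python
import pandas as pd


def per_hexagon_table(occupancy: pd.DataFrame, all_hexagons, include_empty: bool) -> pd.DataFrame:
    """One row per hexagon: theft count and summed occupancy.

    With include_empty=True, hexagons from the Patterns data that had no
    thefts are added with zeros in both columns.
    """
    per_hex = occupancy.groupby("hexagon").agg(
        incidents=("INCIDENT_ID", "nunique"),
        occupants=("occupancy", "sum"),
    )
    if include_empty:
        every_hex = pd.Index(sorted(set(all_hexagons) | set(per_hex.index)), name="hexagon")
        per_hex = per_hex.reindex(every_hex, fill_value=0)
    return per_hex.reset_index()


# Made-up inputs: hexagon "@hex-3" appears in the Patterns data but has no thefts.
occupancy = pd.DataFrame({
    "INCIDENT_ID": [1, 2, 3],
    "hexagon": ["@hex-1", "@hex-1", "@hex-2"],
    "occupancy": [12, 8, 4],
})
all_hexagons = ["@hex-1", "@hex-2", "@hex-3"]
print(per_hexagon_table(occupancy, all_hexagons, include_empty=False))
print(per_hexagon_table(occupancy, all_hexagons, include_empty=True))
```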

Analysis

We calculate the correlation between the POI visitor occupancy of each Placekey hexagon, aggregated across all the date ranges, and the number of car thefts in that hexagon.
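The article does not say which correlation coefficient was used. pandas' `Series.corr` defaults to Pearson, so the sketch below makes that assumption and adds a rank-based (Spearman) variant computed as the Pearson correlation of ranks. The `occupants` and `incidents` column names and the sample values are made up for illustration.

```python
import pandas as pd


def theft_occupancy_correlation(per_hex: pd.DataFrame, rank_based: bool = False) -> float:
    """Correlation between hexagon occupancy and theft counts."""
    x, y = per_hex["occupants"], per_hex["incidents"]
    if rank_based:
        # Spearman's rho: Pearson correlation of the ranks (ties averaged).
        x, y = x.rank(), y.rank()
    return x.corr(y)  # Series.corr defaults to Pearson


# Made-up per-hexagon values, for illustration only.
per_hex = pd.DataFrame({"occupants": [0, 40, 120, 300], "incidents": [0, 1, 2, 5]})
print(round(theft_occupancy_correlation(per_hex), 3))
print(theft_occupancy_correlation(per_hex, rank_based=True))
```

A rank-based coefficient is worth checking alongside Pearson here, because a handful of very busy hexagons can dominate a Pearson correlation.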

With the first table (which excludes the empty Placekey hexagons), the correlation is 0.696. This contradicts the original hypothesis: the number of car thefts INCREASES with population rather than decreasing. There is some intuitive logic to this, since the more people there are, the more likely it is that some of them will steal cars. Beyond that idea, though, it goes against most instincts.

We are comparing the last two columns.
As you can see, the number of incidents tends to rise with the number of occupants; the two are positively correlated, though not strictly proportional.

With the second table (which includes the empty Placekey hexagons), the correlation is 0.725, even stronger than for the first table. This provides further evidence against the hypothesis. A graph of incidents versus occupants for this table is omitted, as it is nearly indistinguishable from the previous one.

We are again comparing the last two columns, but this time with zeros added for the hexagons without car thefts.

Conclusion

Based on this methodology, the hypothesis appears false: the number of incidents does not go down in more crowded areas; it goes UP. This makes some sense, as populated areas have more cars to steal, but it runs counter to the intuition that thieves avoid witnesses.

We could analyze the population of the empty Placekey hexagons, but it is highly unlikely that another 2,326 Placekey hexagons all have occupancy so low that no car thefts occur in them, especially since the crimes appear evenly distributed throughout the city.

Our analysis is suggestive, but it is not foolproof. More work could be done to expand our understanding of the relationship between foot traffic and auto theft; this article just scratches the surface.

For example, perhaps our Placekey hexagons are too fine-grained a grouping. Grouping at the Census Block Group level might tell a different story.

For more information on the data being used, visit SafeGraph’s website. Anyone is free to use the above data to expand on or verify these conclusions.

We’ll explore this in Part Two, coming soon!

Questions?

I invite you to ask them in the #safegraphdata channel of the SafeGraph Community, a free Slack community for data enthusiasts. Receive support, share your work, or connect with others in the GIS community. Through the SafeGraph Community, academics have free access to data on over 7 million businesses in the USA, UK, and Canada.

I am an aspiring data scientist and comedian, who loves analyzing data and everything it can tell us.