Crime Hotspots and How to Find Them with BigQuery

Crime Distribution in San Francisco

Reto Meier
Google Cloud - Community

Map all the SFPD incidents since 2010 and the Tenderloin stands out as a place where you'll probably want to be careful if you're visiting The City.

I wondered how different types of crime are distributed around the City by the Bay — so I fired up BigQuery and used the SFPD Incidents public dataset to investigate.

Heatmap based on number of SFPD incidents by location

By calculating the standard deviation of the latitude and of the longitude for each type of crime, and adding the two together, we can find which crimes are most heavily concentrated in a particular area, highlighting hotspots for specific crimes across the city.

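SELECT
  descript,
  STDDEV(latitude) AS sdLat,
  STDDEV(longitude) AS sdLong,
  COUNT(*) AS count
FROM
  `bigquery-public-data.san_francisco.sfpd_incidents`
WHERE
  -- Skip rows with missing coordinates or coordinates far outside the Bay Area
  (latitude IS NOT NULL) AND (longitude IS NOT NULL)
  AND (latitude > 30) AND (latitude < 40)
  AND (longitude > -130) AND (longitude < -120)
GROUP BY
  descript
HAVING
  sdLat IS NOT NULL AND sdLong IS NOT NULL
  -- Only keep incident types with enough occurrences to be meaningful
  AND count > 500
ORDER BY
  -- Smallest combined spread (most concentrated) first
  sdLat + sdLong ASC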

The results with the lowest standard deviation represent crimes with the highest location-concentration; the top 10 are shown in the table below. Note that drugs and prostitution make up half the top 10.

SFPD criminal incident descriptions with over 500 occurrences, and the lowest location standard deviation.
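One caveat with adding the two standard deviations directly: both are measured in degrees, and a degree of longitude covers less ground than a degree of latitude at San Francisco's latitude. The sketch below is a variation, not the query used for the table above. It converts each spread to approximate kilometers before combining them. The reference latitude of 37.77 degrees and the factor of 111.32 km per degree are assumptions chosen for illustration, and the output column names are hypothetical.

```sql
WITH spread AS (
  SELECT
    descript,
    -- One degree of latitude is roughly 111.32 km (assumed constant).
    STDDEV(latitude) * 111.32 AS sd_north_south_km,
    -- A degree of longitude shrinks by cos(latitude); 37.77 degrees N is an
    -- assumed reference latitude for San Francisco. ACOS(-1) evaluates to pi.
    STDDEV(longitude) * 111.32 * COS(37.77 * ACOS(-1) / 180) AS sd_east_west_km,
    COUNT(*) AS incidents
  FROM
    `bigquery-public-data.san_francisco.sfpd_incidents`
  WHERE
    latitude > 30 AND latitude < 40
    AND longitude > -130 AND longitude < -120
  GROUP BY
    descript
  HAVING
    COUNT(*) > 500
)
SELECT
  descript,
  incidents,
  sd_north_south_km,
  sd_east_west_km,
  -- Combine the two spreads into a single distance-like score.
  SQRT(POW(sd_north_south_km, 2) + POW(sd_east_west_km, 2)) AS spread_km
FROM
  spread
ORDER BY
  spread_km ASC
LIMIT 10
```

The ranking may or may not differ from the degree-based version. The main benefit is that the score becomes a distance, which is easier to interpret.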

In total, 75% of the drug-related crimes in the categories from the top 10 table above occurred in the part of San Francisco shown in the map below:

Map showing 75% of SFPD incidents for drug-related crimes in categories within the top 10 most concentrated crime types. The Tenderloin is outlined in red.
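One way to approximate an area like this is to bucket incidents into small grid cells, rank the cells by incident count, and keep the busiest cells until they account for 75% of the total. The query below is a minimal sketch of that idea and is not necessarily how the map above was produced. It assumes the drug-related incidents can be selected with category = 'DRUG/NARCOTIC' (the specific descriptions from the top 10 table are not reproduced here), and it uses coordinates rounded to three decimal places as a hypothetical cell size of roughly 100 meters.

```sql
WITH cells AS (
  SELECT
    ROUND(latitude, 3) AS lat_cell,
    ROUND(longitude, 3) AS lon_cell,
    COUNT(*) AS incidents
  FROM
    `bigquery-public-data.san_francisco.sfpd_incidents`
  WHERE
    category = 'DRUG/NARCOTIC'  -- assumed filter; adjust to the descriptions of interest
    AND latitude > 30 AND latitude < 40
    AND longitude > -130 AND longitude < -120
  GROUP BY
    lat_cell, lon_cell
),
ranked AS (
  SELECT
    lat_cell,
    lon_cell,
    incidents,
    -- Running total over cells ordered busiest first, as a fraction of all incidents.
    SUM(incidents) OVER (
      ORDER BY incidents DESC, lat_cell, lon_cell
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) / SUM(incidents) OVER () AS cumulative_share
  FROM
    cells
)
SELECT
  lat_cell,
  lon_cell,
  incidents,
  cumulative_share
FROM
  ranked
WHERE
  cumulative_share <= 0.75
ORDER BY
  incidents DESC
```

The fewer cells this returns, the more concentrated the crime type is. Plotting the returned cells on a map gives a rough outline of the hotspot.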

Prostitution is also tightly concentrated, but in this case most police incidents occur in two separate areas:

Heat-map of prostitution-related crimes in categories within the top 10 most concentrated crime types.

Crimes with the highest location standard deviation — or greatest spread — are shown in the table below, with burglary, theft, and found property constituting 7 of the top 10.

SFPD criminal incidents with over 500 occurrences and the highest location standard deviation.

These crimes tend to occur all over the city, as you can see in the map below, which shows every location with at least one burglary or attempted break-in:

All locations with one or more burglary / attempted break-in across San Francisco.

Looking at the most common burglary-related incident type, residential break-ins, 74% of incidents occurred in a location with only one break-in. The map below shows only locations with 5 or more burglary break-ins:

Locations with five or more burglary break-ins across San Francisco.
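Figures like these can be approximated by counting incidents per coordinate pair and then summarizing those counts. The query below is a rough sketch under stated assumptions: it treats each distinct latitude/longitude pair as one location, and it selects burglaries with category = 'BURGLARY' rather than the specific residential break-in descriptions, so its numbers will not match the 74% quoted above exactly.

```sql
WITH per_location AS (
  SELECT
    latitude,
    longitude,
    COUNT(*) AS break_ins
  FROM
    `bigquery-public-data.san_francisco.sfpd_incidents`
  WHERE
    category = 'BURGLARY'  -- assumed filter; narrow with descript for residential break-ins
    AND latitude > 30 AND latitude < 40
    AND longitude > -130 AND longitude < -120
  GROUP BY
    latitude, longitude
)
SELECT
  COUNT(*) AS locations,
  -- A location with exactly one break-in contributes exactly one incident,
  -- so counting those locations also counts their incidents.
  COUNTIF(break_ins = 1) / SUM(break_ins) AS share_of_incidents_at_single_locations,
  COUNTIF(break_ins >= 5) AS locations_with_5_or_more
FROM
  per_location
```

To list the repeat locations themselves rather than count them, select from per_location with WHERE break_ins >= 5.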

BigQuery includes many more public datasets for San Francisco, as well as other cities including New York and Chicago. What can you discover about the cities we live in?

If you're new to BigQuery, follow these getting started instructions, and remember that everyone gets 1 TB of query processing and 10 GB of storage at no charge every month.

Share your investigations with us at reddit.com/r/bigquery and subscribe to Today I Learned with BigQuery for more BigQuery public dataset investigations.

Developer Advocate @ Google, software engineer, and author of “Professional Android” series from Wrox. All opinions are my own.