Crime Hotspots and How to Find Them with BigQuery
Crime Distribution in San Francisco
Map all the SFPD incidents since 2010 and the Tenderloin stands out as a place where you probably want to be careful if you’re visiting The City.
I wondered how different types of crime are distributed around the City by the Bay — so I fired up BigQuery and used the SFPD Incidents public dataset to investigate.
By calculating the combined standard deviation of the latitude and longitude of each type of crime, we can find which crimes are the most heavily concentrated in a particular area — highlighting hotspots for specific crimes across the city.
SELECT
  descript,
  STDDEV(latitude) AS sdLat,
  STDDEV(longitude) AS sdLong,
  COUNT(*) AS count
FROM
  `bigquery-public-data.san_francisco.sfpd_incidents`
WHERE
  -- Drop missing coordinates and points far outside San Francisco
  latitude IS NOT NULL AND longitude IS NOT NULL
  AND latitude > 30 AND latitude < 40
  AND longitude > -130 AND longitude < -120
GROUP BY
  descript
HAVING
  -- Keep only incident types with enough rows for a meaningful spread
  sdLat IS NOT NULL AND sdLong IS NOT NULL
  AND count > 500
ORDER BY
  -- Smallest combined spread (most concentrated) first
  sdLat + sdLong ASC
The results with the lowest standard deviation represent the most geographically concentrated crimes; the top 10 are shown in the table below. Note that drugs and prostitution make up half of the top 10.
In total, 75% of the drug-related crimes in the categories from the top 10 table above occurred in the part of San Francisco shown in the map below:
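A share like the 75% figure can be approximated by counting how many incidents fall inside a bounding box. The sketch below is not the query behind that number: the box coordinates are hypothetical placeholders roughly around the Tenderloin, and the `category = 'DRUG/NARCOTIC'` filter is an assumed stand-in for the specific drug-related `descript` values in the top 10 table.

```sql
-- Fraction of drug-related incidents inside a rectangular area.
SELECT
  COUNTIF(
    latitude BETWEEN 37.780 AND 37.790         -- hypothetical box, adjust to your area
    AND longitude BETWEEN -122.420 AND -122.405
  ) / COUNT(*) AS share_in_box,
  COUNT(*) AS total_incidents
FROM
  `bigquery-public-data.san_francisco.sfpd_incidents`
WHERE
  category = 'DRUG/NARCOTIC'                   -- assumed category value
  AND latitude > 30 AND latitude < 40          -- same sanity bounds as the main query
  AND longitude > -130 AND longitude < -120
```

Swapping the `category` filter for a `descript IN (...)` list taken from your own top 10 results would match the description above more closely.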
Prostitution is also tightly concentrated, but in this case most police incidents occur in two distinct areas:
Crimes with the highest location standard deviation — or greatest spread — are shown in the table below, with burglary, theft, and found property constituting 7 of the top 10.
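One way to list the most widely spread incident types is to reuse the aggregation from the opening query and reverse the sort order. This is a minimal sketch; the `incident_count` alias and the `LIMIT 10` are choices made here for illustration.

```sql
-- Same per-type spread as the opening query, widest spread first.
SELECT
  descript,
  STDDEV(latitude) AS sdLat,
  STDDEV(longitude) AS sdLong,
  COUNT(*) AS incident_count
FROM
  `bigquery-public-data.san_francisco.sfpd_incidents`
WHERE
  latitude > 30 AND latitude < 40              -- also excludes NULL coordinates
  AND longitude > -130 AND longitude < -120
GROUP BY
  descript
HAVING
  incident_count > 500                         -- ignore rare incident types
ORDER BY
  sdLat + sdLong DESC
LIMIT 10
```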
These crimes tend to occur all over the city, as you can see in the map below, which shows locations with at least one burglary or attempted break-in:
For the most common burglary-related incident type, residential break-ins, 74% of incidents occurred at a location with only one break-in. The map below shows only locations with 5 or more burglary break-ins:
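Per-location figures like these can be derived by first counting incidents at each coordinate pair and then summarizing those counts. The sketch below makes two assumptions: a "location" is an exact (latitude, longitude) pair, and residential break-ins can be selected with the `descript` prefix shown, which is a guess at the dataset's wording and should be checked against the actual values.

```sql
WITH per_location AS (
  -- One row per coordinate pair with its number of break-ins
  SELECT
    latitude,
    longitude,
    COUNT(*) AS incidents
  FROM
    `bigquery-public-data.san_francisco.sfpd_incidents`
  WHERE
    descript LIKE 'BURGLARY OF RESIDENCE%'     -- assumed descript prefix
    AND latitude > 30 AND latitude < 40
    AND longitude > -130 AND longitude < -120
  GROUP BY
    latitude, longitude
)
SELECT
  -- Share of all incidents that happened at a location seen only once
  SUM(IF(incidents = 1, incidents, 0)) / SUM(incidents) AS share_at_single_incident_locations,
  -- Number of locations that would appear on a "5 or more" map
  COUNTIF(incidents >= 5) AS locations_with_5_or_more
FROM
  per_location
```

Selecting `latitude, longitude, incidents` from `per_location` with `WHERE incidents >= 5` would return the points to plot.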
BigQuery includes many more public datasets for San Francisco, as well as other cities including New York and Chicago. What can you discover about the cities we live in?
If you’re new to BigQuery, follow these getting started instructions, and remember that everyone gets 1 TB of query processing and 10 GB of storage at no charge every month.
Share your investigations with us at reddit.com/r/bigquery and subscribe to Today I Learned with BigQuery for more BigQuery public dataset investigations.