Predicting Crime in Baltimore

A map of Baltimore’s neighborhoods and their murder rates

Inspired by a rewatching of The Wire, I decided to look at crime data for Baltimore, Maryland. Baltimore has also been in the news recently for its high murder rate, with headlines like “Baltimore murder rate worse than Honduras, El Salvador, Guatemala”, so I took a specific look at rates and locations of murders as well.

(Source: Princeton Policy Advisors)

As an aside, note that this headline and chart are rather misleading. Comparing the homicide rate of a densely populated urban environment like Baltimore to those of entire countries is not particularly informative, since rural areas tend to have lower murder rates and drive down a country's average. For a closer comparison: San Salvador, El Salvador’s capital city, has a murder rate of 59.06 (per year per 100,000 people), significantly higher than that of the country as a whole. While it is true that Baltimore has a higher murder rate than both Honduras’s San Pedro Sula (51.18) and Guatemala City (53.49), the most murderous cities in their respective countries, the difference is not nearly as extreme.

The Data

For this project I used Baltimore’s public victim-based crime data, in which over 250,000 crimes are reported. The data predominantly covers the years after 2012 and is updated on a weekly basis. To focus on the predictive ability of the model, I split the data by year: everything before 2018 was used to train the classifier, and everything from 2018 onward was used for testing.
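A year-based split like the one described above can be sketched in a few lines. This is a minimal illustration and not the code used for the project: the record layout, the field names `date` and `type`, and the helper `split_by_year` are all assumptions made for the example.

```python
from datetime import date

# Hypothetical records; the real dataset's fields and values may differ.
crimes = [
    {"date": date(2016, 5, 1), "type": "LARCENY"},
    {"date": date(2017, 12, 31), "type": "BURGLARY"},
    {"date": date(2018, 1, 1), "type": "HOMICIDE"},
    {"date": date(2018, 7, 4), "type": "LARCENY"},
]


def split_by_year(records, first_test_year=2018):
    """Return (train, test): records before first_test_year, then the rest."""
    train = [r for r in records if r["date"].year < first_test_year]
    test = [r for r in records if r["date"].year >= first_test_year]
    return train, test


train, test = split_by_year(crimes)
print(len(train), len(test))  # 2 2
```

Splitting on time rather than at random keeps the test set strictly in the "future" relative to the training set, which is closer to how a predictive model would actually be used.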

The Goal

I initially attempted to predict what type (homicide, burglary, etc.) a specific crime was based on various criteria. Using a random forest classifier, I was only able to beat the majority-class baseline by about 5 percentage points. Guessing that every crime was larceny would have yielded an accuracy of 22.3%, while the classifier managed to achieve 27.8% accuracy.
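For readers unfamiliar with the majority-class baseline, here is a minimal sketch of how such a number can be computed. The labels are toy values and the function name `majority_baseline_accuracy` is hypothetical; it only illustrates the idea of always guessing the most common crime type seen in training.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most common training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return majority_label, hits / len(test_labels)


# Toy labels for illustration only, not values from the Baltimore data.
train_labels = ["LARCENY", "LARCENY", "BURGLARY", "ASSAULT"]
test_labels = ["LARCENY", "ASSAULT", "BURGLARY", "LARCENY", "ROBBERY"]

label, accuracy = majority_baseline_accuracy(train_labels, test_labels)
print(label, accuracy)  # LARCENY 0.4
```

Any classifier worth keeping has to beat this number on the same test set, which is why the gap to the baseline matters more than the raw accuracy.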

When I ran my first predictions, I was excited to see an accuracy score of almost double the baseline, but then realized that I had included columns that described the specific location where the crime was committed. These descriptions are definitionally connected to crimes such as home robbery, so using them for prediction purposes would not be particularly informative. With ‘premise’ added back into the model, the accuracy shoots up to 37.3%. Surprisingly, adding demographic data for neighborhoods actually caused the score to worsen.

Time Predicts Crime

No matter which features or classifier I used, the time of the crime was consistently the most important feature for predicting its type.

Unsurprisingly, the rate of crime also fluctuated with the time of day, as shown below. Each circle represents the rate of crime for a neighborhood of Baltimore across hours of the day.

Crime rate by hour of the day in Baltimore.

Crime rates remain low from around 2–6AM and rise throughout the day until a slight dip starting around 6PM. Downtown Baltimore has an especially high crime rate throughout most of the day, but at certain times it is outpaced by some neighborhoods in West Baltimore.
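The hourly breakdown above starts from a simple tally of crimes per neighborhood and hour. A rough sketch follows, assuming each record is a `(neighborhood, timestamp)` pair; that layout and the name `counts_by_hour` are assumptions for this example. Turning the counts into rates would additionally require a denominator such as population, which is not shown here.

```python
from collections import Counter
from datetime import datetime


def counts_by_hour(records):
    """Count crimes for each (neighborhood, hour of day) pair."""
    counts = Counter()
    for neighborhood, timestamp in records:
        counts[(neighborhood, timestamp.hour)] += 1
    return counts


# Toy records for illustration only.
records = [
    ("Downtown", datetime(2018, 1, 1, 14, 30)),
    ("Downtown", datetime(2018, 1, 2, 14, 5)),
    ("Upton", datetime(2018, 1, 1, 3, 0)),
]

hourly = counts_by_hour(records)
print(hourly[("Downtown", 14)])  # 2
```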


Let’s return to Baltimore’s infamous murder rate. Of the 277 neighborhoods listed in the dataset, just 25 are responsible for over half of all murders. While downtown Baltimore has the greatest proportion of crime overall, it has relatively few murders (3 in 2018). Meanwhile, Allendale and Sandtown-Winchester (both in West Baltimore), Upton (the location of “The Pit” in The Wire, also West Baltimore) and Brooklyn (South Baltimore) all had 10 or more murders in 2018.
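A concentration figure like "25 neighborhoods account for over half of all murders" can be computed by sorting neighborhoods by murder count and accumulating until the running total passes 50%. The sketch below uses made-up neighborhood labels, and `neighborhoods_for_majority` is a hypothetical helper, not code from the original analysis.

```python
from collections import Counter


def neighborhoods_for_majority(murder_neighborhoods):
    """Smallest number of neighborhoods that together hold over half of all murders."""
    counts = Counter(murder_neighborhoods)
    total = sum(counts.values())
    running = 0
    # Walk neighborhoods from most to fewest murders, accumulating as we go.
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running > total / 2:
            return rank
    return 0  # no murders recorded


# Toy data: one entry per murder, labeled by a placeholder neighborhood name.
murders = ["A"] * 5 + ["B"] * 3 + ["C"] + ["D"]
print(neighborhoods_for_majority(murders))  # 2
```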

Baltimore murders in 2018. Larger circles indicate neighborhoods with more murders.

While the East Baltimore neighborhoods of Belair-Edison and Frankford are high-crime areas, their murder rates are lower than those of many neighborhoods in West Baltimore.

As should be expected, crime is very difficult to predict at any fine level of detail. To the degree that it is predictable (certain neighborhoods have high crime rates, and time of day plays a significant role), these factors are fairly easy to recognize without scientific analysis.

If you’re a fan of The Wire, check out this awesome map of event locations to see the similarities and differences between the real and fictional Baltimores.

If you’re not a fan of The Wire, you must not have watched it yet!