Crime Data Analysis for Baltimore, MD

Levi Raichik · Published in Analytics Vidhya · 5 min read · Oct 31, 2019

Photo by Matt Popovich on Unsplash

I read about a great study on the question: does having more police on the street reduce crime or not?

If you are interested in reading it, you can do so by clicking here.

The really cool thing about it is how the researchers found a setting where police presence increased for reasons that had nothing to do with rising crime. They looked at crime in the Washington, DC area during periods when the terror alert level went up and more law enforcement was deployed, and they found roughly a 15% overall decrease in crimes that usually occur outdoors.

I decided to try to take this conclusion to what I think is its next logical step: how to decrease crime without needing to hire more officers. I wanted to predict the areas where crime is high at certain times of the year, or of the day. That would tell law enforcement when to deploy officers, and to which areas, in order to cause the biggest drop in crime with the resources the police already have.

I chose to focus on crime data from Baltimore, MD. I obtained it from the city's open data website, which you can see by clicking here.

Cleaning the data

My first step was to clean the data: I had to account for missing values, fill in locations based on longitude and latitude, and much more.
I had a great time doing this, and if you would like to see exactly what I did, or use it yourself to clean this data, you can check out my Jupyter notebook by clicking here.
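To give a feel for what that kind of cleaning looks like, here is a minimal pandas sketch. The file name and the column names (CrimeDate, CrimeTime, Description, Neighborhood, Longitude, Latitude) are assumptions about the open-data export, so the actual notebook may differ:

```python
import pandas as pd

# Load the raw export (the file name here is an assumption)
df = pd.read_csv("baltimore_crime_data.csv")

# Combine the separate date and time columns into one datetime column
df["Datetime"] = pd.to_datetime(
    df["CrimeDate"].astype(str) + " " + df["CrimeTime"].astype(str),
    errors="coerce",
)

# Drop rows without a usable timestamp or crime description
df = df.dropna(subset=["Datetime", "Description"])

# Crude fill for a missing Neighborhood: borrow the value from the record
# with the closest coordinates among rows where Neighborhood is known
known = df.dropna(subset=["Neighborhood", "Longitude", "Latitude"])
needs_fill = df["Neighborhood"].isna() & df["Longitude"].notna() & df["Latitude"].notna()

for idx in df.index[needs_fill]:
    dist = ((known["Longitude"] - df.at[idx, "Longitude"]) ** 2
            + (known["Latitude"] - df.at[idx, "Latitude"]) ** 2)
    df.at[idx, "Neighborhood"] = known.loc[dist.idxmin(), "Neighborhood"]
```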

EDA (Exploratory Data Analysis)

Once I cleaned up the data, my next step was to check it out further and analyze it in order to better understand it.

I found very little in terms of citywide trends in crime. There is a seasonal drop in February, and there may be a multi-year trend. However, since the dataset only has continuous coverage from 2014 onward, I was unable to determine what type of trend that would be. You can see this in the image below:
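As a rough illustration of that check (not the exact notebook code), the citywide series can be built by resampling the cleaned DataFrame from the sketch above into monthly counts:

```python
import matplotlib.pyplot as plt

# Count reported crimes per month across the whole city and plot the series
monthly = df.set_index("Datetime").resample("M").size()

ax = monthly.plot(figsize=(10, 4))
ax.set_xlabel("Month")
ax.set_ylabel("Reported crimes")
ax.set_title("Citywide crimes per month, Baltimore")
plt.tight_layout()
plt.show()
```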

Zeroing in on one area

I decided to zero in on the downtown neighborhood, as it had the highest crime overall and in most 3-hour time slots during the day.
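A quick sketch of how that ranking can be done, assuming the same DataFrame as above, where the 3-hour slots come from the hour of the Datetime column:

```python
# Bucket each crime into one of eight 3-hour slots (0 = midnight-3am, etc.)
df["TimeSlot"] = df["Datetime"].dt.hour // 3

# Neighborhoods with the most crime overall
print(df["Neighborhood"].value_counts().head(5))

# Top neighborhood within each 3-hour slot
per_slot = (df.groupby(["TimeSlot", "Neighborhood"]).size()
              .groupby(level=0, group_keys=False)
              .nlargest(1))
print(per_slot)
```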

Here you can see a drop in crime from November through February, as well as a possible multi-year trend:

I then tried to find reasons for monthly spikes in crime by looking at the specific months when crime was up. Here is the plot of crime by month for this area:

Interestingly, there is no spike in crime for this area during the Baltimore riots of April and May 2015.

I was not able to find specific events during the months with spikes that seemed to be related to the increase in crime. For example, here is the plot for May 2018:

As you can see, and this was typical of the other months I looked at, crime did not really exceed 8 incidents a day. That, together with being unable to find specific events on those days that might be causing the spike, meant I could not come to a conclusion about what to look out for in order to know when and where more law enforcement should be deployed.
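For reference, here is roughly how the daily counts for a single month can be pulled out of the cleaned data; "Downtown" is my guess at how the neighborhood is labelled in the Neighborhood column:

```python
import matplotlib.pyplot as plt

# Daily crime counts for the downtown area in May 2018
downtown = df[df["Neighborhood"] == "Downtown"].set_index("Datetime").sort_index()
may_2018 = downtown.loc["2018-05"].resample("D").size()

ax = may_2018.plot(kind="bar", figsize=(12, 4))
ax.set_xlabel("Day of month")
ax.set_ylabel("Reported crimes")
ax.set_title("Downtown crimes per day, May 2018")
plt.tight_layout()
plt.show()
```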

The more I looked into the data, the more I came to the conclusion that I simply did not have enough information to predict when or where crimes would occur.

Modeling

I did try to predict which crime was occurring, based on the information I had, for the 5 neighborhoods with the most overall crime. I also tried to predict in which of these 5 neighborhoods a crime was occurring, in order to know where to deploy officers.
I used Random Forest for both, and the accuracy was around 50% overall and less than that for certain crimes and areas.
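Here is a minimal scikit-learn sketch of that kind of model. The feature set (hour, day of week, month, and one-hot encoded neighborhood) is my assumption, not necessarily what the notebook uses:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Restrict to the 5 neighborhoods with the most overall crime
top5 = df["Neighborhood"].value_counts().head(5).index
subset = df[df["Neighborhood"].isin(top5)].copy()

# Simple time-based features plus one-hot encoded neighborhood
subset["Hour"] = subset["Datetime"].dt.hour
subset["DayOfWeek"] = subset["Datetime"].dt.dayofweek
subset["Month"] = subset["Datetime"].dt.month

X = pd.get_dummies(subset[["Hour", "DayOfWeek", "Month", "Neighborhood"]],
                   columns=["Neighborhood"])
y = subset["Description"]  # the crime type being predicted

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

Predicting which of the 5 neighborhoods a crime occurs in is the same setup with y swapped for the Neighborhood column (and the neighborhood dummies dropped from the features).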

Tableau Dashboard

I did make an interesting dashboard in Tableau, with which you can filter based on date, time, neighborhood, crime, and whether a crime occurred inside or outside, so you can find your own insights in this data. You can also click on the timeline to see the map for a specific date. Here is what the dashboard looks like:

Feel free to check it out by clicking here.

Conclusion

With the data I had, or was able to look up and find, I was unable to predict the area or time at which a crime would occur. I do believe that further work on this, with more data about crime, may produce a good enough prediction to help deploy law enforcement and reduce crimes that occur outside. Unfortunately, that was not something I was able to do with the data I had.

The lesson I took from this project, which seems to be an important one for all data scientists to learn, is that we cannot always use the data we have in the ways we thought we would be able to. Sometimes we just need to say that we need more information to do what we would like to do.

I do not see this as a failure, but rather as an important learning experience for me as I continue to grow as a Data Scientist.

Please feel free to comment below, and you can check out my code for this project on GitHub by clicking here.
