Crime in Maryland: Where Should the Governor Allocate Support?

Luke Walsh
INST414: Data Science Techniques
Mar 8, 2024

The question I am answering with this analysis is: which counties in the state of Maryland need the most support dealing with crime? The stakeholder asking is the state’s new governor, who is trying to decide how to distribute support to the counties that need it. With this data, the governor and his office should be able to pinpoint counties that have higher crime rates and need help, whether that help is financial or takes some other form. If specific counties have abnormally high crime rates, the governor can focus on lowering those rates rather than spending time and money in counties that don’t need it.

The data needed to answer this question is the overall crime rate, or the number of crimes committed, in each county in Maryland. To get this data, I went to the state of Maryland’s open data portal, where I found a dataset of violent crime and property crime by municipality going back to the year 2000. This dataset is useful because it gives both the total number of crimes for each county and jurisdiction and the corresponding crime rates per 100,000 people. It also splits the data into specific types of crime, ranging from murder to motor vehicle theft. This is relevant to my question because, in order to tell which counties have high crime rates, we need actual numbers to analyze and map.

The dataset gave me more than enough information to answer the question. The fields I decided to focus on were the crime rates per 100,000 people. Overall crime totals are important, but populations vary widely across the state, so raw counts can be skewed by a county’s large or small size. Expressing every crime as a rate per 100,000 people puts all of the counties on an even playing field.
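
To make the scaling concrete, here is a minimal sketch of how a rate per 100,000 people is computed. The function name and both sets of numbers are hypothetical and chosen only for illustration; the dataset itself already comes with these rates calculated.

```python
def rate_per_100k(crime_count: int, population: int) -> float:
    """Return the number of crimes per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return crime_count * 100_000 / population


# Hypothetical counties: the large one has far more crimes in total,
# but the small one has the higher rate once population is accounted for.
large_county_rate = rate_per_100k(crime_count=20_000, population=1_000_000)
small_county_rate = rate_per_100k(crime_count=3_000, population=50_000)

print(large_county_rate, small_county_rate)
```

In this made-up case the smaller county would look safer by raw count and less safe by rate, which is the distortion the per-100,000 columns are meant to remove.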

The first thing I needed to do when cleaning this dataset was get rid of columns that weren’t needed. As stated before, I just wanted to look at the rates of crime per 100,000, so I dropped all of the columns that didn’t hold these numbers. The dropped columns included the raw crime counts and the percent changes for each type of crime. After dropping them, I had a much more manageable dataset.

import pandas as pd
import numpy as np
import plotly.express as px
import geopandas as gpd
import matplotlib.pyplot as plt

#load in the dataset
crime_df = pd.read_csv('maryland_violent_crime.csv')

#drop columns that aren't needed (raw counts, percentages, and percent changes)
crime_df = crime_df.drop(columns=["VIOLENT CRIME RATE PERCENT CHANGE PER 100,000 PEOPLE",
"PROPERTY CRIME RATE PERCENT CHANGE PER 100,000 PEOPLE",
"POPULATION", "MURDER", "RAPE", "ROBBERY", "AGG. ASSAULT", "B & E",
"LARCENY THEFT", "M/V THEFT", "GRAND TOTAL", "PERCENT CHANGE",
"VIOLENT CRIME TOTAL", "VIOLENT CRIME PERCENT", "VIOLENT CRIME PERCENT CHANGE",
"PROPERTY CRIME TOTALS", "PROPERTY CRIME PERCENT",
"PROPERTY CRIME PERCENT CHANGE", "OVERALL PERCENT CHANGE PER 100,000 PEOPLE"])

After this, I had the columns I needed, but they needed to be renamed. The original names were long and included spaces, which made them hard to work with in code. So I renamed them to follow the structure “crimetype_per_100k”.

#rename columns to make them more usable
crime_df = crime_df.rename(columns = {'COUNTY': 'name',
'OVERALL CRIME RATE PER 100,000 PEOPLE': 'overall_per_100k',
'VIOLENT CRIME RATE PER 100,000 PEOPLE': 'violent_per_100k',
'PROPERTY CRIME RATE PER 100,000 PEOPLE': 'property_per_100k',
'MURDER PER 100,000 PEOPLE': 'murder_per_100k',
'RAPE PER 100,000 PEOPLE': 'rape_per_100k',
'ROBBERY PER 100,000 PEOPLE': 'robbery_per_100k',
'AGG. ASSAULT PER 100,000 PEOPLE': 'assault_per_100k',
'B & E PER 100,000 PEOPLE': 'b&e_per_100k',
'LARCENY THEFT PER 100,000 PEOPLE': 'larceny_per_100k',
'M/V THEFT PER 100,000 PEOPLE': 'm/v_per_100k'})

These changes made the dataset smaller and easier to code with, but I still needed to narrow it down. Although having the dataset go back to the year 2000 gave more information, I wanted a more modern view of what was happening in these counties, so I dropped all of the rows from before 2010.

#keep only the rows from 2010 onward
crime_df = crime_df[crime_df['YEAR'] >= 2010]

The last step in the data cleaning process was to combine jurisdictions that share a county and average their crime rates. The dataset is split up by jurisdiction but also lists each jurisdiction’s county, so I grouped the jurisdictions by county and took the mean. This gave me the average across all of the jurisdictions in a county, which I treat as the county’s overall average crime rate per 100,000 residents. This decreased the size of the file and also gave a better overall view of each county.

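#group by county and average the numeric columns across jurisdictions
#(numeric_only=True skips text columns, which newer pandas versions refuse to average)
grouped_crime_df = crime_df.groupby('name').mean(numeric_only=True)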
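
To show what this grouping step does, here is a small sketch on made-up data. The county names, town names, and rates are hypothetical, and the `JURISDICTION` column name is an assumption about how the text column is labeled in the real file.

```python
import pandas as pd

# Made-up rows: two jurisdictions in one county, one jurisdiction in another.
toy_df = pd.DataFrame({
    "name": ["County A", "County A", "County B"],
    "JURISDICTION": ["Town 1", "Town 2", "Town 3"],  # assumed column name
    "overall_per_100k": [2000.0, 4000.0, 1500.0],
})

# Average each county's jurisdictions; numeric_only=True skips the text column.
county_means = toy_df.groupby("name").mean(numeric_only=True)

print(county_means)
```

Note that this is an unweighted mean: every jurisdiction counts equally regardless of its population, so a very small town with an unusual rate can move its county's average as much as a large city can.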

After looking through the data, I decided to focus on just a few types of crime. Instead of studying every type, I looked at the most prevalent ones across all counties, which left me with the overall crime rate, property crime, and larceny. The first thing I did was create a heat map of Maryland showing each county’s overall crime rate, which gives a first idea of which counties may have high crime rates.

As seen in the map above, there are some interesting findings. Higher-population areas like Baltimore City and Prince George’s County have higher rates than most others. But we also see very high rates in places like St. Mary’s County, Somerset County, and especially Worcester County. This is surprising because these counties have small populations compared to most others in the state. The biggest factor that could be driving crime up, at least in Worcester County and its neighbors, is the influence of Ocean City, which the area may be having trouble policing.

Let’s take a closer look at the two types of crime we are investigating in this study. First, let’s look at property crime. To assess which counties may need help, I filtered the dataset down to the top ten counties by the property crime rate column and put those values into a bar graph.
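
The top-ten filter described above can be sketched with pandas as follows. The county names and rates here are made up, and the toy frame only assumes the same shape as the grouped county averages (county name as the index, one rate column).

```python
import pandas as pd

# Made-up county averages, indexed by county name like the grouped frame.
toy_grouped = pd.DataFrame(
    {"property_per_100k": [900.0, 300.0, 700.0, 500.0]},
    index=pd.Index(["County A", "County B", "County C", "County D"], name="name"),
)

top_n = 2  # the analysis in this post used the top 10

# nlargest sorts descending and keeps the first top_n values;
# reset_index turns the county name back into a regular column for plotting.
top_property = toy_grouped["property_per_100k"].nlargest(top_n).reset_index()

print(top_property)
```

The resulting frame can then be handed to a plotting call such as `px.bar(top_property, x="name", y="property_per_100k")` to get a bar graph of the highest-rate counties.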

From this graph we can see that Worcester County and Baltimore City are much higher than the rest. Worcester has around 6,000 instances of property crime per 100,000 people, and Baltimore City has about 4,500. The other counties all hover between 2,000 and 3,000, which puts these two significantly above the others.

For the second analysis, we look at cases of larceny. As with property crime, Worcester County has a high rate, at around 5,000 instances of larceny per 100,000 people, while the rest of the counties all fall within a similar range. The next two highest are Baltimore City and St. Mary’s County, which both have around 2,500 instances.

Compiling the three analyses, we find that two jurisdictions stand out. Worcester County and Baltimore City are areas of interest that the governor should take into consideration when allocating resources for law enforcement or crime management. In the overall crime, property crime, and larceny analyses, these two are consistently higher than the rest. If the governor of Maryland asks which counties need more support for law enforcement, these two should be recommended.

The biggest limitation of this study was missing data. Howard County and Baltimore County had no data in this dataset, so we couldn’t get a complete picture of all of the counties, and the results could have been different with them included. The dataset also covers only crimes that were reported. Many crimes go unreported, which could make a difference in the findings.
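
A gap like this can be caught early by comparing the county names in the data against the full list of Maryland's 23 counties plus Baltimore City. The sketch below assumes the dataset spells county names as bare names without a "County" suffix, which is an assumption and may need adjusting; the helper name is hypothetical.

```python
# Maryland's 23 counties plus the independent city of Baltimore.
MARYLAND_JURISDICTIONS = {
    "Allegany", "Anne Arundel", "Baltimore", "Baltimore City", "Calvert",
    "Caroline", "Carroll", "Cecil", "Charles", "Dorchester", "Frederick",
    "Garrett", "Harford", "Howard", "Kent", "Montgomery", "Prince George's",
    "Queen Anne's", "St. Mary's", "Somerset", "Talbot", "Washington",
    "Wicomico", "Worcester",
}


def missing_counties(names_in_data):
    """Return the jurisdictions that never appear in the data, sorted by name."""
    return sorted(MARYLAND_JURISDICTIONS - set(names_in_data))


# Illustration: pretend the data covers everything except the two
# counties reported as missing in this post.
observed = MARYLAND_JURISDICTIONS - {"Howard", "Baltimore"}

print(missing_counties(observed))
```

On the real data, the input would be something like the index of the grouped county frame, after checking that its spelling matches the names in the set.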

Github link: https://github.com/ltwalsh/walshINST414Module1

Link to data: https://opendata.maryland.gov/Public-Safety/Violent-Crime-Property-Crime-by-Municipality-2000-/2p5g-xrcb/about_data

Link to Maryland geojson map: https://github.com/frankrowe/maryland-geojson/blob/master/maryland-counties.geojson
