Detroit Crimes Data Visualization

May 14, 2023

According to the FBI crime statistics report released in 2021, Detroit remains one of the most violent cities in the country. In this article, I’ll present an analysis of the reported crime data for the city from 2017 to 2022. The data comes from https://data.detroitmi.gov/datasets/rms-crime-incidents/

# import required libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

font = {'family' : 'Dejavu Sans',
        'weight' : 'normal',
        'size'   : 18}
plt.rc('font', **font)

Reading The Data From File

detroit_df = pd.read_csv('rms_crime_incidents.csv')
detroit_df.columns

Index(['crime_id', 'report_number', 'address', 'offense_description',
       'offense_category', 'state_offense_code', 'arrest_charge',
       'charge_description', 'incident_timestamp', 'incident_time',
       'day_of_week', 'hour_of_day', 'year', 'scout_car_area', 'precinct',
       'block_id', 'neighborhood', 'council_district', 'zip_code', 'longitude',
       'latitude', 'geom', 'ibr_date', 'oid'],
      dtype='object')

We are interested only in the offense_category, day_of_week, hour_of_day, year, neighborhood, and zip_code columns, and only in reports from 2017 to 2022.

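# keep only the columns used in the analysis
relevant_columns = ['offense_category', 'day_of_week', 'hour_of_day', 'year', 'neighborhood', 'zip_code']
detroit_df = detroit_df.loc[:, relevant_columns]
# keep reports from 2017 through 2022; .copy() avoids a SettingWithCopyWarning on later column assignments
detroit_df = detroit_df[(detroit_df['year'] >= 2017) & (detroit_df['year'] < 2023)].copy()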

Let’s check the data types and shape of the dataframe.

detroit_df.dtypes, detroit_df.shape

(offense_category     object
 day_of_week         float64
 hour_of_day         float64
 year                float64
 neighborhood         object
 zip_code              int64
 dtype: object,
 (486195, 6))

We need to convert the year column to an integer type and the offense_category column to a string type, then check whether any column contains null values.

detroit_df['year'] = detroit_df['year'].astype(int)
detroit_df['offense_category'] = detroit_df['offense_category'].astype(pd.StringDtype())
detroit_df.isna().sum(), detroit_df.dtypes

(offense_category    0
 day_of_week         0
 hour_of_day         0
 year                0
 neighborhood        0
 zip_code            0
 dtype: int64,
 offense_category     string
 day_of_week         float64
 hour_of_day         float64
 year                  int64
 neighborhood         object
 zip_code              int64
 dtype: object)
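The astype(int) call works here because the year filter already excluded rows with a missing year (a comparison against NaN is False). If missing values were still present, astype(int) would raise an error. Below is a minimal sketch of two more defensive conversions. The frame toy_df and its values are made up for illustration and are not the Detroit data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a 'year' column that still contains a missing value.
toy_df = pd.DataFrame({"year": [2017.0, np.nan, 2019.0]})

# Option 1: drop rows without a year, then convert to a plain integer dtype.
dropped = toy_df.dropna(subset=["year"]).copy()
dropped["year"] = dropped["year"].astype(int)

# Option 2: keep every row and use pandas' nullable integer dtype (capital "I").
nullable_year = toy_df["year"].astype("Int64")

print(dropped["year"].tolist())
print(nullable_year)
```

Which option fits depends on whether rows without a year are still useful for the other columns.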

Analysis And Visualization

Now that our data is clean and ready, let’s look for some useful insights about crime in Detroit over the past six years. First, let’s look at the total number of incidents reported each year.

sns.catplot(data=detroit_df, x = 'year', kind = 'count', height=7, aspect = 1.8)
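To put numbers behind the bars, the yearly totals and the year-over-year change can be computed directly with pandas. This is a minimal sketch on a small made-up frame (toy_df is a placeholder for detroit_df, and its counts are invented).

```python
import pandas as pd

# Made-up stand-in for detroit_df; only the 'year' column matters here.
toy_df = pd.DataFrame(
    {"year": [2017, 2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019]}
)

# Incidents per year, in chronological order.
incidents_per_year = toy_df["year"].value_counts().sort_index()

# Percent change relative to the previous year.
# The first year has no predecessor, so its value is NaN.
yearly_change_pct = incidents_per_year.pct_change() * 100

summary = pd.DataFrame(
    {"incidents": incidents_per_year, "change_pct": yearly_change_pct.round(1)}
)
print(summary)
```

Small percentage changes from year to year would support reading the chart as roughly flat.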

We can see that the total number of reported incidents was consistent over the six years. Now let’s look at the six-year totals by zip code.

sns.catplot(data=detroit_df, y = 'zip_code', kind = 'count', height=15, aspect = 1.2)
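With this many zip codes, sorting the bars by count makes the highest and lowest easier to spot. A minimal sketch of how to build that ordering follows. The frame toy_df is a placeholder and its counts are invented.

```python
import pandas as pd

# Made-up stand-in for detroit_df; the counts per zip code are invented.
toy_df = pd.DataFrame({"zip_code": [48228, 48205, 48228, 48243, 48205, 48228]})

# value_counts() sorts by frequency, highest first,
# so its index gives the zip codes in descending order of incidents.
zip_order = toy_df["zip_code"].value_counts().index.tolist()
print(zip_order)
```

The resulting list can be passed to the plot through the order parameter, for example sns.catplot(data=detroit_df, y='zip_code', kind='count', order=zip_order).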

The highest number of reported incidents is in the 48228 zip code, a residential area on the west side of Detroit. The fewest incidents are in the 48243 zip code, a downtown business area. Now let’s look at the statistics by neighborhood.

sns.catplot(data=detroit_df, y = 'neighborhood', kind = 'count', height=40, aspect = 0.6)

The Warrendale neighborhood, which is located in the 48228 zip code, reported the most incidents, while Douglass, in the Midtown area, and Belle Isle, an island park in the Detroit River, reported the fewest. Let’s look at what the offenses are.

pd.unique(detroit_df['offense_category'])

<StringArray>
[               'ROBBERY',                  'ARSON',     'DAMAGE TO PROPERTY',
         'FAMILY OFFENSE',                'LARCENY',                  'OTHER',
                  'FRAUD',               'BURGLARY',                'RUNAWAY',
                   'OUIL',         'STOLEN VEHICLE',        'STOLEN PROPERTY',
                'ASSAULT',              'EXTORTION',          'MISCELLANEOUS',
     'AGGRAVATED ASSAULT',             'KIDNAPPING',       'WEAPONS OFFENSES',
               'HOMICIDE',                'FORGERY',        'DANGEROUS DRUGS',
         'SEXUAL ASSAULT',           'SEX OFFENSES',                 'LIQUOR',
     'DISORDERLY CONDUCT',           'SOLICITATION',  'OBSTRUCTING JUDICIARY',
 'OBSTRUCTING THE POLICE',   'JUSTIFIABLE HOMICIDE',               'GAMBLING']
Length: 30, dtype: string

Let’s select very serious offenses: robbery, arson, aggravated assault, weapons offenses, homicide, sex offenses, and justifiable homicide.

serious_offenses = ['ROBBERY','ARSON', 'AGGRAVATED ASSAULT', 'WEAPONS OFFENSES', 'HOMICIDE', 'SEX OFFENSES', 'JUSTIFIABLE HOMICIDE']
serious_offenses_df = detroit_df.loc[detroit_df['offense_category'].isin(serious_offenses)]

Now let’s examine the total number of reported incidents per year for each offense category.
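A table of counts is one way to read these numbers exactly. The sketch below uses pd.crosstab on a small invented frame. The name toy_serious_df and its values are placeholders standing in for serious_offenses_df, not the Detroit data.

```python
import pandas as pd

# Made-up stand-in for serious_offenses_df.
toy_serious_df = pd.DataFrame({
    "year": [2017, 2017, 2017, 2018, 2018, 2018],
    "offense_category": [
        "ROBBERY", "AGGRAVATED ASSAULT", "AGGRAVATED ASSAULT",
        "AGGRAVATED ASSAULT", "WEAPONS OFFENSES", "AGGRAVATED ASSAULT",
    ],
})

# Rows are years, columns are offense categories, cells are incident counts.
offense_by_year = pd.crosstab(
    toy_serious_df["year"], toy_serious_df["offense_category"]
)
print(offense_by_year)

# Most reported category in each year.
top_category_per_year = offense_by_year.idxmax(axis=1)
print(top_category_per_year)
```

A matching chart could presumably be drawn in the same style as the other plots, with x='year' and hue='offense_category' in sns.catplot.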

Aggravated assault is the most reported category in all six years, and its count appears fairly consistent. Justifiable homicide and homicide are the least reported offenses in all six years. The second most reported offense is robbery for the first three years and weapons offenses for the remaining three. Now let’s visualize the statistics by zip code and neighborhood.

sns.catplot(data=serious_offenses_df, y = 'zip_code', kind = 'count', height=35, aspect = 0.5, hue='offense_category')

sns.catplot(data=serious_offenses_df, y = 'neighborhood', kind = 'count', height=40, aspect = 0.6)

Aggravated assault is the most reported category of offense in every zip code, and robbery and weapons offenses are the next most reported. The most and fewest offenses were reported in 48228 and 48243, respectively. Now let’s take the three most reported offenses and examine how they vary with day of week and time of day.

three_most_reported = ['ROBBERY', 'AGGRAVATED ASSAULT', 'WEAPONS OFFENSES']
three_most_reported_df = serious_offenses_df.loc[serious_offenses_df['offense_category'].isin(three_most_reported)]

sns.catplot(data=three_most_reported_df, x = 'hour_of_day', kind = 'count', height=10, aspect = 2, hue='offense_category')
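To summarize the hourly counts into broader parts of the day, the hours can be grouped with pd.cut. This is a minimal sketch that assumes hour_of_day runs from 0 to 23. The bin edges, the labels, and the frame toy_df are my own choices for illustration.

```python
import pandas as pd

# Made-up stand-in for three_most_reported_df; only 'hour_of_day' matters here.
toy_df = pd.DataFrame({"hour_of_day": [0.0, 3.0, 9.0, 13.0, 17.0, 18.0, 23.0]})

# Bin edges and labels are assumptions chosen for illustration.
# right=False makes each bin include its left edge:
# [0, 6), [6, 12), [12, 18), [18, 24).
toy_df["part_of_day"] = pd.cut(
    toy_df["hour_of_day"],
    bins=[0, 6, 12, 18, 24],
    labels=["late night", "morning", "afternoon", "evening"],
    right=False,
)

# Count incidents per part of day; sort=False skips reordering by frequency.
incidents_by_part = toy_df["part_of_day"].value_counts(sort=False)
print(incidents_by_part)
```

The new part_of_day column could also be used as the x variable in a count plot.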

All three offenses tend to increase during the afternoon, evening, and late night. The frequency differs only slightly by day of the week, with a slight increase on weekends. Finally, we will look at the two homicide-related offenses: homicide and justifiable homicide.

homicide_offenses = ['HOMICIDE', 'JUSTIFIABLE HOMICIDE']
homicide_offenses_df = serious_offenses_df.loc[serious_offenses_df['offense_category'].isin(homicide_offenses)]

sns.catplot(data=homicide_offenses_df, x = 'year', kind = 'count', height=7, aspect = 3, hue='offense_category')

sns.catplot(data=homicide_offenses_df, y = 'zip_code', kind = 'count', height=35, aspect = 0.5, hue='offense_category')

Zip codes 48205 and 48227 are the two areas where the most homicide offenses were reported over the six years.

Written by Bekele Erko