Exploring Chicago’s Crime Rates

Carley Williams
Published in Analytics Vidhya
11 min read · Mar 7, 2021

Taking a deep dive into Chicago’s crimes from 2012–2017

PHASE 1: Exploring the shape & structure of the data

Step 1: Gaining a higher-level understanding of the data

To begin, I looked at the shape of my data: 1,456,714 rows and 23 columns. From left to right, the columns are Unnamed, ID, Case Number, Date, Block, IUCR, Primary Type, Description, Location Description, Arrest, Domestic, Beat, District, Ward, Community Area, FBI Code, X Coordinate, Y Coordinate, Year, Updated On, Latitude, Longitude, and Location.
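A minimal pandas sketch of this first look. The article doesn't name its CSV file, so the `read_csv` path in the comment is hypothetical, and a two-row made-up frame stands in for the real data. When a DataFrame is written to CSV together with its index, pandas reads that index back as a column labelled `Unnamed: 0`, which is one common source of an unnamed first column like the one described in Step 2.

```python
import pandas as pd

# The real data would be loaded with something like (hypothetical file name):
# crimes = pd.read_csv("chicago_crimes_2012_2017.csv")
# Two made-up rows stand in for it here.
crimes = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "ID": [101, 102],
    "Primary Type": ["THEFT", "BATTERY"],
    "Year": [2016, 2016],
})

n_rows, n_cols = crimes.shape   # (number of rows, number of columns)
print(n_rows, n_cols)           # prints: 2 4
print(list(crimes.columns))     # column names, left to right
```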

Step 2: Finding data quality issues

ISSUE ONE: The first column was unnamed and contained random digits for each row. As there was already an index, I deleted this column.

ISSUE TWO: Near the right end of the data frame were the Latitude, Longitude, and Location columns. Location simply repeated the Latitude and Longitude values, so I removed it and kept the separate Latitude and Longitude columns to ease processing.

ISSUE THREE: Several column names were composed of two words with spaces in between. To ease processing, I renamed these columns to have an underscore instead of the space.
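The three fixes can be sketched as one pandas function. It assumes any column whose name starts with `Unnamed` is the leftover index column; the helper name `clean_columns` and the toy rows are illustrative, not taken from the article.

```python
import pandas as pd

def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the leftover unnamed index column and the redundant Location column,
    then replace spaces in column names with underscores."""
    # pandas labels a header-less CSV column "Unnamed: <position>".
    unnamed = [col for col in df.columns if str(col).startswith("Unnamed")]
    # errors="ignore" keeps this working if a column is already gone.
    cleaned = df.drop(columns=unnamed + ["Location"], errors="ignore")
    cleaned.columns = cleaned.columns.str.replace(" ", "_", regex=False)
    return cleaned

# Made-up rows using a few of the original column names.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Primary Type": ["THEFT", "BATTERY"],
    "Location Description": ["STREET", "APARTMENT"],
    "Latitude": [41.88, 41.75],
    "Longitude": [-87.63, -87.60],
    "Location": ["(41.88, -87.63)", "(41.75, -87.6)"],
})
clean = clean_columns(raw)
print(list(clean.columns))
# ['Primary_Type', 'Location_Description', 'Latitude', 'Longitude']
```

Because `drop` returns a new frame, the original `raw` frame is left untouched.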

Checking for Missing Values & Duplicate Rows

  1. To make sure no values were missing, I checked every column for nulls. I found none.
This shows the check for nulls, which returned False for every column.

  2. Before doing any analysis, I wanted to remove duplicate rows. I found 2,398 duplicate rows and dropped them, keeping the first occurrence of each.
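A sketch of both checks in pandas, on a made-up four-row frame. `duplicated()` and `drop_duplicates()` compare entire rows by default; the article doesn't say which columns it compared, so that default is an assumption here, as is the helper name `check_and_dedupe`.

```python
import pandas as pd

def check_and_dedupe(df: pd.DataFrame):
    """Return (nulls per column, number of duplicate rows, de-duplicated frame)."""
    null_counts = df.isnull().sum()          # all zeros means nothing is missing
    n_dupes = int(df.duplicated().sum())     # rows identical to an earlier row
    deduped = df.drop_duplicates(keep="first")
    return null_counts, n_dupes, deduped

toy = pd.DataFrame({
    "ID": [1, 2, 2, 3],
    "Primary_Type": ["THEFT", "ASSAULT", "ASSAULT", "THEFT"],
})
nulls, n_dupes, toy_clean = check_and_dedupe(toy)
print(nulls.sum(), n_dupes, len(toy_clean))   # prints: 0 1 3
```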

Step 3: Looking into data types

Next, I looked at the data type of each column to know what I needed to convert.

This shows the 17 columns I am using in my data frame and the data type of each.

Next, I wanted to see which of the object and integer columns were really categorical. I suspected Primary Type and District might be, but I wanted to double-check. To do this, I listed each column's unique values to see whether it held a small set of repeated categories rather than mostly unique values, and I counted the unique values in each.

This shows the number of unique values in each column. The low counts for Primary Type, Ward, Community Area, IUCR, and FBI Code suggested that these columns were categorical.
This output confirms which columns held categorical values and which did not. Since these integer and object columns were really categories, I converted IUCR, Ward, Community_Area, FBI_Code, and Primary_Type to the category type.
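Counting unique values and converting columns to pandas' `category` dtype might look like this; the toy values and the choice of columns are illustrative. Besides signalling intent, `category` usually stores repetitive labels more compactly than `object` columns.

```python
import pandas as pd

toy = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Primary_Type": ["THEFT", "THEFT", "BATTERY", "THEFT", "BATTERY", "ASSAULT"],
    "Ward": [42, 42, 3, 42, 3, 27],
})

# A column with few distinct values relative to its length is a category candidate.
unique_counts = toy.nunique()
print(unique_counts)

categorical_cols = ["Primary_Type", "Ward"]
# astype with a dict converts only the listed columns.
toy = toy.astype({col: "category" for col in categorical_cols})
print(toy.dtypes)
```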

Finally, I converted the Date column to a DatetimeIndex so that I could use pandas' time-series features, such as between_time, later in my analysis.
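Converting the dates and indexing by them takes two steps in pandas. The raw string format used here (12-hour clock with AM/PM) is an assumption about how `Date` is stored; if it differs, adjust `format` or omit it to let pandas infer the format.

```python
import pandas as pd

toy = pd.DataFrame({
    "Date": ["05/04/2016 06:15:00 AM", "05/03/2016 11:40:00 PM"],
    "Primary_Type": ["THEFT", "BATTERY"],
})

# Assumed raw format: month/day/year with a 12-hour clock and AM/PM.
toy["Date"] = pd.to_datetime(toy["Date"], format="%m/%d/%Y %I:%M:%S %p")
# A sorted DatetimeIndex enables between_time, resample, and similar tools.
toy = toy.set_index("Date").sort_index()

print(type(toy.index).__name__)   # prints: DatetimeIndex
print(toy.index.hour.tolist())    # prints: [23, 6]
```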

Columns & their descriptions included in my main data frame:

  • Number: Row number used as the index
  • ID: Unique numeric ID
  • Case_Number: Unique alphanumeric ID
  • Date: Date and time of the crime (MM/DD/YYYY HH:MM:SS)
  • Block: Block where the crime occurred, showing only the first three digits of the street number for privacy, e.g., 013XX S Sawyer Ave
  • IUCR: 4-character Illinois Uniform Crime Reporting code; not unique
  • Primary_Type: Short crime description/categorization
  • Description: Longer explanation of the crime (typically 1–5 words)
  • Location_Description: Short location description/categorization
  • Arrest: Whether an arrest was made (True or False)
  • Domestic: Whether the crime was domestic-related (True or False)
  • X_Coordinate: X coordinate of the location
  • Y_Coordinate: Y coordinate of the location
  • Latitude: Latitude of the location
  • Longitude: Longitude of the location
  • Year: Year the crime occurred

PHASE 2: Investigating Initial Questions

As I looked through my data in phase 1, several questions came to mind. The three main questions I was interested in looking into were:

  1. What are the most prevalent crimes in Chicago?
  2. How does Chicago’s crime scene change at different times of the day?
  3. How has crime in Chicago changed from 2012–2017?

Question 1: What are the most prevalent crimes in Chicago?

I wanted to find the most common crimes. First, I looked at an overall picture of how frequent each crime was by creating a seaborn factor plot of primary type against count. I limited the plot to the top 20 types to ease the analysis; beyond that point, the counts per crime were near zero.
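The counts behind such a plot can be sketched with `value_counts`, which sorts from most to least frequent. The helper name `top_crime_types` and the toy data are illustrative.

```python
import pandas as pd

def top_crime_types(df: pd.DataFrame, n: int = 20) -> pd.Series:
    """Counts per Primary_Type, most frequent first, limited to the top n."""
    return df["Primary_Type"].value_counts().head(n)

toy = pd.DataFrame({"Primary_Type": ["THEFT"] * 5 + ["BATTERY"] * 3 + ["ARSON"]})
print(top_crime_types(toy, n=2))   # THEFT 5, BATTERY 3
```

To draw the ranking, something like `sns.countplot(data=df, y="Primary_Type", order=top_crime_types(df).index)` gives horizontal bars; seaborn's `factorplot` has since been renamed `catplot`.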

This graph shows each type of crime plotted against its count. By far the most common is theft, followed by battery, criminal damage, narcotics, assault, and others. It was striking how far theft and battery outnumbered every other crime.

Seeing how frequent theft was, I wanted to break it down further and find the most common type of theft. To do this, I created a new dataframe containing only theft data by filtering for a primary type of theft.

Subquestion: What types of theft are the most common?

To see what types of theft were the most common, I plotted theft descriptions within the theft data frame against count.
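Filtering to theft rows and counting their descriptions comes down to a boolean mask plus `value_counts`. The description labels in the toy data are assumptions about how the values are spelled.

```python
import pandas as pd

toy = pd.DataFrame({
    "Primary_Type": ["THEFT", "THEFT", "THEFT", "BATTERY"],
    "Description": ["$500 AND UNDER", "$500 AND UNDER", "OVER $500", "SIMPLE"],
})

# Keep only theft rows; .copy() makes later edits to `theft` safe.
theft = toy[toy["Primary_Type"] == "THEFT"].copy()
theft_types = theft["Description"].value_counts()
print(theft_types)   # $500 AND UNDER 2, OVER $500 1
```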

This plot shows that thefts of $500 and under dominate the theft category. This suggests that "petty", or small-scale, theft was the most common crime in Chicago from 2012–2017.

Now that I knew small theft ($500 and under) was the most common type of crime, I wanted to see where it usually happened. To do this, I created a graph showing the location breakdown of small theft crimes.

Subquestion: Where do small theft crimes happen?

To look into this question, I graphed the top 10 locations of small theft crimes with a seaborn factorplot.
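A small helper can produce the top-10 location counts; passing a different `primary_type` gives the same breakdown for battery, criminal damage, and the other crimes examined later. The helper name `top_locations` and the toy rows are hypothetical.

```python
import pandas as pd

def top_locations(df, primary_type, description=None, n=10):
    """Most common Location_Description values for one primary type,
    optionally narrowed to a single Description value."""
    mask = df["Primary_Type"] == primary_type
    if description is not None:
        mask &= df["Description"] == description
    return df.loc[mask, "Location_Description"].value_counts().head(n)

toy = pd.DataFrame({
    "Primary_Type": ["THEFT", "THEFT", "THEFT", "THEFT", "BATTERY"],
    "Description": ["$500 AND UNDER", "$500 AND UNDER", "$500 AND UNDER",
                    "OVER $500", "SIMPLE"],
    "Location_Description": ["STREET", "STREET", "RESIDENCE",
                             "RESIDENCE", "APARTMENT"],
})
print(top_locations(toy, "THEFT", "$500 AND UNDER"))
```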

This graph shows that by far the most common place for small theft is the street. This suggests these thefts are likely from stores and retail areas along the street, or from individuals on the street.

After looking at the location breakdown of small theft crimes, I found myself asking another question — where do other crimes in Chicago tend to occur? To begin looking into this question, I first looked at a more general picture of where all crimes in Chicago most often occur.

Subquestion: Where do other crimes besides theft tend to occur?

To look into this question, I plotted a seaborn factor plot with locations of crimes against their count. To start, I looked at a plot with all crimes to get a general idea of where crimes occur.

This showed that most crimes occur on the street. I infer this is mostly due to theft, the most common type of crime. It was interesting how highly residences, apartments, and sidewalks ranked, especially since they ranked so low for theft.

After this aggregate view, I was curious where crimes other than theft tended to occur. I looked more closely at some of the more common crimes, as well as some more extreme crimes I wanted to explore. To do this, I created several more dataframes for different types of crime. First, I created dataframes for the four most common crimes after theft: battery, criminal damage, narcotics, and assault. Next, I created dataframes for other crimes I was interested in: sex offense, deception, burglary, arson, child-involved crimes, homicide, and kidnapping. With these dataframes, I could look at the locations of these crimes and keep using them throughout the rest of my analysis.
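Building the per-crime dataframes can be done in one dictionary comprehension. The exact `Primary_Type` spellings (for example `DECEPTIVE PRACTICE` and `OFFENSE INVOLVING CHILDREN` for the deception and child-involved crimes) are assumptions; check them against the column's unique values first. The helper name `split_by_type` is hypothetical.

```python
import pandas as pd

# Assumed Primary_Type spellings for the crimes discussed in the article.
CRIMES_OF_INTEREST = [
    "BATTERY", "CRIMINAL DAMAGE", "NARCOTICS", "ASSAULT",
    "SEX OFFENSE", "DECEPTIVE PRACTICE", "BURGLARY", "ARSON",
    "OFFENSE INVOLVING CHILDREN", "HOMICIDE", "KIDNAPPING",
]

def split_by_type(df, types):
    """Return {primary type: DataFrame of just that type}; absent types give empty frames."""
    return {t: df[df["Primary_Type"] == t].copy() for t in types}

toy = pd.DataFrame({"Primary_Type": ["ARSON", "THEFT", "ARSON", "HOMICIDE"]})
frames = split_by_type(toy, CRIMES_OF_INTEREST)
print({t: len(f) for t, f in frames.items() if len(f)})   # {'ARSON': 2, 'HOMICIDE': 1}
```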

First, I looked at the locations of the four most common crimes after theft: battery, criminal damage, narcotics, and assault.

This graph shows the locations of battery crimes. The most common location was an apartment, followed by a residence, and then the sidewalk and street. This suggests most battery crimes happen at home, followed by on the street. Other locations were insignificant.
This graph shows the locations of criminal damage crimes. The vast majority happened on the street, followed by residences and apartments. This supports the finding that the street is the most common place for crime, and suggests that much of Chicago's criminal damage is done outdoors and in public.
This graph shows the most common locations for narcotics crimes. I infer that many narcotics charges involve either dealing or using drugs, so it makes sense that these would happen on streets and sidewalks.
This graph shows the most common location of assault crimes in Chicago. I thought it was interesting to see that the street areas (street & sidewalk) were nearly even with home areas (residence & apartment).

This suggests you are about as likely to suffer an assault on the street as in your home. It led me to ask how different types of assault differ between streets and homes.

Subquestion: Where do different types of assaults happen?

To answer this, I first had to see what the different types of assaults were. I used the same method I had used for the types of theft: a seaborn countplot by description.

Subquestion: What are the different types of assaults?

This showed that the vast majority of assaults were "simple", followed by aggravated: handgun and then aggravated: knife/cutting.

This revealed a limitation of the data. Because there is a separate primary type for "Sex Offense", I infer that these simple assaults are physical rather than sexual. However, some crimes described as assaults may belong in another category, such as sex offense.

After seeing how dominant "simple" assaults were, I wanted to see whether different types of assault had different typical locations. To do this, I first created three new dataframes for the top three types of assault: simple, aggravated: handgun, and aggravated: knife/cutting. This revealed a small data inconsistency around the colon in the descriptions: the gun assault descriptions have a space between "AGGRAVATED:" and the weapon, but the knife assault descriptions do not.

This shows the data inconsistency with the colons in the AGGRAVATED: descriptions.
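One way to remove this inconsistency before filtering is to normalize the spacing after the colon with a regular expression. The knife description string in the toy data is an assumed spelling.

```python
import pandas as pd

descriptions = pd.Series([
    "SIMPLE",
    "AGGRAVATED: HANDGUN",
    "AGGRAVATED:KNIFE/CUTTING INSTR",
])

# Replace a colon plus any following whitespace with exactly ": ".
normalized = descriptions.str.replace(r":\s*", ": ", regex=True)
print(normalized.tolist())
# ['SIMPLE', 'AGGRAVATED: HANDGUN', 'AGGRAVATED: KNIFE/CUTTING INSTR']
```

Applied to the Description column before filtering, this lets gun and knife descriptions follow a single "AGGRAVATED: " pattern.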
This graph shows the most common places for simple assaults to happen. This shows the most common place for these is homes (residences and apartments).
This graph shows the most common place for handgun assaults. This shows that the most common location for these is at the home, especially apartments. When compared to simple assault, you can see that handgun assaults are relatively more common at apartments and residences compared to streets.
This shows the most common places for knife/cutting assaults to happen. By far, the most common place for this to occur is on the street. This is very different from both simple and gun assaults, which were far more common in the home (apartments/residences) than the street.

Summarizing these findings, knife/cutting assaults were far more likely to happen on the street, whereas gun and simple assaults were more likely to happen at home. My initial inference, that you are just as likely to face an assault at home as on the street in Chicago, was therefore too simple: at home you are more likely to face a gun or simple assault, whereas on the street you are considerably more likely to face a knife assault.
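Because simple assaults far outnumber the aggravated types, comparing raw counts across types can mislead. A row-normalized `pd.crosstab`, sketched here on made-up rows, shows each assault type's share at each location, which makes "relatively more common" comparisons direct. This is an alternative to the separate bar charts above, not the article's method.

```python
import pandas as pd

toy = pd.DataFrame({
    "Description": ["SIMPLE", "SIMPLE", "SIMPLE",
                    "AGGRAVATED: HANDGUN", "AGGRAVATED: HANDGUN",
                    "AGGRAVATED: KNIFE/CUTTING INSTR"],
    "Location_Description": ["RESIDENCE", "APARTMENT", "STREET",
                             "APARTMENT", "APARTMENT", "STREET"],
})

# normalize="index" divides each row by its total, so every row sums to 1.
shares = pd.crosstab(toy["Description"], toy["Location_Description"],
                     normalize="index")
print(shares.round(2))
```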

Subquestion: Where do other types of crimes happen?

Beyond assaults and the most common types of crime, I found several interesting patterns in where other types of crime happen. The most notable were sex offenses, child-related crimes, homicides, and deception crimes.

Sex Offense Locations

This graph shows the most common places for sex offenses. By far the most common place was a home (residence or apartment), followed by the sidewalk and street, and then schools and alleys. Other locations were fairly insignificant.

Child Related Crime Locations

This graph shows the location of child-related crimes. By far, the most common place for these was a residence or apartment.

Homicide Crime Locations

This graph shows the locations of homicides in Chicago. By an overwhelming margin, the most common place for this crime was the street, followed by apartments and alleys; all other locations were insignificant.

Deception Crime Locations

This graph shows the most common locations for deception crimes. By a wide margin, the most common was a residence, followed by apartments and "other". The "other" category was larger here than in most of the other location analyses, which points to a weakness in this data and possibly to the variety within deception crimes.

Question 2: How does Chicago’s crime scene change at different times of the day?

For my second question, I wanted to see what types of crimes occur throughout the day. To do this, I broke the day into six equal four-hour blocks.

This shows the six equal blocks I split the day into for my analysis: Morning, Midday, Afternoon, Evening, Night, and Late Night

Once I had these blocks, I created a dataframe for each of them using pandas' between_time method. Because between_time includes both endpoints by default, I started each block at the top of the hour and ended it at minute 59 of the block's last hour (for example, 06:00 to 09:59) so that neighboring blocks would not overlap.
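A sketch of the six blocks with `between_time`. Instead of ending each block at minute 59, it swaps in `inclusive="left"` (available in pandas 1.4 and later) so each block is the half-open interval [start, end); this also keeps events with nonzero seconds, such as 09:59:30, from falling between blocks. When the start time is later than the end time, as for Night, `between_time` selects the times that wrap past midnight. The block boundaries follow the article; the helper name and toy timestamps are illustrative.

```python
import pandas as pd

# Four-hour blocks from the article; Night wraps past midnight.
BLOCKS = {
    "Morning": ("06:00", "10:00"),
    "Midday": ("10:00", "14:00"),
    "Afternoon": ("14:00", "18:00"),
    "Evening": ("18:00", "22:00"),
    "Night": ("22:00", "02:00"),
    "Late Night": ("02:00", "06:00"),
}

def split_into_blocks(df):
    """Return {block name: rows whose time of day falls in [start, end)}."""
    return {name: df.between_time(start, end, inclusive="left")
            for name, (start, end) in BLOCKS.items()}

idx = pd.to_datetime([
    "2016-05-03 06:00", "2016-05-03 09:59", "2016-05-03 10:00",
    "2016-05-03 23:30", "2016-05-04 01:15", "2016-05-04 02:00",
    "2016-05-04 05:45",
])
toy = pd.DataFrame({"Primary_Type": ["THEFT"] * 7}, index=idx)
blocks = split_into_blocks(toy)
print({name: len(frame) for name, frame in blocks.items()})
# {'Morning': 2, 'Midday': 1, 'Afternoon': 0, 'Evening': 0, 'Night': 2, 'Late Night': 2}
```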

Next, I wanted to see the most common crimes in each block. First, I grouped each block's data by primary type to see the counts.

This shows my grouping code and output for the morning. By grouping the data by primary type, you can see the count for each type of crime in the morning. I repeated this for each block.

After seeing the numbers, I wanted a visualization for each time block so I could compare crime throughout the day. First, though, I needed a visualization of the overall most common crimes as a baseline to compare each time block against. I reused the one I had created for my first question.

This graph shows that the five most common crimes were theft, battery, criminal damage, narcotics, and assault. With this baseline, I could compare each time block against it.

This chart also revealed another weakness in the data: the "other offense" category. Since it is the sixth most common type of crime, many crimes are clearly being entered into the database with an "other offense" label. This makes analysis harder, because the nature of these crimes cannot be inferred, and some of them might have been better categorized as something else.
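The article compares each block's chart with the overall baseline visually. A numeric version, sketched here as an alternative technique rather than the article's method, divides each crime's share within a block by its overall share: values above 1 mean the crime is over-represented in that block. Blocks are assigned from the hour of day instead of `between_time`; the block names match the article, while the helper names `time_block` and `share_vs_baseline` and the toy rows are hypothetical.

```python
import pandas as pd

BLOCK_NAMES = ["Morning", "Midday", "Afternoon", "Evening", "Night", "Late Night"]

def time_block(hours):
    """Map hours of day (0-23) to four-hour blocks, with Morning starting at 6AM."""
    return [BLOCK_NAMES[((h - 6) % 24) // 4] for h in hours]

def share_vs_baseline(df):
    """Each crime type's share within a block divided by its overall share."""
    blocks = pd.Series(time_block(df.index.hour), index=df.index, name="Block")
    block_share = pd.crosstab(blocks, df["Primary_Type"], normalize="index")
    overall_share = df["Primary_Type"].value_counts(normalize=True)
    # Align on crime type (columns) and divide.
    return block_share.div(overall_share, axis=1)

idx = pd.to_datetime(["2016-05-03 07:00", "2016-05-03 08:00",
                      "2016-05-03 23:00", "2016-05-04 00:30"])
toy = pd.DataFrame({"Primary_Type": ["THEFT", "BURGLARY", "BATTERY", "BATTERY"]},
                   index=idx)
print(share_vs_baseline(toy))
```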

Morning Crimes (6–10AM)

This graph shows the distribution of crimes in the morning. Compared to the baseline, burglary, offenses involving children, criminal trespass, robbery, and deceptive practices are all more common in the morning, while narcotics is far less common.

Midday Crimes (10AM–2PM)

This graph shows crimes in the midday period, 10AM–2PM. Assault is less common than average, while weapons violations and robbery are slightly more common. Compared to the morning, there are more narcotics crimes.

Afternoon Crimes (2–6PM)

This graph shows the distribution of crimes in the afternoon (2–6PM). Compared to average, battery is slightly less common, but everything else is fairly typical. Compared to the morning, there is slightly less burglary.

Evening Crimes (6–10PM)

This graph shows evening crimes from 6–10PM. The distribution is fairly typical, except that motor vehicle theft and narcotics are slightly more common. Also, as the day has gone on, battery has become increasingly common and is approaching the level of theft.

Night Crimes (10PM–2AM)

This graph shows crimes at night, between 10PM and 2AM. Compared to the average and to earlier in the day, battery is now more common than theft for the first time, and burglary and robbery continue to decline. Narcotics is slightly less common than in the evening.

Late Night Crimes (2–6AM)

This graph shows late-night crimes from 2–6AM. This period has the most battery of any time block and far fewer of most other crimes. Narcotics is far less common than at night or in the evening. Robbery becomes slightly more common, and sexual assault is more common than in any other period.

Summary:

Summarizing my findings across the time blocks: as the night progresses, battery replaces theft as the most common crime. Narcotics appears most common in the evening (6–10PM), perhaps because daylight fades then and these crimes tend to occur on the street. Sexual assault is most common in the very late night (2–6AM). Finally, burglary tends to be most common in the morning and late at night, perhaps when people are assumed to be asleep.

Question 3: How has crime in Chicago changed from 2012–2017?

My final question was how Chicago’s crime scene changed over the years. Did crime increase, decrease, or stay the same? To answer this, I first created a visualization of total crimes from 2012–2017.

This graph shows the number of crimes from 2012–2017. Crime generally declined until 2015, then rose slightly in 2016.

Note that there are far fewer records for 2017: on investigation, the 2017 data had not been fully entered when this dataset was compiled. I therefore consider only 2012–2016 in the analysis.
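Counting records per year and dropping the incomplete year can be sketched with a `groupby`. The 2016 cutoff follows the article; the helper name `crimes_per_year` and the toy data are illustrative.

```python
import pandas as pd

def crimes_per_year(df, last_full_year=2016):
    """Number of records per Year, excluding years after last_full_year."""
    counts = df.groupby("Year").size()
    return counts[counts.index <= last_full_year]

toy = pd.DataFrame({"Year": [2015, 2015, 2016, 2017]})
print(crimes_per_year(toy))   # 2015: 2, 2016: 1
```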

After looking at this aggregate analysis of crime, I wanted to see how different types of crime changed over the five years. I decided to look at the five most popular types of crime and see how they changed from 2012–2017. These are theft, battery, criminal damage, narcotics, and assault.
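A year-by-type table with year-over-year percentage changes makes these per-crime trends easy to compare numerically. This sketch limits the data to the five crimes named above and to 2012–2016, as the article does; the function name `yearly_trends` and the toy counts are hypothetical.

```python
import pandas as pd

TOP_FIVE = ("THEFT", "BATTERY", "CRIMINAL DAMAGE", "NARCOTICS", "ASSAULT")

def yearly_trends(df, types=TOP_FIVE, last_full_year=2016):
    """Return (counts, pct_change): rows are years, columns are crime types."""
    keep = df["Primary_Type"].isin(types) & (df["Year"] <= last_full_year)
    subset = df[keep]
    counts = pd.crosstab(subset["Year"], subset["Primary_Type"])
    pct_change = counts.pct_change() * 100   # % change from the previous year
    return counts, pct_change

toy = pd.DataFrame({
    "Year": [2015] * 6 + [2016] * 6 + [2017],
    "Primary_Type": ["THEFT"] * 4 + ["NARCOTICS"] * 2
                    + ["THEFT"] * 5 + ["NARCOTICS"] + ["THEFT"],
})
counts, pct = yearly_trends(toy)
print(counts)
print(pct.round(1))
```

The first year's row of `pct` is NaN, since there is no earlier year to compare against.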

Subquestion: How did the most common types of crime change in Chicago from 2012–2017?

This shows that theft followed the overall trend of decreasing from 2012–2015, but with a steeper decrease from 2014–2015 and a larger increase in 2016.
This shows that battery followed a similar trend, but with a smaller decline from 2014–2015 and a slightly larger increase in 2016.
This graph shows the trend for criminal damage in Chicago. Criminal damage increased, rather than decreased, from 2014 to 2015, and increased even more from 2015–2016.
This graph shows narcotics crime from 2012–2017, which follows a very different trend from the others. Narcotics declined every year, including 2015–2016 when most crimes increased, and its decline was far steeper.
Assault followed the general Chicago crime trend, except that it increased from 2014–2015 instead of decreasing. It also increased much more than average from 2015–2016.

After looking at the most common types of crime, I wanted to see how the rates of other types of crime changed across the years. I looked at many of the other dataframes, and the most notable were deceptive crimes, arson, and homicide.

Subquestion: How did other types of crime rates change in Chicago from 2012–2017?

This shows the trend of deceptive crimes in Chicago from 2012–2017. Unlike the overall trend of decreasing from 2012–2015 and increasing only in 2016, deceptive crimes increased every year.
This graph shows arson's sharp decrease and then steady increase from 2012–2017. Rather than following the overall pattern of decreasing until 2015 and then increasing in 2016, arson increased consistently from 2013 on.
This graph shows the homicide rate from 2012–2017. Homicides rose far more sharply from 2015 to 2016 than most crimes did: the 2016 level is nearly double that of every other year, a much larger jump than for most other types of crime.
