Exploring Chicago’s Crime Rates
Taking a deep dive into Chicago’s crimes from 2012–2017
Carley Williams
PHASE 1: Exploring the shape & structure of the data
Step 1: Gaining a higher-level understanding of the data
To begin, I looked at the shape of my data: 1,456,714 rows and 23 columns. From left to right, these columns are Unnamed, ID, Case Number, Date, Block, IUCR, Primary Type, Description, Location Description, Arrest, Domestic, Beat, District, Ward, Community Area, FBI Code, X Coordinate, Y Coordinate, Year, Updated On, Latitude, Longitude, and Location.
Step 2: Finding data quality issues
ISSUE ONE: The first column was unnamed and held an arbitrary number for each row. Since the data frame already had an index, I deleted this column.
ISSUE TWO: Near the right end of the data frame were the Latitude, Longitude, and Location columns. The Location column simply repeated the Latitude and Longitude values, so I removed it and kept the separate Latitude and Longitude columns, which are easier to process.
ISSUE THREE: Several column names were composed of two words with spaces in between. To ease processing, I renamed these columns to have an underscore instead of the space.
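The three fixes above can be sketched in pandas as follows. This is a minimal sketch, not the exact code behind this analysis. It assumes the unnamed column was read from a CSV with a blank header, which pandas labels "Unnamed: 0". The function name clean_columns and the toy rows are made up for illustration.

```python
import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three Step 2 fixes to a raw crimes DataFrame (hypothetical helper)."""
    # Issue one: pandas names a CSV column with a blank header "Unnamed: 0",
    # so drop any column whose name starts with "Unnamed".
    unnamed = [col for col in df.columns if str(col).startswith("Unnamed")]
    # Issue two: Location only repeats Latitude/Longitude, so drop it as well.
    out = df.drop(columns=unnamed + ["Location"])
    # Issue three: replace spaces in column names with underscores.
    out.columns = out.columns.str.replace(" ", "_")
    return out


# Toy rows with made-up values, only to show the column changes.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Case Number": ["XX000001", "XX000002"],
    "Primary Type": ["THEFT", "BATTERY"],
    "Latitude": [41.88, 41.79],
    "Longitude": [-87.63, -87.60],
    "Location": ["(41.88, -87.63)", "(41.79, -87.60)"],
})
clean = clean_columns(raw)
print(list(clean.columns))  # ['Case_Number', 'Primary_Type', 'Latitude', 'Longitude']
```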
Checking for Missing Values & Duplicate Rows
1. To check for missing data, I looked for null values. I found none.
2. Before doing any analysis, I wanted to make sure duplicate rows were removed. I found 2,398 duplicate rows and removed them, keeping the first occurrence of each and dropping the repeat.
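Both checks can be expressed with standard pandas calls. The sketch below assumes a row only counts as a duplicate when every column matches. If the original check compared a subset of columns (for example ID), you would pass subset= to duplicated and drop_duplicates instead. The helper name quality_report and the toy data are hypothetical.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> tuple[int, int]:
    """Return (number of missing cells, number of duplicate rows)."""
    missing = int(df.isnull().sum().sum())
    # duplicated(keep="first") flags every repeat of an earlier row,
    # leaving the first occurrence unflagged.
    duplicates = int(df.duplicated(keep="first").sum())
    return missing, duplicates


toy = pd.DataFrame({
    "ID": [1, 2, 2],
    "Primary_Type": ["THEFT", "ASSAULT", "ASSAULT"],
})
print(quality_report(toy))  # (0, 1)

# Keep the first occurrence of each duplicated row and drop the repeats.
deduped = toy.drop_duplicates(keep="first")
print(len(deduped))  # 2
```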
Step 3: Looking into data types
Next, I checked the data type of each column to see which ones I would need to convert.
Then I wanted to see which of the object and integer columns actually held categories. I suspected Primary Type and District did, but I wanted to double-check. To do this, I listed each column’s unique values, along with how many unique values it contained, to see whether it repeated a small set of categories or held mostly distinct terms.
Finally, I converted the Date column to datetime and set it as a DatetimeIndex so I could use pandas’ time-based features later in my analysis.
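The type checks and the Date conversion in Step 3 might look like the following sketch. The format string assumes timestamps use a 12-hour "MM/DD/YYYY hh:mm:ss AM/PM" layout. That is an assumption, so adjust it if your copy of the data differs. The conversion to the category dtype is an optional extra that the write-up does not mention.

```python
import pandas as pd

# Toy rows in the assumed "MM/DD/YYYY hh:mm:ss AM/PM" layout.
toy = pd.DataFrame({
    "Date": ["01/15/2016 11:30:00 PM", "01/16/2016 08:05:00 AM", "02/01/2016 02:45:00 PM"],
    "Primary_Type": ["THEFT", "BATTERY", "THEFT"],
    "District": [1.0, 7.0, 1.0],
})
print(toy.dtypes)

# A small number of distinct values relative to the row count hints that a
# column holds categories rather than free-form values.
print(toy[["Primary_Type", "District"]].nunique())
print(toy["Primary_Type"].unique())

# Optional (not part of the original workflow): store the repetitive text
# column as a pandas categorical.
toy["Primary_Type"] = toy["Primary_Type"].astype("category")

# Parse the timestamps and make them the index; methods such as
# between_time require a DatetimeIndex.
toy["Date"] = pd.to_datetime(toy["Date"], format="%m/%d/%Y %I:%M:%S %p")
toy = toy.set_index("Date")
print(toy.index.hour.tolist())  # [23, 8, 14]
```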
Columns & their descriptions included in my main data frame:
- Number: Row number used as an index
- ID: Unique numeric ID
- Case_Number: Unique alphanumeric ID
- Date: Timestamp in MM/DD/YYYY HH:MM:SS format (date and time of day)
- Block: Block where the crime occurred, with street number and street; only the first 3 digits of the street number are shown, for privacy (e.g., 013XX S Sawyer Ave)
- IUCR: 4-character Illinois Uniform Crime Reporting code; not unique
- Primary_Type: Short crime description/categorization
- Description: Longer explanation of the crime (typically 1–5 words)
- Location_Description: Short location description/categorization
- Arrest: True or False
- Domestic: True or False
- X_Coordinate: X coordinate of the location
- Y_Coordinate: Y coordinate of the location
- Latitude: Latitude of the location
- Longitude: Longitude of the location
- Year: Year crime occurred
PHASE 2: Investigating Initial Questions
As I looked through my data in Phase 1, several questions came to mind. The three main questions I wanted to explore were:
- What are the most prevalent crimes in Chicago?
- How does Chicago’s crime scene change at different times of the day?
- How has crime in Chicago changed from 2012–2017?
Question 1: What are the most prevalent crimes in Chicago?
I wanted to find out which crimes were the most common. First, I looked at the overall frequency of each crime type by creating a seaborn factor plot of primary type against count. I limited the plot to the top 20 types to keep it readable; beyond that point, the counts were near zero.
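The top-20 ranking behind that plot reduces to a single value_counts call. Here is a minimal sketch; top_types is a hypothetical helper and the toy data is made up.

```python
import pandas as pd


def top_types(df: pd.DataFrame, n: int = 20) -> pd.Series:
    """Counts of the n most frequent primary types, largest first."""
    # value_counts() sorts in descending order, so head(n) keeps the top n.
    return df["Primary_Type"].value_counts().head(n)


toy = pd.DataFrame({"Primary_Type": ["THEFT"] * 3 + ["BATTERY"] * 2 + ["ARSON"]})
top = top_types(toy, n=2)
print(list(top.index), top.tolist())  # ['THEFT', 'BATTERY'] [3, 2]
```

In newer seaborn versions, factorplot has been renamed catplot. The index of this Series can be passed as the order argument of seaborn’s catplot with kind="count" to draw the bars in ranked order.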
Since theft was the most frequent crime, I wanted to break it down further and see which type of theft was the most common. To do this, I created a new dataframe containing only thefts by filtering on a primary type of theft.
Subquestion: What types of theft are the most common?
To see what types of theft were the most common, I plotted theft descriptions within the theft data frame against count.
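The filter-then-count steps for thefts can be sketched as follows. This assumes the primary type is stored as the uppercase label "THEFT"; that label, the helper name theft_breakdown, and the toy descriptions are assumptions for illustration.

```python
import pandas as pd


def theft_breakdown(df: pd.DataFrame):
    """Return the theft-only rows and the count of each theft Description."""
    # The boolean mask keeps rows whose Primary_Type is exactly "THEFT"
    # (assumed label); .copy() gives an independent frame that can be
    # modified later without pandas' SettingWithCopyWarning.
    theft = df[df["Primary_Type"] == "THEFT"].copy()
    return theft, theft["Description"].value_counts()


toy = pd.DataFrame({
    "Primary_Type": ["THEFT", "THEFT", "THEFT", "BATTERY"],
    "Description": ["$500 AND UNDER", "$500 AND UNDER", "OVER $500", "SIMPLE"],
})
theft, counts = theft_breakdown(toy)
print(list(counts.index), counts.tolist())  # ['$500 AND UNDER', 'OVER $500'] [2, 1]
```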
Now that I knew small theft (less than $500) was the most common type of theft, I wanted to see where it usually happened. To do this, I created a graph showing the location breakdown of small theft crimes.
Subquestion: Where do small theft crimes happen?
To look into this question, I graphed the top 10 locations of small theft crimes with a seaborn factor plot.
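The top-10 location list for one description can be sketched like this. It assumes small thefts are labeled "$500 AND UNDER" in the Description column; that label and the helper name top_locations are assumptions.

```python
import pandas as pd


def top_locations(df: pd.DataFrame, description: str, n: int = 10) -> pd.Series:
    """The n most common Location_Description values for one Description."""
    subset = df[df["Description"] == description]
    return subset["Location_Description"].value_counts().head(n)


# Made-up theft rows for illustration.
theft = pd.DataFrame({
    "Description": ["$500 AND UNDER"] * 4 + ["OVER $500"],
    "Location_Description": ["STREET", "STREET", "RESIDENCE", "STREET", "STREET"],
})
small = top_locations(theft, "$500 AND UNDER")
print(list(small.index), small.tolist())  # ['STREET', 'RESIDENCE'] [3, 1]
```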
After looking at the location breakdown of small theft crimes, I found myself asking another question: where do other crimes in Chicago tend to occur? To begin, I looked at a more general picture of where all crimes in Chicago most often occur.
Subquestion: Where do other crimes besides theft tend to occur?
To look into this question, I created a seaborn factor plot of crime locations against their count. I started with a plot of all crimes to get a general idea of where crimes occur.
After looking at the aggregate level, I was curious to see where crimes other than theft tended to occur. I looked more closely at some of the more popular crimes, and also at some more extreme crimes that I wanted to explore. To do this, I created several more dataframes, one for each type of crime. First, I created dataframes for the four most popular crimes after theft: battery, criminal damage, narcotics, and assault. Next, I created dataframes for other crimes I was interested in: sex offense, deception, burglary, arson, child-involved crimes, homicide, and kidnapping. These dataframes let me look at the locations of each crime and also reuse them throughout the rest of my analysis.
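Creating one dataframe per crime type can be handled by a dictionary comprehension, which keeps all the frames in one place. In the sketch below, the Primary_Type spellings in CRIME_TYPES (for example "DECEPTIVE PRACTICE" and "OFFENSE INVOLVING CHILDREN") are assumptions about the dataset’s labels, and split_by_type is a hypothetical helper.

```python
import pandas as pd

# Assumed Primary_Type labels for the crimes listed above.
CRIME_TYPES = [
    "BATTERY", "CRIMINAL DAMAGE", "NARCOTICS", "ASSAULT",
    "SEX OFFENSE", "DECEPTIVE PRACTICE", "BURGLARY", "ARSON",
    "OFFENSE INVOLVING CHILDREN", "HOMICIDE", "KIDNAPPING",
]


def split_by_type(df: pd.DataFrame, types: list[str]) -> dict[str, pd.DataFrame]:
    """Map each primary type to a DataFrame holding only that type's rows."""
    return {t: df[df["Primary_Type"] == t] for t in types}


toy = pd.DataFrame({"Primary_Type": ["BATTERY", "ARSON", "BATTERY", "THEFT"]})
frames = split_by_type(toy, CRIME_TYPES)
print(len(frames["BATTERY"]), len(frames["HOMICIDE"]))  # 2 0
```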
First, I looked at the locations of the four most popular crime types after theft: battery, criminal damage, narcotics, and assault.
This analysis led me to infer that you are just as likely to suffer an assault on the street as in your home. That made me wonder how the locations of different types of assault would compare between streets and homes.
Subquestion: Where do different types of assaults happen?
To answer this, I first had to find out what the different types of assault were. I used the same method I had used for the types of theft: a seaborn countplot by description.
Subquestion: What are the different types of assaults?
This revealed a limitation in the database. Because “Sex Offense” is a separate primary crime type, I can infer that these simple assaults are physical rather than sexual. However, I wonder whether some crimes described as assaults actually belong in another category, such as sex offense.
After seeing how dominant “simple” assaults were in the category, I wanted to see whether different types of assault had different typical locations. To look at this, I first created three new dataframes for the top three types of assault: simple, aggravated: handgun, and aggravated: knife/cutting. This revealed a small data inconsistency with the colon: in the knife assault descriptions there is no space after “aggravated:”, but in the gun assault descriptions there is.
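One way to handle the colon inconsistency is to normalize the spacing before filtering, so that each assault type matches a single spelling. The sketch below assumes raw descriptions such as "AGGRAVATED: HANDGUN" and "AGGRAVATED:KNIFE/CUTTING INSTR". Those exact strings and the helper normalize_colons are assumptions, not confirmed labels.

```python
import pandas as pd


def normalize_colons(descriptions: pd.Series) -> pd.Series:
    """Force exactly one space after every colon in a description."""
    # r":\s*" matches a colon plus any spaces after it.
    return descriptions.str.replace(r":\s*", ": ", regex=True)


assault = pd.DataFrame({
    "Description": [
        "SIMPLE", "AGGRAVATED: HANDGUN", "AGGRAVATED:KNIFE/CUTTING INSTR", "SIMPLE",
    ],
})
assault["Description"] = normalize_colons(assault["Description"])

# One dataframe per assault type, now using a single consistent spelling.
simple = assault[assault["Description"] == "SIMPLE"]
handgun = assault[assault["Description"] == "AGGRAVATED: HANDGUN"]
knife = assault[assault["Description"] == "AGGRAVATED: KNIFE/CUTTING INSTR"]
print(len(simple), len(handgun), len(knife))  # 2 1 1
```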
To summarize my findings on the different types of assault and their locations: knife/cutting assaults were far more likely to happen on the street, whereas gun assaults and simple assaults were more likely to happen in the home. My initial inference, that an assault in Chicago is just as likely at home as on the street, turned out to be more complicated than it first appeared. At home, you are more likely to face a gun or simple assault, whereas on the street, you are significantly more likely to face a knife assault.
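To compare how likely each assault type is to occur at a given location, one option is a row-normalized crosstab. Each row then sums to 1 across all locations. This is a sketch of one possible comparison, not necessarily how the plots above were made. The location labels "STREET" and "RESIDENCE" and the helper location_shares are assumptions.

```python
import pandas as pd


def location_shares(assaults: pd.DataFrame, locations=("STREET", "RESIDENCE")) -> pd.DataFrame:
    """Share of each assault Description occurring at each listed location."""
    # normalize="index" divides each row by that description's total across
    # all locations, so shares are comparable between assault types.
    table = pd.crosstab(
        assaults["Description"], assaults["Location_Description"], normalize="index"
    )
    # Keep only the locations of interest; missing columns are filled with 0.
    return table.reindex(columns=list(locations), fill_value=0)


knife_label = "AGGRAVATED: KNIFE/CUTTING INSTR"
toy = pd.DataFrame({
    "Description": ["SIMPLE"] * 3 + [knife_label] * 3,
    "Location_Description": [
        "RESIDENCE", "RESIDENCE", "STREET", "STREET", "STREET", "SIDEWALK",
    ],
})
shares = location_shares(toy)
print(shares.round(2))
```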
Subquestion: Where do other types of crimes happen?
Beyond assaults and the most popular types of crime, I found several interesting patterns in the most common locations of other crimes. The most notable were sex offense crimes, child-related crimes, homicides, and deception crimes.
Sex Offense Locations
Child Related Crime Locations
Homicide Crime Locations
Deception Crime Locations
Question 2: How does Chicago’s crime scene change at different times of the day?
For my second question, I wanted to see what types of crimes occurred throughout the day. To do this, I broke the day into six equal four-hour blocks.
Once I had these blocks, I created a dataframe for each one using pandas’ between_time method. Because between_time includes both endpoints by default, I set each block to start at the top of its first hour and end at minute 59 of its fourth hour (for example, 06:00 to 09:59).
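Below is a sketch of the six time-block dataframes. It uses a different technique from the HH:59 end times described above: between_time with inclusive="left" (available in pandas 1.4 and later). Each block then includes its start time and excludes its end time. This also keeps records such as 09:59:30, which a 09:59 cutoff would miss. When the start time is later than the end time, as in the night block, between_time wraps around midnight. The block labels and the helper split_by_block are hypothetical.

```python
import pandas as pd

# (start, end) of each four-hour block; with inclusive="left" the end time
# itself belongs to the next block.
BLOCKS = {
    "Morning (6-10AM)": ("06:00", "10:00"),
    "Midday (10AM-2PM)": ("10:00", "14:00"),
    "Afternoon (2-6PM)": ("14:00", "18:00"),
    "Evening (6-10PM)": ("18:00", "22:00"),
    "Night (10PM-2AM)": ("22:00", "02:00"),
    "Late night (2-6AM)": ("02:00", "06:00"),
}


def split_by_block(df: pd.DataFrame) -> dict[str, pd.DataFrame]:
    """Return {block label: rows whose time of day falls in that block}.

    Requires a DatetimeIndex. For the night block (start > end),
    between_time selects times after 22:00 or before 02:00.
    """
    return {
        label: df.between_time(start, end, inclusive="left")
        for label, (start, end) in BLOCKS.items()
    }


times = pd.to_datetime([
    "2016-01-01 06:00:00", "2016-01-01 09:59:30", "2016-01-01 10:00:00",
    "2016-01-01 23:15:00", "2016-01-02 01:30:00", "2016-01-02 02:00:00",
])
toy = pd.DataFrame({"Primary_Type": ["THEFT"] * 6}, index=times)
counts = {label: len(part) for label, part in split_by_block(toy).items()}
print(counts)
```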
Next, I wanted to see the most common crimes in each block. I started by grouping the data to see the counts for each block.
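As an alternative to building six separate dataframes, the block can be computed from each record’s hour. The data can then be grouped by block and crime type in a single step. This is a swapped-in technique shown as a sketch, not the grouping used in this analysis. The labels and the helper counts_by_block are hypothetical, and a DatetimeIndex is assumed.

```python
import numpy as np
import pandas as pd

LABELS = ["Morning", "Midday", "Afternoon", "Evening", "Night", "Late night"]


def counts_by_block(df: pd.DataFrame) -> pd.Series:
    """Count crimes per (time block, Primary_Type); needs a DatetimeIndex."""
    hours = df.index.hour.to_numpy()
    # Shift so 6AM becomes 0, wrap past midnight, then bin into 4-hour blocks:
    # hours 6-9 -> 0, 10-13 -> 1, 14-17 -> 2, 18-21 -> 3, 22-1 -> 4, 2-5 -> 5.
    codes = ((hours - 6) % 24) // 4
    block = pd.Categorical(np.array(LABELS)[codes], categories=LABELS, ordered=True)
    return (
        df.assign(Block=block)
        .groupby(["Block", "Primary_Type"], observed=True)
        .size()
    )


times = pd.to_datetime([
    "2016-01-01 07:00:00", "2016-01-01 08:30:00", "2016-01-01 23:00:00",
    "2016-01-02 01:00:00", "2016-01-02 03:00:00",
])
toy = pd.DataFrame(
    {"Primary_Type": ["THEFT", "THEFT", "BATTERY", "BATTERY", "BURGLARY"]},
    index=times,
)
block_counts = counts_by_block(toy)
print(block_counts)
```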
After seeing the counts, I wanted to create a visualization for each time block so that I could compare crime throughout the day. Before doing so, I realized I needed a visualization of the overall most popular crimes as a baseline against which to compare each block. For this, I reused the visualization I had created for my first question.
This analysis revealed another weakness in the data: the “other offense” category. Since it is the sixth most popular type of crime, many crimes are clearly being entered into the database with an “other offense” label. This makes the analysis more difficult, because the actual type of crime cannot be inferred, and some of these crimes might have been more appropriately categorized as something else.
Morning Crimes (6–10AM)
Midday Crimes (10AM–2PM)
Afternoon Crimes (2–6PM)
Evening Crimes (6–10PM)
Night Crimes (10PM–2AM)
Late Night Crimes (2–6AM)
Summary:
Summarizing my findings across the different times of day: as the night progresses, battery replaces theft as the most popular type of crime. Narcotics crimes appear most common in the evening (6–10PM), perhaps because daylight fades then and these crimes tend to occur on the street. Sexual assault is most common in the very late night (2–6AM). Finally, burglary tends to be most common in the morning and late at night, perhaps because people are assumed to be asleep then.
Question 3: How has crime in Chicago changed from 2012–2017?
My final question was how Chicago’s crime scene changed over the years. Did crime increase, decrease, or stay the same? To find out, I first created a visualization of total crimes per year from 2012–2017.
Note that 2017 has far fewer records. On investigation, this is because the 2017 data had not been fully entered when this dataset was compiled. I therefore only consider 2012–2016 for true analysis.
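Counting crimes per year and dropping the incomplete year can be done with a groupby on the Year column. Here is a minimal sketch, in which yearly_totals and its last_full_year parameter are hypothetical names.

```python
import pandas as pd


def yearly_totals(df: pd.DataFrame, last_full_year: int = 2016) -> pd.Series:
    """Number of crimes per Year, excluding incomplete years after last_full_year."""
    totals = df.groupby("Year").size()
    return totals[totals.index <= last_full_year]


toy = pd.DataFrame({"Year": [2012, 2012, 2013, 2016, 2017]})
totals = yearly_totals(toy)
print(totals.index.tolist(), totals.tolist())  # [2012, 2013, 2016] [2, 1, 1]
```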
After this aggregate analysis, I wanted to see how different types of crime changed over those five years. I chose the five most popular types of crime and tracked how each changed: theft, battery, criminal damage, narcotics, and assault.
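A year-by-type count table is a convenient shape for comparing the five crime types over time. In this table, each column is one crime type. The sketch assumes the uppercase labels in TOP_FIVE match the dataset. The helper yearly_counts is hypothetical.

```python
import pandas as pd

# Assumed Primary_Type labels for the five most popular crimes.
TOP_FIVE = ["THEFT", "BATTERY", "CRIMINAL DAMAGE", "NARCOTICS", "ASSAULT"]


def yearly_counts(df: pd.DataFrame, types: list[str], last_full_year: int = 2016) -> pd.DataFrame:
    """Year-by-type table of counts for the given primary types."""
    mask = df["Primary_Type"].isin(types) & (df["Year"] <= last_full_year)
    subset = df[mask]
    # Rows are years, columns are crime types, cells are crime counts.
    return pd.crosstab(subset["Year"], subset["Primary_Type"])


toy = pd.DataFrame({
    "Year": [2012, 2012, 2013, 2013, 2013, 2017],
    "Primary_Type": ["THEFT", "NARCOTICS", "THEFT", "THEFT", "ARSON", "THEFT"],
})
table = yearly_counts(toy, TOP_FIVE)
print(table)
```

With matplotlib installed, calling table.plot() draws one line per crime type across the years.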
Subquestion: How did the most common types of crime change in Chicago from 2012–2017?
Finally, I wanted to see how the rates of other types of crime changed across the years. I looked at many of the other dataframes, and the most notable changes were in deceptive crimes, arson, and homicide.
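One way to summarize the change for every crime type at once is the percent change between the first and last full years. This is a sketch under stated assumptions rather than the method behind the plots. The default years 2012 and 2016 follow the cutoff above, and the helper percent_change is hypothetical.

```python
import pandas as pd


def percent_change(df: pd.DataFrame, start_year: int = 2012, end_year: int = 2016) -> pd.Series:
    """Percent change in yearly count for every Primary_Type."""
    table = pd.crosstab(df["Year"], df["Primary_Type"])
    start, end = table.loc[start_year], table.loc[end_year]
    # A type with zero crimes in start_year yields inf (or NaN if it is also
    # zero in end_year), since the base of the percentage is zero.
    return (end - start) / start * 100


toy = pd.DataFrame({
    "Year": [2012] * 6 + [2016] * 5,
    "Primary_Type": ["ARSON"] * 4 + ["HOMICIDE"] * 2 + ["ARSON"] * 2 + ["HOMICIDE"] * 3,
})
change = percent_change(toy)
print(change)
```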
Subquestion: How did other types of crime rates change in Chicago from 2012–2017?