Mason Hass
4 min read · Feb 8, 2022

After reading Peng and Matsui’s Chapter 3 on asking a good question, I settled on the following exploratory question: does the amount of cannabis sales correlate with the amount of crime each year? Before describing how I will look for this pattern, I want to explain why I chose the topic and how the study could benefit society. Colorado voters legalized recreational cannabis statewide in 2012 (medical cannabis had already been legal since 2000), and many states have followed the same trend over the last few years. Ever since, I have wondered whether legalization has been accompanied by an increase in crime in the states and counties where cannabis is legal. Most of us live in or around Boulder, where we attend the University of Colorado, so this information would be useful as we try to stay safe in our day-to-day lives.

To find the relationship between these two variables, cannabis sales and crime, I will use the Colorado cannabis sales and Colorado crime data frames from class. The cannabis sales file includes the year, month, county, type, and sales. The crime file includes the county, year, and types of crimes. Both files share the year and county variables, which gives me a way to line them up. I will use data from 2016 to the present because this is after recreational marijuana was legalized in Colorado and is when the crime data begins.

The first step is to find the total sales by year across all Colorado counties and see which years had the largest cannabis sales. In the same way, I can find the total crimes by year across all counties and see which year had the most crimes. The next step is to join the two data frames. Merging the data lets me graph the findings against one another and determine whether the relationship is positive, negative, or absent. Lastly, I can add a new column called “Sales to Crimes Ratio”, dividing the total cannabis sales by the total crimes for each year, then graph and examine it. Scatter plots work well for examining relationships because a trend line makes the direction of the data visible. The sales to crimes ratio matters because it shows how total cannabis sales compare to total crimes in a given year. I believe this analysis will let me answer my question and present the findings to the class and to society.
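The steps described above (total by year, join, then divide) can be sketched in pandas roughly as follows. This is only an illustration under assumptions: the column names `sales` and `crimes`, the single crime-count column, and every number in the two toy data frames are made up for the example and are not the class data.

```python
import pandas as pd

# Toy stand-ins for the class data frames; all values are invented for illustration.
sales = pd.DataFrame({
    "year":   [2016, 2016, 2017, 2017],
    "month":  [1, 2, 1, 2],
    "county": ["Boulder", "Denver", "Boulder", "Denver"],
    "type":   ["Retail", "Medical", "Retail", "Medical"],
    "sales":  [100.0, 300.0, 150.0, 450.0],
})
crimes = pd.DataFrame({
    "county": ["Boulder", "Denver", "Boulder", "Denver"],
    "year":   [2016, 2016, 2017, 2017],
    "crimes": [10, 30, 12, 38],  # assumed: one combined count per county and year
})

# Step 1: total sales and total crimes per year, summed over all counties.
sales_by_year = sales.groupby("year", as_index=False)["sales"].sum()
crimes_by_year = crimes.groupby("year", as_index=False)["crimes"].sum()

# Step 2: join the two yearly tables on the shared "year" column.
merged = sales_by_year.merge(crimes_by_year, on="year", how="inner")

# Step 3: add the sales to crimes ratio for each year.
merged["sales_to_crimes_ratio"] = merged["sales"] / merged["crimes"]

print(merged)
```

An inner join keeps only the years present in both tables, which matches the choice to start where the crime data begins.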

After conducting my analysis, I first created a table of my findings. The table lists the years 2016 to 2021, the total cannabis sales across all counties for each year, the total crimes across all counties for each year, and the sales to crimes ratio.

The table shows that total cannabis sales increase every year from 2016 to 2021. The total crimes column also increases with each passing year. The most interesting observation comes from the sales to crimes ratio, which increases almost every year, the exception being 2020 to 2021. Because sales and crimes both rise over the same years, the table shows a positive correlation between total cannabis sales and total crimes from 2016 to 2021: as cannabis sales increase, crime also increases. The ratio being positive does not show this by itself, since any two positive totals give a positive ratio; a rising ratio means that sales grew faster than crimes. Although the finding is a positive correlation, it does not establish causation. Causation would mean that the increase in total cannabis sales causes total crimes to increase, and that may not be the case. Crime could be increasing for plenty of other reasons, with cannabis sales merely related to the increase. To show this visually, below is a scatterplot of the sales to crimes ratio by year.

In the graph of Sales to Crimes Ratio by Year, the years are displayed on the x-axis and the sales to crimes ratios on the y-axis. The blue trend line shows a strong upward trend in the ratio over time, which means that sales grew faster than crimes while both were rising. The slope of the trend line is 379.83, meaning that the sales to crimes ratio goes up by around 380 each year.

In conclusion, my original question was whether the amount of cannabis sales correlates with the amount of crime each year. Based on my analysis, there is a positive correlation between total cannabis sales by year and total crimes by year. This relationship is supported by the table, where both totals rise year after year, while the upward slope of the trend line on the graph shows that sales grew faster than crimes. Hopefully, this information can be of help to my classmates and society!