Mapping Atlanta Crime Data — Python Data Visualization Tutorial

Nicholas Oxford
5 min read · Aug 30, 2021


I added a video walkthrough: https://www.youtube.com/watch?v=tTpkM8tAgjI

Listen, I am not some coding tutorial genius. I’m sure you can go to Udemy.com and get a better overview of Python. But I want my friends to succeed. I feel like spending a little time learning to code can help unleash some hidden creativity.

When I started to learn how to code, I wanted to know how things *worked*. How did this function read_csv() understand anything? Guess what — it kinda doesn’t matter. While curiosity is essential to programming, don’t get lost in the sauce.

Running code on your computer is hard. Especially when you try writing more than one program. I say screw it, use some online tool. We’re going to use Google Colab.

Google Colab Is Where We Thrive

Google Colab allows you to write, visualize, and share your code. If you have never coded before, it can be a bit foreign, but you’ll see this notebook style everywhere.

Go ahead and create a Google Colab Notebook. In the first block add this code.

!pip install geopandas
!pip install shapely
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, Polygon
import matplotlib.pyplot as plt

Click play and you should get an output that looks like the image below. What we are doing here is both installing dependencies for our environment and importing them. Notice how we don’t have to install numpy or pandas. That’s because Colab comes with them. You will see these two packages everywhere.

In the past year, certain crimes in Atlanta have risen. I wanted to use Python to see whether this uptick shows up visually. I went to the Atlanta Police Department’s website and grabbed the raw data for 2020 and 2021. Let’s go ahead and import that data into our program, either in a new cell (Insert → New Cell) or under your import statements. I added the CSV to my GitHub for easy access.

#import csv: https://www.atlantapd.org/i-want-to/crime-data-downloads
link2021 = 'https://github.com/nicholasoxford/pythonViz/blob/main/COBRA-2021%203.csv?raw=true'
df2021 = pd.read_csv(link2021)

pd.read_csv() uses pandas’ built-in ability to parse a CSV. If you type in pd.read_ you will see autocomplete suggestions for the other file formats pandas can read.

How do you parse an Excel doc using Python code? I don’t know, and I don’t need to with these functions. When we say df2021 = pd.read_csv(link2021), we are assigning what read_csv() returns to df2021. This function returns a DataFrame, which is roughly equivalent to a table. In Colab, if you simply type df2021 on the next line and click play, it will print its contents.
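To see what read_csv() hands back without downloading anything, here is a minimal sketch that parses a tiny CSV held in a string. The column names and the three rows are made up for illustration; they are not the real APD data.

```python
import io
import pandas as pd

# Hypothetical three-row CSV; the real COBRA file has many more columns.
csv_text = """offense,lat,long
LARCENY,33.75,-84.39
BURGLARY,33.78,-84.40
ROBBERY,33.76,-84.38
"""

# read_csv accepts a URL, a file path, or any file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)          # (rows, columns)
print(list(sample_df.columns))  # column names taken from the header row
print(sample_df.head())         # first few rows, like a small table
```

The header row becomes the column names and every other row becomes a record, which is all "a DataFrame is a table" really means here.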

Dataframe content

Our current DataFrame doesn’t yet have enough info to easily plot its data on a map. Adding a column that combines our long and lat values will enable us to create a GeoDataFrame. You can learn about converting a DataFrame to a GeoDataFrame below (remember, this is not necessary!)

Next we need to represent Atlanta in some way. This is where the package geopandas (gpd) comes into play. I found this shapefile of Atlanta roads and learned about something called a Coordinate Reference System (CRS). If you think about a 2D map, one map might use longitude and latitude to define x and y, while another might use a different system. We can project these coordinates onto a different CRS to get them to line up.
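If the CRS idea feels abstract, here is a rough sketch of the same point written in two systems: plain longitude/latitude degrees (EPSG:4326) and Web Mercator meters (EPSG:3857). The function name and the sample coordinates are my own for illustration, and the formula is the simple spherical one. In the tutorial itself, geopandas does this kind of conversion for you with to_crs().

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project EPSG:4326 degrees to EPSG:3857 meters (spherical formula)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Approximate coordinates near downtown Atlanta, for illustration only.
lon, lat = -84.39, 33.75
x, y = lonlat_to_web_mercator(lon, lat)

print(f"EPSG:4326: ({lon}, {lat}) degrees")
print(f"EPSG:3857: ({x:.0f}, {y:.0f}) meters")
```

Same spot on Earth, wildly different numbers. Plot one layer in degrees and another in meters without converting and they will land nowhere near each other.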

For this, we are going to quickly connect our Colab Notebook to Google Drive. Going forward, it’s easiest to upload a file you will use into Google Drive and import it from there. Follow this link (shapefile of Atlanta roads) and click download. Grab the entire folder and upload it to your Google Drive. You could also directly add this to your notebook, but I like saving things in Drive.

Follow this tutorial to connect your notebook to your drive.

Add the following code to your notebook.

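#grab data for a streetmap
street_map = gpd.read_file('/content/drive/MyDrive/GC_Roads/GC_RD_GA/Roads_Atlanta_GA.shp')
# to_crs returns a new GeoDataFrame, so assign the result back
street_map = street_map.to_crs(4326)
# build a shapely Point from each row's longitude (x) and latitude (y)
geometry21 = [Point(xy) for xy in zip(df2021['long'], df2021['lat'])]
geo_df21 = gpd.GeoDataFrame(df2021, crs="EPSG:4326", geometry=geometry21)
geo_df21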

I recommend browsing the article below to learn more about CRS.

Next we need to layer both the street_map data and the crime data. We do this using subplots. plt.subplots() returns a figure and an axis. We can go ahead and ignore the figure (fig) and focus on the axis. When we call .plot() we specify which matplotlib Axes instance to draw on by saying ax=ax. If we had named ax something like axInstance, we would say ax=axInstance.
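Here is a stripped-down sketch of that layering idea using plain matplotlib and made-up numbers instead of the real road and crime data. The grey line stands in for the street map and the dots stand in for incidents. Both draw onto the same ax, which is why they end up in one picture.

```python
import matplotlib.pyplot as plt

# One figure, one Axes; both layers below draw onto the same Axes.
fig, ax = plt.subplots(figsize=(6, 6))

# Layer 1: a made-up "road" drawn as a grey line.
ax.plot([0, 1, 2], [0, 1, 0], color="grey", alpha=0.4)

# Layer 2: made-up "incident" points drawn on the same Axes.
ax.scatter([0.5, 1.0, 1.5], [0.5, 1.0, 0.5], alpha=0.5, s=30)

ax.set_title("Two layers, one Axes")
# In a notebook the figure renders when the cell finishes;
# in a plain script you would call plt.show() here.
```

If you created a second Axes and passed that to one of the calls instead, the two layers would show up in separate plots.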

Finally we define both a longitude and latitude limit to match our shapefile. If we don’t define this we will see that the crime data covers a much larger area than our shapefile.
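If you want to check how much of your data actually falls inside the map window, a small pandas sketch like this can help. The four coordinates are invented for illustration. The bounds are the same ones passed to plt.xlim() and plt.ylim() in this tutorial.

```python
import pandas as pd

# Made-up coordinates for illustration; two fall outside the map window.
points = pd.DataFrame({
    "long": [-84.40, -84.38, -84.50, -84.36],
    "lat":  [33.75, 33.80, 33.70, 33.90],
})

# The same window the tutorial passes to plt.xlim() and plt.ylim().
min_long, max_long = -84.425, -84.35
min_lat, max_lat = 33.745, 33.813

# True for rows whose longitude AND latitude are both inside the window.
inside = points["long"].between(min_long, max_long) & points["lat"].between(min_lat, max_lat)

print(f"{inside.sum()} of {len(points)} points fall inside the window")
print(points[inside])
```

Setting axis limits only hides the outside points from view. Filtering with a mask like this would actually drop them, which might be worth it on bigger datasets.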

#plot both streetmap and long/lat points
fig, ax = plt.subplots(figsize=(15,15))
geo_df21.plot(column="UC2_Literal", ax=ax, alpha=0.5, legend=True, markersize=30)
street_map.to_crs(4326).plot(ax=ax, alpha=0.4, color='grey')
# add title to graph
plt.title('Crime in ATL', fontsize=15, fontweight='bold')
# set latitude and longitude boundaries for map display
plt.xlim(-84.425, -84.35)
plt.ylim(33.745, 33.813)
plt.show()

And our final result?

One interesting thing to note: the keen-eyed will notice a desert of data around Georgia Tech. This makes sense, as Georgia Tech probably reports its own data.

If there is any interest, I will add a part 2 where we make the data interactive or compare 2020 crime data. Let me know which you would rather see! Link to my notebook below
