This is the first of many blog posts for ChiPy’s Spring Mentorship Program. I feel extremely lucky that I was selected to be a part of this wonderful cohort, and I’m excited to see what I can learn and create in the next few months.
Where I’m coming from
I’m a senior at UChicago in my last quarter of classes. I’m also a history major who studies American labor history, so how the heck did I find myself in a Python mentorship program?
I had an unpaid summer internship a few years ago where I found myself running out of things to do, so naturally, I enrolled in an online course on data analysis with R to make some of my Excel tasks more efficient. One thing led to another, and I found myself enrolling in a class called Computer Science with Applications. The class was taught in Python, and I got to implement some cool things (probabilistic record linkage, analyzing election tweets, etc.), but it was all done in a very controlled environment. I was given a dataset, wrote programs on a Linux virtual machine, ran the tests they gave me, and submitted my work.
But what if I wanted to create my own web scraper, or run my own analyses? Honestly, I was at a loss for where to even start: how do I use GitHub, and how do I even run Python on my Mac? These were all questions that my mentor, Jeremy, patiently went over with me at our first meeting. With ChiPy, I’m hoping to really get my hands dirty working with Python and learning Web Dev IRL (in real life). I’d like to reach a level of competency where someone can say, “Hey Amy, can you make a web interface that recommends news articles from the opposing end of the political spectrum for me to read?” and I would be able to easily figure it out.
The big project, or at least the idea
I write for a local newspaper and have been interested in interactive data journalism for a while, enjoying it from a distance but never feeling like I had the skills to actually create a data feature myself. Being a history major, I’m always interested in change over time. The question I’d like to answer is: How has gentrification (the geographic shift of wealth and community investment) affected transit inequality in Chicago? My project will look at the relationship between CTA train ridership and changes in property values in various Chicago neighborhoods from 2001 to 2018. I’d also like to incorporate Census data, property development data, and really any other relevant datasets I can get my hands on along the way. The basic idea is to generate a heatmap of Chicago for each year from 2001 to 2018 that shows gentrification patterns and how CTA routes have changed. It would be displayed on an interactive interface that would allow a user to easily compare years and add layers to a map (built with D3.js and Django, maybe?). The final product would look similar to DataMade’s Million Dollar Blocks project, with a little reporting on the side, or the Urban Institute’s Interactive Feature on the Housing Boom and Bust.
As you can see, I still have a lot of details to hammer out. I’m still very much in the exploratory phase of my project, and I’m not yet sure which statistical clustering methods I might use. The plan for now is to get my data loaded, cleaned, and put into a database where records can be matched by lat/lon. If anyone has any pointers on the rest, please send them along!
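To make that first step a bit more concrete, here is a minimal sketch of what "loaded, cleaned, and put into a database" might look like using only Python's standard library (csv and sqlite3). Everything specific in it is an assumption for illustration: the column names, the station names, the numbers, and the helper functions clean_rows, build_db, and rides_in_box are all made up, and the real CTA or property datasets will almost certainly be shaped differently.

```python
import csv
import io
import sqlite3

# Placeholder rows standing in for a real ridership export. The column
# names, station names, and numbers are all invented for illustration.
RAW_CSV = """station,year,rides,lat,lon
Station A,2001,1200,41.8800,-87.6300
Station B,2001,,41.9000,-87.6500
Station A,2002, 1350 ,41.8800,-87.6300
Station C,2002,900,not available,-87.7000
"""


def clean_rows(text):
    """Yield (station, year, rides, lat, lon) tuples, skipping bad rows."""
    for row in csv.DictReader(io.StringIO(text)):
        try:
            yield (
                row["station"].strip(),
                int(row["year"]),
                int(row["rides"]),
                float(row["lat"]),
                float(row["lon"]),
            )
        except ValueError:
            # Missing or non-numeric values: skip the row rather than guess.
            continue


def build_db(text):
    """Load the cleaned rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ridership ("
        "station TEXT, year INTEGER, rides INTEGER, lat REAL, lon REAL)"
    )
    conn.executemany(
        "INSERT INTO ridership VALUES (?, ?, ?, ?, ?)", clean_rows(text)
    )
    conn.commit()
    return conn


def rides_in_box(conn, year, south, north, west, east):
    """Return (station, rides) for one year inside a lat/lon bounding box."""
    cur = conn.execute(
        "SELECT station, rides FROM ridership "
        "WHERE year = ? AND lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY station",
        (year, south, north, west, east),
    )
    return cur.fetchall()


conn = build_db(RAW_CSV)
print(rides_in_box(conn, 2001, 41.85, 41.95, -87.70, -87.60))
```

Matching on a bounding box rather than exact coordinates is a deliberate choice in this sketch, since lat/lon values from different sources rarely agree to the last decimal place. A real version would probably swap the in-memory database for a file on disk and might need a proper spatial join instead.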