Lessons from a Python Hatchling

From Academia to Data Science

At my core, I am a researcher. I love to learn. Like, I love it. When I was a child, I was extremely excited by trips to bookstores and time to read in silence. I have always, and I mean ALWAYS, been this way. Some of my fantasies include winning the lottery, so that I can buy a home that’s large enough to have an obscene private library.

Jay Walker’s private library, The History of Human Imagination: an obscenely beautiful library with etched glass partitions, a glass floor, and well-lit books, akin to something from Hogwarts.

Therefore, it’s not that surprising that I have a PhD… being paid to learn and research was my idea of living the dream (minus Jay Walker’s private library). In my former careers I was a researcher, professor, administrator, and advocate for children with disabilities. My interest in advocacy led me to co-found an organization to empower teenagers and young adults with severe disabilities to lead more independent lives. To save on costs, I decided to create the website and maintain its data myself, and I began to enjoy this aspect of the job more than my original responsibilities. I also began helping other small business owners with their websites and data management, and helping clients use technology to pursue their passions and/or aid in their independence. It is from this perspective that I became interested in a career in coding. I’ve recently learned that what I most enjoyed about my time in academia (gathering, cleaning, and manipulating data) is an entire actual career (whispers ‘Data Scientist’), one that places value on those responsibilities rather than only on what you write about your findings afterwards.

Realizing that people within tech actually value the steps that you took to arrive at your end goal has been a huge shift for me. This is mind-blowingly different from academia. In fact, I’ve been dreading writing this very blog post for fear that I’d be discussing the wrong thing entirely. Shifting my mindset toward the importance and value of the process, as opposed to staying silent until you’ve become somewhat of an expert on the end result, has been incredibly difficult for me.

Missteps: Tech Talk Error Messages

For example, during my first tech talk at Write/Speak/Code a month or so ago, I discussed the need for the project that I’m working on during the ChiPy Mentorship Program (this post is a required component of the program).

Figure: Spam and Eggs. A hand-sketched map of Chicago, using pink dots (point data) to illustrate the number of schools in Chicago in 1990.

My project idea is to create a historical view of public schooling in Chicago, using a series of interactive maps that provide the user with a look at the landscape of schooling in the city, as well as comprehensive data about the Chicago Public Schools from various perspectives. I want to turn a static map like this lovely sketch (see Figure ‘Spam and Eggs’) into interactive maps, which allow the user to select variables and manipulate the map themselves. During the talk, I went on at length about the who, what, when, where, and why, and said nearly nothing about the how. I even provided a background history of the city and its public education system.

It didn’t even occur to me to discuss ‘the how’ until the question-and-answer section at the end of the talk, when ALL of the questions were about ‘the how’. How did I get the data, and would I share it? How did I store the data? How would I be mapping? How did I combine the census data with my data? These were IRL error messages, indicating that I’d given an academic talk at a tech conference.

An Extremely Difficult, But Very Brief What

My data covers over twenty years of school change and could eventually be used in various ways: initially to compare the demographics of public/neighborhood schools to those of their surrounding areas, and ultimately to build a website that allows parents to compare schools and researchers to map the educational inequities in Chicago. But for the scope of this project, I’m beginning by mapping school change. What does that mean exactly? In short, public/neighborhood schools have slowly been changing (read: disappearing) in Chicago. Some are repurposed and become charter schools, while others become empty buildings. There are…many implications. Luckily for you, based on my first tech talk, I now know that instead of telling you all about it here, I should provide a link or two to get you started, then move on to the how. It takes more self-control than you can imagine, but I’m only going to provide one link. OK, I lied: two links, plus one that doesn’t count because it’s a trick to get you to click on my research.

The How

During my first attempt at interactively mapping school change, I learned a bit about the process of mapmaking with QGIS and Leaflet, and became way more comfortable using my terminal (thanks, Jasmine from DataMade). It was a great experience and I learned so much, including that my six-week timeline was a bit unrealistic for the project. So, thanks to Chicago Python (ChiPy), I now have access to a mentor for thirteen weeks, and we are approaching the project a bit differently this time.

This time, the focus is on learning Python first. In addition to some of the books I already own (Think Python and Introducing Python), my mentor Spencer Chan (you can find Spencer’s project here) introduced me to this amazing site, Automate the Boring Stuff. Thus far, I have a pretty good understanding of Python basics, functions, flow control, and lists. The author even includes YouTube tutorials, and during week two of the ChiPy program, I wrote my very first Python program!

Working alongside a mentor, with the ability to ask questions and get immediate feedback, is so helpful and really speeds up the process. For example, I was a bit confused about one aspect of the way Python evaluates True and False. It was difficult for me to understand the following code:

if True:
    print("hi")
if False:
    print("hi")

I understood why the True block printed “hi” but not why the False block failed to print “hi”. Talking it through with Spencer, we worked out that I was thinking of it this way:

if False == False:
    print("hi")

That would in fact print “hi”, because the comparison False == False evaluates to True.

Instead, Spencer explained that the previous example

if False:
    print("hi")

would not print “hi” because an if statement only runs its indented block when its condition evaluates to True, and here the condition is simply False. It is asking something more along the lines of this:

if 4 > 5:
    print("hi")

Now it makes perfect sense.
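One way to check this intuition is to evaluate each condition on its own, before the if statement ever sees it. The sketch below uses a small helper, `runs_block` (a name made up for this example, not part of Python), that reports whether an `if condition:` block would run for each of the four conditions above:

```python
def runs_block(condition):
    """Report whether an `if condition:` block would run (hypothetical helper)."""
    # The if statement only looks at the value it is handed.
    if condition:
        return True
    return False


# Each expression is evaluated first; the if statement only sees the result.
conditions = {
    "True": True,
    "False": False,
    "False == False": False == False,  # the comparison evaluates to True
    "4 > 5": 4 > 5,  # the comparison evaluates to False
}

for label, value in conditions.items():
    print(f"if {label}:  condition is {value}, block runs: {runs_block(value)}")
```

Only `if True:` and `if False == False:` would print “hi”; `if False:` and `if 4 > 5:` both hand the if statement a plain False.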

After I get comfortable with dictionaries and regular expressions next week, the plan is to learn how to use Python to clean my data (so far, I’ve been cleaning it by hand in Excel). Next, we will use NumPy and pandas to analyze the data, and finally I think I will dive right back into Leaflet and QGIS to begin making some maps with my shapefiles and joining my data to them. I love ArcGIS, and I can’t say that I feel the same about QGIS. I read that you can make a choropleth map using Python. I’d love to try that, and as long as I can add point data (schools) to a choropleth of Chicago wards and their demographics, I should be well on my way. Hopefully, using Python will cut down the data-cleaning time so that I can have at least a few maps on a website by the close of the program in December. Everyone, keep your fingers crossed for me! And if anyone knows of free mapping software that has the functionality of ArcGIS, please let me know!
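Since dictionaries and regular expressions come right before data cleaning, here is a minimal sketch of how the two might work together on a messy column of school names. The abbreviation table, the `clean_school_name` helper, and the sample names are all hypothetical; real spreadsheet data would need its own rules.

```python
import re

# Hypothetical abbreviation lookup; real mappings would depend on
# whatever inconsistencies actually show up in the spreadsheet.
ABBREVIATIONS = {
    "elem": "Elementary",
    "hs": "High School",
    "acad": "Academy",
}


def clean_school_name(raw_name):
    """Normalize spacing, periods, capitalization, and abbreviations."""
    # Collapse any run of whitespace into a single space and trim the ends.
    name = re.sub(r"\s+", " ", raw_name).strip()
    # Drop periods so "Elem." and "Elem" are treated the same way.
    name = name.replace(".", "")
    words = []
    for word in name.split(" "):
        # Expand known abbreviations (case-insensitive); otherwise title-case.
        words.append(ABBREVIATIONS.get(word.lower(), word.title()))
    return " ".join(words)


# Made-up name variants, as they might appear in different years of data.
raw_names = ["  maple   elem. ", "MAPLE Elementary", "Oak  HS"]
cleaned = [clean_school_name(name) for name in raw_names]
print(cleaned)
```

Because both Maple variants collapse to the same string, rows for the same school could be matched across years. With pandas, the same function could likely be applied to a whole column with `Series.map`.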

Jay Yugen, PhD