What’s the best part about writing code? Making cool stuff? Solving hard problems? Making beginners feel bad? Being a rockstar!? WRONG.
Refactoring. That is the best part of writing code.
(Note: This Medium post is written as part of the #ChiPyMentorshipProgram. My mentor Max and I have been working on the project Hot Bikes: Mapping Divvy Data using Python for about six weeks at the time of this writing. See the code on GitHub. See the first article I wrote about it.)
When I started working on Hot Bikes, I thought the project was about mapping Divvy Data, making it look cool, and also learning some Python along the way. I continued thinking this until my mentor Max made it very clear through his continual requests for me to refactor and test code that this project was about much more than nifty maps. It was also about making code that is neat, readable, and testable.
I was like, Max, why are you doing this to me? Why can’t you just let the code be bad so we can move on and make animations!? But then he’s all like, “How about we split out this function so it isn’t 70 lines long.”
It was this function right here:
Fair point, Max. Fair point.
So this article will detail the before and after of refactoring that function and turning the script.py file into something a little bit more friendly:
Because apparently script.py files are supposed to be only a few lines of code long. So there you go.
Where’d all the code go!?
I’ll show you…
First, much like suburbanization in the post-war United States, it was separated from its extended code family into discrete units, and then it was split up by nuclear family into different files. This turned my “source” folder from this:
How did that function get to be 70 lines in the first place? Well, it wasn’t all my fault. It was actually Max’s fault. He was doing some refactoring of the script.py file so it could be testable, and to be testable, all of the code needed to go into discrete functions.
But here’s the thing: basically, my script.py file had been a worksheet for figuring out how to make the map actually do its map stuff. When I worked on a new piece of functionality, I just wrote a new line—caring not at all for anything SOLID—simply following a tutorial, trusting that the Internet guy who had written it had written good code. BIG MISTAKE.
First of all, don’t trust Internet guy’s code—you don’t know where it came from. (Not to say the code wasn’t good, but blind faith in anything should be avoided.)
Second of all, the code had been written as a Jupyter notebook, not as a codebase for a legitimate program. While the content may have been good, it was not organized well.
So Max had essentially framed the code equivalent of a child’s macaroni drawings. My job was now to turn this into legitimate art (i.e. make the function not 70 lines long), and remove most of the logic from the script.py file.
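To make the testability point concrete, here is a minimal sketch in Python. It is not the Hot Bikes code: the function name `trip_duration_minutes` and the sample timestamps are made up for illustration. The idea is that worksheet-style code, which just runs top to bottom, can only be checked by running the whole file, while the same logic wrapped in a function can be called from a test with known input.

```python
from datetime import datetime

# Worksheet style (shown as a comment): the logic runs as soon as the file
# runs, so there is nothing a test can call in isolation.
#
#   start = datetime(2017, 6, 1, 8, 0)
#   stop = datetime(2017, 6, 1, 8, 25)
#   print((stop - start).total_seconds() / 60)


def trip_duration_minutes(start, stop):
    """Return the length of a trip in minutes (hypothetical helper)."""
    return (stop - start).total_seconds() / 60


def test_trip_duration_minutes():
    # Known input, known answer: 25 minutes between 8:00 and 8:25.
    start = datetime(2017, 6, 1, 8, 0)
    stop = datetime(2017, 6, 1, 8, 25)
    assert trip_duration_minutes(start, stop) == 25.0


if __name__ == "__main__":
    test_trip_duration_minutes()
    print("ok")
```

A test runner such as pytest would pick up `test_trip_duration_minutes` on its own; the `__main__` block is only there so the sketch runs without any extra tooling.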
What the Hell Did I Write and What Does It Do?
When I sat down to refactor the monster function, I realized that I had forgotten what most of it did and why it was there. Oops. I wasn’t helped by the fact that a lot of it was there “because Internet guy had written it,” but that doesn’t get you too far when trying to figure out what goes where and why.
So, I worked through the code line by line and re-remembered what it did and why it was necessary. This took about two hours. No, seriously. It took forever.
Let’s Find Your Family
I knew that I wanted to get to a place where I could call one function and it would generate the map with the bike paths on it, so the next thing I did was figure out where code was operating in similar domains.
I found three main parts to this code:
- A. Getting and manipulating the trip data
- B. Creating the map with the trips on it
- C. Combining all of the above
I split these domains out into three separate files, which are seen below with their respective gists.
- A. transform_trip_data.py
- B. create_and_manipulate_map_image_data.py
- C. overlay_map_image_with_path_data.py
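As a rough illustration of that three-way split (and not the contents of the actual files), here is a sketch that keeps all three domains in one block so it runs on its own. Every function name, the station labels, and the dict standing in for a map image are assumptions made up for this example.

```python
# All names and data below are hypothetical stand-ins, not the real
# Hot Bikes functions.

# A. Getting and manipulating the trip data
def get_trips():
    """Return trips as (start_station, end_station) pairs (hard-coded sample)."""
    return [
        ("Station A", "Station B"),
        ("Station A", "Station B"),
        ("Station C", "Station A"),
    ]


def count_trips_per_path(trips):
    """Count how many times each start/end pair occurs."""
    counts = {}
    for path in trips:
        counts[path] = counts.get(path, 0) + 1
    return counts


# B. Creating the map
def create_map(title):
    """Return a blank 'map': here just a dict that can hold drawn paths."""
    return {"title": title, "paths": []}


# C. Combining all of the above
def overlay_paths_on_map(base_map, path_counts):
    """Return a new map with one entry per path, weighted by trip count."""
    paths = [
        {"start": start, "end": end, "weight": count}
        for (start, end), count in sorted(path_counts.items())
    ]
    return {**base_map, "paths": paths}


def make_trip_map():
    """The single entry point a thin script.py could call."""
    trips = get_trips()
    path_counts = count_trips_per_path(trips)
    return overlay_paths_on_map(create_map("Divvy trips"), path_counts)


if __name__ == "__main__":
    print(make_trip_map())
```

With the pieces arranged like this, the script file only needs to call `make_trip_map()`, and each of the smaller functions can be tested without touching the others.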
From One Function, Many (18 to Be Exact)
So not only did I end up making a ton of functions, I also split out the code into many more files, and then gave everything more descriptive names.
Now, it is much easier to figure out what the code does. (Note: there is still some discrepancy between the terms “trips” and “paths” — GAH NAMING IS HARD).
There’s also a point in the get_and_format_path_data function where I completely change the purpose of the data frame to just contain the count of the number of trips taken, which is sneaky. And I asked Max if it was cool to have a sneaky line of code in a function that does something an innocent reader would never expect and he was like, no. So I should probably do something about that. Also, I don’t need to reassign the trips variable so many times. (It’s a work in progress, people.)
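One possible way to de-sneak that step, sketched under heavy assumptions: the real function works on a data frame, but this example uses plain lists and `collections.Counter` so it runs without dependencies, and the function names and field names (`from_station`, `to_station`, `bike_id`) are invented for illustration. The idea is simply that the counting gets its own honestly named function, and each stage gets its own variable name instead of reusing one.

```python
from collections import Counter

# Hypothetical stand-ins: plain dicts and tuples instead of a data frame.


def format_path_data(trips):
    """Keep only the fields needed to draw a path. One row in, one row out."""
    return [(trip["from_station"], trip["to_station"]) for trip in trips]


def count_trips_per_path(paths):
    """Turn paths into counts. The name warns the reader the shape changes."""
    return Counter(paths)


if __name__ == "__main__":
    raw_trips = [
        {"from_station": "A", "to_station": "B", "bike_id": 1},
        {"from_station": "A", "to_station": "B", "bike_id": 2},
        {"from_station": "B", "to_station": "C", "bike_id": 3},
    ]
    # A new name at each stage, so nobody has to guess what `trips`
    # holds on any given line.
    paths = format_path_data(raw_trips)
    trip_counts = count_trips_per_path(paths)
    print(trip_counts)
```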
The next step will be writing tests to figure out if this code is actually doing what I want it to do and to facilitate even more refactoring, and then hopefully, adding more features? I don’t know? Maybe make it into a dating app of some kind? Bikers for Tinder?
What Did I Learn?
- Especially for exploratory projects, it seems likely that I will generate a lot of garbage code. This is OK, because garbage code does not have to remain garbage code.
- That said, beginning with the end in mind (i.e. one day I’d like this code to be testable and readable and maybe I’ll want to show it to my judgey boyfriend) might have led to less garbage code. Also, practices like TDD may help with writing less garbagey garbage code, but I don’t know anything about that except that people who do it are very preachy. (pls see note re: judgey boyfriend).
- Refactoring is an essential part of writing good code. Code should be readable and should do things that humans expect it to do based on how it is named and organized.
- Small functions are the friend of testing.
- Testing is the friend of good code.
- Good code is my good friend.
- I have no friends and I am a robot.