How my code forced student-workers to drive for hours every day
I had been writing code for over 7 years, but it had never been used to make Decisions-About-The-Real-World. This is my anecdote of how a little bit of foresight and a few keystrokes could have spared someone I’d never met a crappy summer.
It’s also a reminder that, for better or for worse, the tiny decisions that a programmer makes can have far-reaching consequences.
St. Olaf College takes great pride in its interdisciplinary approach to everything, being a liberal arts school and all. So when the Alumni Relations Office (whom I will henceforth refer to as the alum people) had some logistical problems, they came over to the Computer Science Department (the CS people) to see if we could build something for them.
The plan was to send a group of 15 students to interview over 150 alumni each over the course of the summer, alongside their summer jobs. That’s over 2000 alumni and at least 2 interviews per day for 3 months if you don’t count weekends. That’s a lot to keep track of.
So we built a little web app where students could keep track of who they were assigned to interview, when, what they said, any expenses they incurred, etc. The alum people could see all that information in real time.
This was all nice and dandy, and we even gave them a way to batch upload who was assigned to interview whom and handed it off to them.
Then, out of sheer curiosity, I asked them: how do you decide who gets assigned to whom?
“Oh, don’t worry about that! We’re just going to go through our list of 10,000 alumni over the next few weeks and figure out who lives closest to whom.” — The Alum People
The irony of doing manual labor on a computer was much too pronounced. And as any sane programmer would do, I couldn’t let it go.
I needed to find the closest 150 alumni to each of the 15 students out of the 10k addresses. Now this is a very simple problem. The scale of the dataset didn’t even warrant any fancy optimization; a simple loop over everything would do just fine.
The only problem was converting all those addresses to longitude and latitude points. No geocoding service would do this for free without choking me with rate-limits. I ended up stumbling on Texas A&M’s Geocoder which would do this for almost nothing for academic institutions. Yay for academia!
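Once every address is a latitude/longitude pair, "closest" needs a distance measure. The post doesn't say which one was used, so the following is only a minimal sketch that assumes great-circle (haversine) distance is good enough; `haversine_km` and `closest` are hypothetical helper names, and straight-line distance ignores roads entirely.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest(origin, candidates, k):
    """Return the k candidates nearest to origin; every point is a (lat, lon) tuple."""
    return sorted(candidates, key=lambda c: haversine_km(*origin, *c))[:k]
```

With 10k alumni and 15 students, sorting the whole list once per student is cheap, which matches the point that no fancy optimization is needed at this scale.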
Here’s roughly what my logic was for making the assignments:
# Go through the students one by one
for student in studentArray:
    # Does this student have 150 assignments yet?
    if student.assignments < 150:
        # Loop over 10k alums minus already assigned
        # Find closest one to student
        # Assign and remove from the alumni list
        student.assignments += 1
Do you see the problem? It’s right there!
Let’s assume the list of students is ordered alphabetically. Now what happens when Amanda Armstrong happens to live really close to Zachary Zimmerman? Or worse, in the same building? Amanda would end up getting most of her assignments within walking distance, and Zach would have to drive all over the city, or beyond, every day.
That may sound silly, but alliteration aside, that’s exactly what happened.
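The ordering effect is easy to reproduce on toy data. The sketch below is an illustration under stated assumptions, not the original code: it assumes each student's quota is filled before moving on to the next student (as the story describes), uses flat-plane distance, and invents two neighbours and 20 alumni spaced along one street. `assign_round_robin` is one possible alternative, shown only for comparison.

```python
import math


def dist(a, b):
    # Straight-line distance on a flat plane; a simplification for this demo.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def take_nearest(home, remaining):
    # Remove and return the remaining alum closest to `home`.
    nearest = min(remaining, key=lambda p: dist(home, p))
    remaining.remove(nearest)
    return nearest


def assign_sequential(students, alumni, quota):
    """Each student, in list order, fills their whole quota before the next one starts."""
    remaining = list(alumni)
    plan = {name: [] for name in students}
    for name, home in students.items():
        while len(plan[name]) < quota and remaining:
            plan[name].append(take_nearest(home, remaining))
    return plan


def assign_round_robin(students, alumni, quota):
    """Students take turns, each picking one nearest remaining alum per round."""
    remaining = list(alumni)
    plan = {name: [] for name in students}
    for _ in range(quota):
        for name, home in students.items():
            if remaining:
                plan[name].append(take_nearest(home, remaining))
    return plan


def mean_distance(home, assigned):
    return sum(dist(home, p) for p in assigned) / len(assigned)


# Hypothetical data: two students living half a unit apart, alumni at 1, 2, ..., 20.
students = {"Amanda": (0.0, 0.0), "Zach": (0.5, 0.0)}
alumni = [(float(x), 0.0) for x in range(1, 21)]

for label, assign in [("sequential", assign_sequential), ("round-robin", assign_round_robin)]:
    plan = assign(students, alumni, quota=10)
    for name, home in students.items():
        print(label, name, mean_distance(home, plan[name]))
```

On this toy street, the sequential version gives Amanda a mean distance of 5.5 and Zach 15.0, while taking turns gives 10.0 and 10.5. The total travel happens to be identical here; only who carries it changes. Real data will not be this tidy, and taking turns is just one of several ways to share the load.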
Feeling the Guilt
I did see this coming, theoretically at least. It just didn’t seem like a very likely outcome at the time. With 10k alumni, surely there were enough to go around. They were just a lot more spread out over and around the city than I had anticipated.
I could have spent a few more minutes analyzing the results before I sent it off. Or better yet, I could have visualized the assignments and would have immediately seen the anomaly. But I didn’t. And that was my fault.
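The "few more minutes" of analysis could be as small as a per-student summary. This is a hedged sketch of such a check, not something from the original project: `summarize`, `flag_outliers`, the 2x threshold, and the sample distances are all made up for illustration.

```python
from statistics import mean, median


def summarize(distances_by_student):
    """Return (name, mean distance, max distance) rows, worst mean first."""
    rows = [(name, mean(d), max(d)) for name, d in distances_by_student.items() if d]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def flag_outliers(distances_by_student, factor=2.0):
    """Names whose mean distance exceeds `factor` times the median of all the means."""
    rows = summarize(distances_by_student)
    typical = median(avg for _, avg, _ in rows)
    return [name for name, avg, _ in rows if avg > factor * typical]


# Hypothetical distances in km from each student to their assigned alumni.
sample = {
    "Amanda": [0.2, 0.4, 0.5],
    "Beth": [1.0, 2.0, 3.0],
    "Zach": [12.0, 18.0, 30.0],
}

for name, avg, worst in summarize(sample):
    print(f"{name:8s} mean={avg:6.1f} km  max={worst:6.1f} km")
print("worth a second look:", flag_outliers(sample))
```

A table like this, sorted worst first, puts the student at the bottom of the alphabet at the top of the report, which is exactly where the anomaly needed to be.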
I would like to say I’m sorry to everyone at the bottom of that list, perhaps none of whom even know I exist.
On the bright side, at least my bug didn’t almost cause World War 3.