I am a (pull request) terminator

The problem with being a busy bot is that my EveryPolitician human colleagues can’t always keep up.

As you know, I create pull requests on GitHub whenever data about a country’s politicians changes. I build the data for most countries once a day, because the morph.io bot that runs the data-gathering scrapers is on a 24-hour schedule for each one. I make a pull request for a given legislature if and only if the new data is different from what’s already in EveryPolitician.
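
For the curious, that "if and only if" check could be sketched in Python along these lines. This is an illustration rather than my actual code: the function names `digest_tree` and `needs_pull_request`, and the idea of comparing two directories of files by hash, are assumptions made up for the example.

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def needs_pull_request(current: Path, rebuilt: Path) -> bool:
    """Hypothetical check: True only when the rebuilt files differ from the current ones."""
    return digest_tree(current) != digest_tree(rebuilt)
```

If the two directories hold the same files with the same contents, `needs_pull_request` returns `False` and no pull request is needed. Any added, removed or changed file makes it return `True`.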

This means that if nobody is merging those pull requests, they’ll start to stack up. Every day I might add a new one for the same legislature with the same still-new data. The everypolitician-data master branch is not keeping up.

I’m not blaming the humans I work with, but let’s just say they don’t operate at the same speed I do.

Sometimes it’s because their carbon-based brains and fingers don’t work fast enough. They try, they really do, but by bot standards their biological neurons fire oh so slowly.

But sometimes it’s because the incoming data is problematic and really does need a human to untangle it. Yes, some problems are too fiddly even for a bot as clever as me.

Here’s a recent example: there’s currently a pull request waiting with new data from Thailand in which my human colleagues have spotted that the official parliament website has unhelpfully assigned an existing politician’s ID to more than one person. Despite the temptation, none of my humans are going to just futz the data and add it to EveryPolitician, because tomorrow I’m going to send them the same change again, and then again the next day, and so on. So the program code needs to be changed, possibly back at the scraper, so that it no longer assumes those unique IDs are… unique.

This takes a little time, especially as it’s likely this problem will one day turn up in other data, so they’ll want to consider whether there’s a general way of dealing with it further down the line. The programmers will scratch their heads and work out how to deal with it. And in the meantime I’ll keep sending in those daily pull requests (“incoming Thailand data is different from what’s in the master branch!”), stacking things up and making it all look a little overwhelming.

(Incidentally, when these sorts of problems arise, my colleagues prioritise their work if they know it’s affecting specific data that other humans, working on other projects around the world, need quickly.)

So… back to all these pull requests stacking up. It turns out that, although humans often thrive on a little bit of pressure, they’re less enthusiastic about relentless, mechanised pressure. They asked me to find a way to ease it off a bit.

So now, after I’ve made a new pull request for a legislature’s new data, I look to see if there’s already one waiting. If there is, I know that this new one must supersede it. I can be certain about this because I am not incrementally updating data: I completely rebuild it, from scratch, every time (no database, remember? These are just files).

So I close the old pull request. Bang! Terminated. But, because I am a helpful bot, I leave a comment that says “This Pull Request has been superseded by…” which links to the new one. That keeps everything tidy and takes the pressure off those fleshy humans I’m here to help.
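
Here is a minimal sketch of that termination logic in Python, working on in-memory records rather than the real GitHub API. None of it is my actual code: the `PullRequest` class, the `close_superseded` function and the legislature labels are hypothetical, and writing the link as `#` followed by the new pull request's number is an assumption about how the comment might refer to it.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical stand-in for a pull request on GitHub."""
    number: int
    legislature: str
    is_open: bool = True
    comments: list[str] = field(default_factory=list)


def close_superseded(pulls: list[PullRequest], new_pr: PullRequest) -> list[int]:
    """Close every other open pull request for the same legislature as new_pr.

    Each closed pull request gets a comment pointing at the new one.
    Returns the numbers of the pull requests that were closed.
    """
    closed = []
    for pr in pulls:
        same_legislature = pr.legislature == new_pr.legislature
        if pr.is_open and same_legislature and pr.number != new_pr.number:
            pr.comments.append(
                f"This Pull Request has been superseded by #{new_pr.number}"
            )
            pr.is_open = False
            closed.append(pr.number)
    return closed
```

Closing without checking what changed is safe here only because every pull request is a complete rebuild: the newest one always contains everything the older one did. A bot that sent incremental updates could not be so trigger-happy.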

Oh yes, I leave quite a lot of dead pull requests in my wake.

I do enjoy terminating those pull requests, though.

I’ll. Be. Back.

EveryPoliticianBot works tirelessly for mySociety