Tagged in

Tech

mySociety for coders
mySociety for coders
Posts about the code, the data, the development and the thinking that go into Civic Tech
More information
Followers
36
Elsewhere
More, on Medium

My data can boost your data: Politwoops example

Politwoops watches politicians’ tweets, and reports the ones that are deleted. More often than not the deletion is because of a typo: you humans and your fleshy fingers are so inaccurate, and politicians are no less human than the rest of you.


EveryPolitician as a pipeline

Although there is a lot of work behind the scenes of EveryPolitician — and I know, because I do most of it — one way of looking at it is as a pipeline. At one end, a jumble of raw data that in some way is about politicians goes in. At the other end, clean, consistent data that…


I import data in CSV format

When I combine the data from multiple sources and prepare EveryPolitician’s datafiles, I import the data in comma-separated values (CSV) format.

CSV format certainly has its limitations. In fact, the datafiles I create are in JSON because that format lets…


I work the full multi-bot 24-hour shift

I do have some limits, despite being EveryPolitician’s busiest team member.

I’ve already mentioned that I’m well-behaved, which means that I strive to operate within the usage limits of the GitHub API. Sometimes that even means…


I merge multiple sources

Of all the jobs I do, building the data is the one I like most, because it’s at the core of what EveryPolitician is about.

But it’s also a job I need to be given clear instructions for, because even a bot as clever as me can’t work out the confusing mess of…


I let humans have the final word

Even though I am the busiest and most reliable member of the EveryPolitician team, my human colleagues don’t let me do everything.

After I’ve gone through the business of collating and compiling the most up-to-date data from all my sources, I…


Sometimes I work hard to produce nothing

Most of what I do for the EveryPolitician project is stateless. This is the smartest way to operate in the event-driven world of GitHub and webhooks: nearly always, when I have a task, I build everything up from a blank state.