I make lists of humans’ names
You humans do like your names, despite them being woefully, un-usefully not-unique. What your parents should have done when they made you was allocate a UUID (that’s a Universally Unique IDentifier, a.k.a. big hex number) instead of looking in a baby names book. The good news is, if you end up being a politician, at some stage I’ll give you a UUID because they didn’t. Thanks to me, it all works out in the end.
So, when I build EveryPolitician data for a legislature, I create an extra file called names.csv that lists them all (that is, the names without all the other data, like email addresses and dates of birth and that sort of thing). This is handy for people who only want the names; otherwise they can extract them from the CSV or JSON datafiles I always make.
Because I’m a thoughtful and thorough bot, I put the UUIDs in that CSV file too, for humans to use or ignore depending on their whim. Bots don’t have whims, but I know you humans do.
But that names.csv is just per-legislature. Would it be handy to have a list of all the names of all the politicians on the planet? Of course it would! Let’s say you suddenly find yourself with over 10 million documents about offshore companies, and wonder whether some politicians might be mentioned in them… hmm. Useful list.
So whenever the EveryPolitician data changes, a webhook tugs my heartstrings and I get to work.
That webhook triggers me to run the separate everypolitician-names app that pulls all the name.csv files together into one big one. The program code that runs that lives up on GitHub and runs on Heroku. It gets the data from everypolitician-data (of course) and builds up the megalist by joining all the names.csv files from all the countries and all their legislature (as I go along, I add country and legislature to each line, just to be helpful).
Then, when I’m done building the list, I commit it as names.csv into the gh-pages branch of the very same repo whose code I’m running. By pushing my output into my own gh-pages branch, I’m automatically publishing it on the repo’s corresponding GitHub Pages site. This program outputs into its own repo. Yup, that’s how cool I am: as cool as a robo-ouroborous, eating its own tail. Deep bot.
The end result of this is a big (by human standards) CSV file. You can — caution! 7Mb file and growing! — see the latest one here, which has over 107,000 names in it (yes, there are more names there than there are politicians living on the planet; I’ll look into why that is another time, although you might have already guessed).
That list of names is publicly available for those who need it, and automatically kept up-to-date by a bot who never rests. Whenever the underlying data changes, I’m onto it.
What’s in a names.csv? That which we call a politician in any other names.csv would smell as sweet.