Wikidata’s content: what data on politicians already exists for any country?

On a regular basis, I make contact with people to talk about collecting and using data on politicians. As part of our conversation, I normally ask something along the lines of: ‘What do you think about using Wikidata for political data in XX country?’

Themes are emerging in the responses to this question. They include:

  • No response (come on people!)
  • What is Wikidata?
  • We already have all the data we need, and people are using it… what would be the benefit?
  • Hmmm vandalism?
  • Wikidata has a high entry level
  • To do that would be a lot of work
  • And finally… what data on politicians already exists?

And so, the mySociety Democracy team and I have been trying to understand how we could find answers to this final question, and guess what, it’s complex.

If you just want to find out what we built to solve this, skip to the end of this post for details on the new ‘Commons Explorer’ tool.

Using SPARQL queries and platforms to visualise data
When I met Dr Martin Poulter, Wikimedian-in-residence at the Bodleian Library (and also the presenter of this TEDx talk which will help to answer ‘What is Wikidata?’), one of many takeaways from the meeting was his set of tips on engaging people around Wikidata.

Martin suggested always trying to include an impressive SPARQL query, or a data visualisation platform (examples relevant to his field were Crotos and Histropedia), that will help people to see what data does and doesn't exist on Wikidata, in the hope that it sparks an action. Yipeee! I thought, that’s an easy first step: help people visualise the existing political data in Wikidata.

Stuff to consider before you can ask for a tool to visualise political data (Poland as an example)
Colleagues, I said enthusiastically and naively, I need to run a ‘health check’ on what data exists for currently elected politicians in Poland. And yes, I have already checked:
- Legislative Explorer (a tool mySociety built to interpret what model Wikidata has for 145 countries’ legislatures), but it doesn’t tell me who is/isn’t currently holding positions in the model
- Wikidata project every politician (the Wikidata project page for the project mySociety ran with the Wikimedia Foundation to support the Wiki community in working on the structure and quality of data on elected politicians), but the reports I ran a) only cover the national level and b) are not returning anything.

What I was hoping for was a query that could produce a table listing all the positions politicians could hold in Poland, with either blanks or entries detailing info on the person who holds each position…

Unsurprisingly, there were a few issues preventing my ideal query that would produce the dream table:
1) There is a lot of nuance to building a query: you need to be very precise about what you mean by a “current elected politician”, as there are a variety of ways that can be expressed, and depending on how you phrase the question you’re likely to get quite wildly different answers, which is why we are addressing consistency in the data.
2) Often the items for the positions a person could hold (e.g. Marshal of the Sejm) haven’t been created, so a query to find out who holds that position can’t even be written… again, this is something the Wiki project is addressing.
3) Wikidata can have quite a lot of information about the politicians, just largely not the fact that they currently hold that position (which is, in many ways, the key problem that we’re addressing with the Wikiproject and verification tool).
4) You can’t ask the question of what the data should look like at a subnational level for Poland without some degree of investigation. When using one of our ‘standard’ queries (which generally expect data to follow the patterns established in the every politician Wikiproject data model), there aren’t any legislatures below national level, but maybe that’s because there aren’t any at the First-level Administrative Country Subdivision level (as in Costa Rica). *This is a demonstration of the complexity of modelling political systems.*
5) Even at a national level, where the work of reconciling all the current members has already been done (and so we can be pretty sure that they’re in Wikidata), it’s unlikely they will have a start term or start date, or importantly an end date, so a query couldn’t tell you who is current.

In summary, because the data currently isn’t consistent (which is what we are trying to address), basic queries can only tell you what data does exist, and unless there is a sense of what data should exist, it’s hard to have any sense of how complete the data is.
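To make point 1 concrete, here is a minimal Python sketch (not necessarily the queries the team runs) that builds a SPARQL query under two different definitions of “current”. It relies on real Wikidata properties: P39 (position held), P582 (end time) and P2937 (parliamentary term). The function name and the IDs `Q_POSITION` and `Q_TERM` are placeholders invented for illustration; you would swap in the real item IDs for the position and term you care about.

```python
def current_holders_query(position_qid, term_qid=None):
    """Build a SPARQL query for people whose P39 statement looks 'current'.

    Definition A (term_qid is None): the P39 statement has no end date (P582).
    Definition B (term_qid given): the P39 statement is qualified with that
    parliamentary term (P2937).
    """
    if term_qid is None:
        # "Current" means: nobody has recorded an end date on the statement.
        current_filter = "FILTER NOT EXISTS { ?statement pq:P582 ?end . }"
    else:
        # "Current" means: the statement is tied to the given term.
        current_filter = f"?statement pq:P2937 wd:{term_qid} ."

    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person p:P39 ?statement .
  ?statement ps:P39 wd:{position_qid} .
  {current_filter}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


# Placeholder IDs only: replace with real Wikidata item IDs before running
# the resulting query on the Wikidata Query Service.
print(current_holders_query("Q_POSITION"))
print(current_holders_query("Q_POSITION", "Q_TERM"))
```

The two queries can return very different lists for the same position: definition A also picks up former members whose end date was never recorded, while definition B misses anyone whose statement lacks a term.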

Oh my gosh! mySociety staff whipped up a table tool!
Probably because they were exhausted with me asking the same question, this week my clever colleagues whipped up the ‘Commons Explorer’. Is this tool the table tool of my dreams? Almost…

The Commons Explorer tool uses SPARQL queries to assess what data exists on Wikidata for currently elected politicians, according to the model that should exist for that country (down to the First-level Administrative Country Subdivision level).

The table displays the name of the person who is listed as holding the relevant position within the current term and, where they exist, the party they belong to and their Facebook link. So a person in the know would be able to say who is missing from the list, or you could compare the list to an external, complete list. If someone is not on the list, the fix needed on Wikidata could be:

  • Adding a P39 statement to their Wikidata ID
  • Adding a start term to their P39 statement
  • Creating them as a person on Wikidata (and then adding the 2 above)
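As a rough illustration of that comparison, here is a small Python sketch. It assumes you have an external list of names and the names shown in the table, and that you have checked by hand what already exists on Wikidata for each missing person. The function names and the `has_item` / `has_p39` / `has_term` keys are hypothetical, and matching on exact names is a simplification (real data would be better matched on Wikidata IDs).

```python
def missing_people(official_names, explorer_names):
    """Names on the external list that do not appear in the explorer table."""
    return sorted(set(official_names) - set(explorer_names))


def suggest_fix(record):
    """Map what already exists on Wikidata to the fix from the list above.

    `record` is a hypothetical dict with boolean keys:
    has_item (the person exists on Wikidata), has_p39 (they have a P39
    statement for the position), has_term (that statement has a start term).
    """
    if not record["has_item"]:
        return "create the person, then add a P39 statement with a start term"
    if not record["has_p39"]:
        return "add a P39 statement with a start term"
    if not record["has_term"]:
        return "add a start term to the P39 statement"
    return "no fix needed"


# Made-up example data, purely for illustration.
official = ["Person A", "Person B", "Person C"]
in_table = ["Person A"]
print(missing_people(official, in_table))
print(suggest_fix({"has_item": True, "has_p39": True, "has_term": False}))
```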

The tool is in its early stages, but it is a really positive step and the team plan to improve it. Some of the changes they are aware of are:

  • It doesn’t yet show constituency information for members of legislatures
  • It will show the number of seats in a legislature, and a “percentage complete” metric
  • We are considering expanding it to show historical information
  • It’s a bit slow to load pages at the moment, and we’ll see how we can improve this
  • We could expand it to show more information about the people involved (e.g. pictures, gender, dates of birth, contact details)
  • We’d like to expand it to more easily show where the gaps are, for other people to fill in
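How the “percentage complete” metric will be calculated hasn’t been settled here, but one plausible reading is simply the number of people listed as current members divided by the number of seats. A minimal Python sketch under that assumption (the function name and the example count of 345 are made up):

```python
def percentage_complete(listed_members, total_seats):
    """Share of seats for which Wikidata lists a current member, as a percentage."""
    if total_seats <= 0:
        raise ValueError("total_seats must be a positive number")
    return round(100 * listed_members / total_seats, 1)


# Hypothetical example: 345 members listed for a 460-seat legislature.
print(percentage_complete(345, 460))  # 75.0
```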

Nice one team!

Now, back to finding more people who want to know about the state of the political data on Wikidata for their country…