Mapping Wikidata to Bibframe

How is bibliographic data stored on Wikidata? How does that compare to an emerging standard like Bibframe? I wanted to do a quick survey and comparison. Some ground rules:

  1. Using the Library of Congress Bibframe 2.0 spec
  2. Only looking at Work level metadata
  3. Only looking at monographic-ish data (not serials, music, media, etc)

There are a little over 100,000 books currently on Wikidata. The first step is to get a list of them from the Wikidata SPARQL endpoint:

SELECT ?item WHERE {
?item p:P31/ps:P31/wdt:P279* wd:Q571.
}

This simple query asks for the ID of anything that is an instance of Book or of any subclass of Book. It should find anything that is a book or a type of book, such as a cookbook or a children’s book.
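As a rough sketch of how that query could be run from a script, the Python below sends it to the public Wikidata SPARQL endpoint and pulls the Q-IDs out of the standard SPARQL JSON results. The function names and the User-Agent string are my own placeholders, not part of the original script, and a query over this many items may need paging or a longer timeout in practice.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above: instances of Book (Q571) or of any subclass of it.
BOOK_QUERY = """
SELECT ?item WHERE {
  ?item p:P31/ps:P31/wdt:P279* wd:Q571.
}
"""


def build_url(query, endpoint=ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params


def extract_ids(payload):
    """Turn SPARQL JSON results into a list of bare Q-IDs."""
    ids = []
    for binding in payload["results"]["bindings"]:
        uri = binding["item"]["value"]  # e.g. http://www.wikidata.org/entity/Q571
        ids.append(uri.rsplit("/", 1)[-1])
    return ids


def fetch_book_ids(query=BOOK_QUERY):
    """Run the query over the network (hypothetical helper, not called here)."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "bibframe-survey-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_ids(json.load(response))
```

Keeping the parsing separate from the network call makes it easy to check the ID extraction against a saved response before hitting the live endpoint.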

I then used a simple script to pull all the properties for these 100K books and tally their counts. The results:
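The tally itself can be sketched in a few lines of Python. This is not the original script; it assumes each book has already been downloaded as Wikidata entity JSON (where statements sit under a `claims` key, grouped by property ID) and counts how many entities use each property at least once. The function names are hypothetical.

```python
from collections import Counter


def tally_properties(entities):
    """Count, for each property ID, how many entities carry it at least once."""
    counts = Counter()
    for entity in entities:
        # "claims" maps property IDs (e.g. "P31") to lists of statements;
        # counting the keys counts each property once per entity.
        counts.update(entity.get("claims", {}).keys())
    return counts


def coverage(counts, total):
    """Express each count as a percentage of all entities, most common first."""
    return {pid: round(100 * n / total, 1) for pid, n in counts.most_common()}
```

Counting per entity rather than per statement is what makes a figure like "appears on N% of records" meaningful, since a single book can carry several statements for the same property.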

Top properties used for Books on Wikidata

You can view the full list of 311 properties used here. There are a few properties that are well represented and then a very long tail. Some of these might seem surprising on the surface, like “P1476 - title” only appearing on 11% of records, but remember each entity has a “label” value which hopefully holds the title (in multiple languages).

I then organized them into my own groupings based on the nature of the property. Right away there is a problem: properties that describe what Bibframe would call a Work sit alongside properties that belong on an Instance, such as publisher, page count, etc. I diagrammed these Bibframe predicates last year with a focus on which predicate works with which class.

I was curious, given the palette of properties already in use by Wikidata, how many of the Bibframe Work predicates could be considered equivalent. You theoretically have unlimited properties available from Wikidata to use, but I wanted to limit myself to the ones currently used to talk about books.

So I made a list of all possible Work-related Bibframe predicates and started mapping possible Wikidata properties that could work.
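One way to picture the mapping exercise is as a lookup from Bibframe predicate to candidate Wikidata properties, filtered down to the properties that actually showed up in the tally. The sketch below is illustrative only: the handful of pairs in `CANDIDATES` are my own example guesses at plausible equivalents, not the mapping produced for this post, and the function names are hypothetical.

```python
# Illustrative candidate pairs only (an assumption, not the post's mapping).
CANDIDATES = {
    "bf:title": ["P1476"],      # title
    "bf:language": ["P407"],    # language of work or name
    "bf:subject": ["P921"],     # main subject
    "bf:genreForm": ["P136"],   # genre
    "bf:summary": [],           # no candidate proposed in this sketch
}


def restrict_to_used(candidates, used):
    """Keep only candidate properties that already appear on book items."""
    return {
        predicate: [pid for pid in props if pid in used]
        for predicate, props in candidates.items()
    }


def unmapped(mapping):
    """List the Bibframe predicates left without any Wikidata candidate."""
    return sorted(pred for pred, props in mapping.items() if not props)
```

Passing the set of property IDs from the tally as `used` enforces the self-imposed rule above, and `unmapped` then shows where the gaps are.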

This was not an exhaustive mapping by any means. Not surprisingly, most of the predicates I had real problems with were the ones carrying detailed bibliographic data. That doesn’t mean a suitable Wikidata property doesn’t exist for each of them, just that none is among the properties currently in use for books.

For non-books there are equivalent classes in Wikidata, for example theses or maps, so this process could be repeated for those Work subtypes.

It is a thought experiment to consider how you would represent Bibframe classes in Wikidata. But at least at the Work level, it seems possible to represent many of the Bibframe predicates.