Wikibase for Research Infrastructure — Part 1

But Why?

For our research projects at Semantic Lab @ Pratt we’ve been thinking about infrastructure to power our investigations. We’ve generated a lot of data from our Linked Jazz project and are constantly producing new data from various research projects. We wanted a tool that supports arbitrary data while also supporting linked data practices and methodologies. For the past few years we had been using a MySQL and Apache Marmotta combo to store our data. This worked fine, though we never really utilized Marmotta’s LDP capabilities. However, Wikibase is appealing for a number of reasons:

  • Revision tracking and history
  • A nice user interface for manual editing and curation
  • An API to do bulk data work
  • SPARQL endpoint

Why not just use Wikidata?

This is a good question to ask about your project. For us, we are going to have a lot of esoteric data, for example modeling oral history transcripts down to the statement level. We will end up storing a lot of data that powers our tools and research but is not really appropriate to put into Wikidata. We are thinking of our Wikibase installation as our datastore and sandbox. We will maintain mappings to Wikidata properties and items and publish data back into that system. If anything, using the same software will make it easier to contribute valuable data back to Wikidata.

Installation and Configuration

The Wikibase developers have been kind enough to make Docker images for the Wikibase stack. We will be using these images to get Wikibase running locally. The first step is to install Docker if you do not have it already. I will be doing this on OSX, but it should be similar on Linux. With Docker installed, clone the repo and bring up the stack:

git clone https://github.com/wmde/wikibase-docker.git
docker-compose build
docker-compose up
Our own Wikibase!
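
Once the containers are up, the wiki should be reachable on localhost. Here is a minimal sanity-check sketch in Python; port 8181 is what I believe the wikibase-docker default to be, so adjust it if you changed the compose file:

import requests

# Base URL for the local Wikibase's MediaWiki API.
# Port 8181 is the assumed wikibase-docker default.
API_URL = "http://localhost:8181/w/api.php"

# Ask MediaWiki for general site info as a quick health check.
resp = requests.get(API_URL, params={
    "action": "query",
    "meta": "siteinfo",
    "format": "json",
})
resp.raise_for_status()
print(resp.json()["query"]["general"]["sitename"])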

Bootstrapping Data

This fresh install of Wikibase has no items and no properties. You’re free to model the world(!), but let’s make it easier on ourselves and load some items and properties via our favorite information management tool: spreadsheets.

First, add a bot 🤖 account so the scripts can log in and edit (MediaWiki’s Special:BotPasswords page will generate one). Then clone the loading scripts and create the properties; the script logs in and prints each property as it is created:
git clone https://github.com/SemanticLab/data-2-wikibase.git
cd data-2-wikibase
python add_properties.py add_properties.csv
Logging in to semlab:semlab as Admin
P2 instance of
P3 project
P4 wikidata ID
P5 dbpedia ID
P6 LJ Slug ID
P7 LJ square image
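
If you are curious what the script is doing, it boils down to calls against the MediaWiki action API. Here is a minimal sketch of creating one property with wbeditentity; the URL, credentials, label, and datatype are illustrative assumptions, not the repo script itself:

import requests

API_URL = "http://localhost:8181/w/api.php"  # assumed local install
session = requests.Session()

# 1. Fetch a login token, then log in with the bot credentials (placeholders here).
token = session.get(API_URL, params={
    "action": "query", "meta": "tokens", "type": "login", "format": "json",
}).json()["query"]["tokens"]["logintoken"]
session.post(API_URL, data={
    "action": "login", "lgname": "Admin", "lgpassword": "your-bot-password",
    "lgtoken": token, "format": "json",
})

# 2. Fetch a CSRF token for editing.
csrf = session.get(API_URL, params={
    "action": "query", "meta": "tokens", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

# 3. Create a new property: "instance of", which takes items as values.
resp = session.post(API_URL, data={
    "action": "wbeditentity",
    "new": "property",
    "data": '{"labels": {"en": {"language": "en", "value": "instance of"}}, "datatype": "wikibase-item"}',
    "token": csrf,
    "format": "json",
})
print(resp.json()["entity"]["id"])  # e.g. P2
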
The add_core_items.csv spreadsheet supplies a few core items with labels and descriptions:

  • agent: Core class for things
  • person: Class for people specifically
  • oral history transcript: Class for transcripts
  • project: What Semlab project does this belong to?
  • Linked Jazz: A specific Semlab project

The add_jazz_people.csv spreadsheet then adds an item for each Linked Jazz person, with statements saying:

  • They are part of the Linked Jazz research project
  • Here are some identifiers and an image for them

Load both spreadsheets the same way:
python add_items.py add_core_items.csv
python add_items.py add_jazz_people.csv
It’s happening: the items load, and the populated Item page shows up in the Wikibase UI.
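
Individual statements can also be added one at a time through the API. Here is a sketch using wbcreateclaim, reusing the logged-in session from the earlier sketch; the Q and P IDs are placeholders for illustration:

# Assumes `session`, `API_URL`, and `csrf` from the login sketch above.
# Add a statement: item Q10 -> "instance of" (P2) -> person (Q3).
resp = session.post(API_URL, data={
    "action": "wbcreateclaim",
    "entity": "Q10",
    "property": "P2",
    "snaktype": "value",
    "value": '{"entity-type": "item", "numeric-id": 3}',
    "token": csrf,
    "format": "json",
})
print(resp.json()["claim"]["id"])
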
If you want to wipe the install and start over, remove the Wikibase containers and their data volumes:

docker rm $(docker ps -aqf name=wikibase)
docker volume rm wikibasedocker_mediawiki-images-data wikibasedocker_mediawiki-mysql-data wikibasedocker_query-service-data

What’s Next?

This is the most basic setup, with some starter data added to a Wikibase instance. We will be trying this approach out over the coming months to see if it is a viable platform to power our research. I’m particularly interested in:

  • More complex modeling
  • Maintaining mappings from Wikibase properties to more conventional LOD predicates (“instance of” == “rdf:type” for example)
  • Using Wikibase to power our other tools like our custom dereferencing endpoints
  • Maintaining mappings to Wikidata and harvesting data
  • Scale and speed: does our server perform as needed, how long does it take to modify X number of items, etc. (a first sanity check against the local query service is sketched below)
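
As a trivial starting point for the query side, here is a sketch that counts the triples currently in the local query service. The endpoint URL is what I believe the wikibase-docker default to be; treat the port and path as assumptions and check your compose file:

import requests

# Assumed default wikibase-docker query service (Blazegraph) endpoint.
SPARQL_URL = "http://localhost:8989/bigdata/namespace/wdq/sparql"

query = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"
resp = requests.get(
    SPARQL_URL,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
resp.raise_for_status()
print(resp.json()["results"]["bindings"][0]["triples"]["value"])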
