Writing a Wikidata Query: Discovering Women Writers from North Africa

Recently I attended the WikiIndaba Conference 2018 in Tunis, where I helped lead a workshop with User:Helmoony to teach participants how to contribute to, and take advantage of, Wikidata. I found the workshop very rewarding: several folks approached me afterwards and said that the workshop helped them feel comfortable using Wikidata in their own work.

As part of developing the workshop, I wrote this script as a dress rehearsal, to make sure I knew how I would teach the Wikidata query. I thought I would share it here, because teaching the query service can be challenging, and learning the process of developing a Wikidata query can be hard. (I also created a very drafty slide deck, in case I had to teach the query service offline, which I didn’t end up using.)

But before the script, some thoughts…

If you follow the script for this Query service lesson, you should be able to create an interactive map of North African Women Writers like this!

A couple reflections on teaching Wikidata Queries

When teaching the Wikidata Query service, I have noticed a couple things:

  • Don’t assume any prior knowledge about data/databases, Wikidata/Wikimedia, SPARQL, structured information, or terminology. I have given workshops to both librarians and Wikimedians where I thought some fundamental assumptions about data or Wikimedia would be shared knowledge, but several folks didn’t know about them.
  • It’s important to describe the logic of your actions in front of the audience. There are a number of steps involved in writing queries that are non-intuitive to folks with little or no programming background.
  • It’s important to make a couple mistakes in front of the audience: if the audience doesn’t participate in your process for identifying mistakes in a query, they will have a hard time diagnosing and correcting their own problems when writing queries. This means: relax, teach the query, but also anticipate not getting it right at every stage.
  • It’s important to repeatedly describe steps that are going to happen a lot when writing queries (e.g. adding new lines to the query, using the ctrl+space shortcut, and remembering to add variables to the SELECT line).

Also, if you are planning to teach the Query Service, I created a one-page handout/cheatsheet to help folks find simple explanations for the code that ends up in front of them. It doesn’t teach how to write a query, but should make it a lot easier for folks to interpret existing queries and remember what things they need to add into the query. It also reduces the number of questions from folks about the syntax of the query during the workshop.

Since it was Women’s History Month during WikiIndaba, I chose to write a query asking about women in the Tunisian context. In part, I chose to do a query about people because the data structure for people is fairly consistent within Wikidata. For some classes of item, or domains of knowledge, Wikidata is just beginning to develop data models, so it’s important to pick a more established topic. When learning about the Wikidata Query service, it’s important to ask questions that have a wide range of answers and to be able to experiment with modifying the query; choosing a more established model (like people, monuments or paintings) means that, as a teacher/learner, you can predictably iterate on the query and show how adding new lines to the query changes the results.

So without further ado, the script:

Writing a Query

Step 1

So a Wikidata Query is a question you ask of the Wikidata database. To ask a question in the query service, you start with a call to the software signalling that you want to ask such a question. Here is that signal:

SELECT 
WHERE {
}

Following the Wikidata data model, you tell the computer to ask questions using the triple format (?subject ?property ?object). The first triple I am going to add is the nationality for a person: since we are in Tunis, I am going to ask for someone with Tunisian nationality. To do this, I need to figure out what the appropriate property is for determining someone’s nationality, so I am going to check a Wikidata item that models the data I want to discover.

To discover the correct property and item, I start by searching for “Tunisian musicians” on Wikipedia. I then choose one of the articles in that category, find its Wikidata item through the link in the Wikipedia sidebar, and look for the property describing nationality, which is “country of citizenship”. By clicking on that property and its object, I can learn that the property is P27 and that Tunisia is Q948. To add a property in the Query service you use the prefix wdt: , and to add an item you use wd: . Thus I am going to add a question about a variable ?item in that triple format:

SELECT ?item
WHERE {
?item wdt:P27 wd:Q948.
}

See the results

As you can see, I am asking for the variable “?item” to have a statement which includes property P27 and object Q948, basically filtering the whole database for ?items where that statement is true. Notice also that I had to add the variable ?item after SELECT, in order to return it in the output. You can press run on the Query service to see the results.
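If it helps to picture what that filtering means, here is a rough sketch in Python of a triple pattern being matched against a tiny, made-up set of statements. This is only a teaching toy, not how the Query service is actually implemented: the item IDs Q_A, Q_B, Q_C, Q_OTHER and Q_PLACE are hypothetical placeholders, while P27 and Q948 are the property and item from the query above.

```python
# Toy "database": each statement is a (subject, property, object) triple.
# Q_A, Q_B, Q_C, Q_OTHER and Q_PLACE are invented placeholders for illustration.
TRIPLES = [
    ("Q_A", "P27", "Q948"),     # Q_A: country of citizenship -> Tunisia
    ("Q_B", "P27", "Q_OTHER"),  # Q_B: country of citizenship -> some other country
    ("Q_C", "P27", "Q948"),     # Q_C: country of citizenship -> Tunisia
    ("Q_C", "P19", "Q_PLACE"),  # an unrelated statement about Q_C
]


def match_subjects(triples, prop, obj):
    """Return every subject that has a statement (subject, prop, obj)."""
    return [s for (s, p, o) in triples if p == prop and o == obj]


# Rough equivalent of: SELECT ?item WHERE { ?item wdt:P27 wd:Q948. }
items = match_subjects(TRIPLES, "P27", "Q948")
print(items)
```

Only the subjects whose statement matches both the property and the object survive, which is the sense in which the triple acts as a filter on ?item.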

Step 2

And since it’s Women’s History Month, I also want to make sure that I get women in the result. So I am going to add two properties: “instance of” with the object “human”, to make sure that I get real humans, and “gender” with the object “female”. I don’t want to have to check a Wikidata item every time that I want to add a property. The Query service has a quick shortcut for doing this: ctrl + space gives you a search option.

So I am going to search for those two properties by first writing ?item, then adding wdt: and using the search shortcut (ctrl + space), and then repeating the search shortcut after wd: . Then I will do the same again for the second property. The result will be something like this:

SELECT ?item
WHERE {
?item wdt:P27 wd:Q948.
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
}

See the results

Using the ctrl+space shortcut after wdt: or wd: gives a human-readable search box that can help you find the appropriate property or item.

Step 3

Now let’s press run. Notice how you get a list of Wikidata items for the variable ?item. That’s super useful for computers reading the output, but isn’t very useful for humans. So we are going to call on a special software service that helps humans read the labels.

Without human readable results, this query probably isn’t very useful in a lot of contexts.

Let’s start with our favourite shortcut in the tool: press ctrl + space, type “label”, and add the label service, a line of code that starts with “SERVICE wikibase:label…”. This will result in something like:

SELECT ?item
WHERE {
?item wdt:P27 wd:Q948.
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

See the results

To tell the software to return a label in the results, I am going to add two different elements: first a camelcase variable in the SELECT field to call on a label with the item (?itemLabel), and second, because I am in Tunis, and want the labels to be easy to read in a Tunisian context, I am going to modify the labels to include languages spoken in Tunisia: French (fr) and Arabic (ar) before English (en):

SELECT ?item ?itemLabel
WHERE {
?item wdt:P27 wd:Q948.
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}

See the result

Press run! Wow a human readable list! That’s a great list! But it’s only a start.

Step 4

Now I want to know more about these women: let’s try to find their birth date and location. Using the same ctrl + space technique, I add two more statements in the WHERE part of the query, with new variables ?placeofbirth and ?dob as objects. Instead of narrowing the results to a fixed value like we did with the last two properties, I am asking the tool to give me additional variables in the output. To display these values, I am going to add two variables to the SELECT line. You will have something like:

SELECT ?item ?itemLabel ?placeofbirthLabel ?dob
WHERE {
?item wdt:P27 wd:Q948.
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
?item wdt:P19 ?placeofbirth.
?item wdt:P569 ?dob.
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}

See the result

If you run the query, all of a sudden the list is a lot smaller: the software thinks that you only want results that include something for each of those new variables. However, just because we don’t have this contextual information for these folks doesn’t mean they don’t belong in my query. After all, I want a list of all Tunisian women. Therefore I am going to make these statements OPTIONAL, wrapping each triple in curly brackets:

SELECT ?item ?itemLabel ?placeofbirthLabel ?dob
WHERE {
?item wdt:P27 wd:Q948.
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
OPTIONAL {?item wdt:P19 ?placeofbirth. }
OPTIONAL {?item wdt:P569 ?dob.}
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}

See the result.
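To make the difference between a required statement and an OPTIONAL one concrete, here is a small Python sketch under simplifying assumptions: each person has at most one place of birth, and the IDs Q_A, Q_B and Q_PLACE_1 are invented for the example. It is meant as an analogy for what the Query service does, not a description of its internals.

```python
# Invented data: Q_A has a place of birth (P19) recorded, Q_B does not.
PEOPLE = ["Q_A", "Q_B"]
BIRTHPLACES = {"Q_A": "Q_PLACE_1"}


def required(people, values):
    """Like a plain triple: people without a value drop out of the results."""
    return [(person, values[person]) for person in people if person in values]


def optional(people, values):
    """Like OPTIONAL { ... }: everyone stays, and a missing value is left empty."""
    return [(person, values.get(person)) for person in people]


required_rows = required(PEOPLE, BIRTHPLACES)
optional_rows = optional(PEOPLE, BIRTHPLACES)
print(required_rows)
print(optional_rows)
```

The rows with an empty value in the second list are exactly the gaps that show up as blank cells in the results table.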

Press run: now you can sort the table by each of the variables, allowing you to organize these women from different perspectives. You can also identify gaps: which of these women’s items are missing the date of birth or place of birth? Filling these gaps would be a great activity for an editathon: researching these women to make their items more complete.

On the left, you can see the eye icon, which provides you the opportunity to change the display. In the results, you can see gaps in the data: great opportunity for an editathon!

Also, we now have a chance to change the display of this data. If you go to the eye below the run button, you will find a number of different ways to display the data. Among these options is “timeline”: if you choose it, the software will automatically generate a timeline appropriate to the query, based on the variables which return dates.

Step 5

But I want to do something more with that data: I want a map. To create a map, I am going to start asking questions and filtering the data not about the “?item” but about one of the other variables introduced in the query: I will ask for the coordinates (“?coord”) for the “?placeofbirth”. To do this, I am going to add another statement within OPTIONAL {?item wdt:P19 ?placeofbirth. }, which asks for the coordinates for those locations. I use the ctrl+space search to find the property I need (P625):

SELECT ?item ?itemLabel ?placeofbirthLabel ?dob ?coord
WHERE {
?item wdt:P27 wd:Q948.
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
OPTIONAL {?item wdt:P19 ?placeofbirth. 
 ?placeofbirth wdt:P625 ?coord.}
OPTIONAL {?item wdt:P569 ?dob.}
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}

See the result

I can now choose the map display. But this map is too full! If you explore it, almost every red dot has many people born there. I am going to add another filter to make this a smaller list: asking for everyone to have the occupation “writer”. Moreover, I want to see the map display first when I run the query, so I am going to add the special comment #defaultView:Map at the end of the query.

SELECT ?item ?itemLabel ?placeofbirthLabel ?dob ?coord
WHERE {
?item wdt:P27 wd:Q948.
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
?item wdt:P106 wd:Q36180.
OPTIONAL {?item wdt:P19 ?placeofbirth. 
 ?placeofbirth wdt:P625 ?coord.}
OPTIONAL {?item wdt:P569 ?dob.}
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}
#defaultView:Map

See the results

Step 6

However, out of over 600 women from Tunisia, only a few dozen show up as writers: it seems like the query isn’t quite right! Maybe that filter narrowed the grouping too much. To diagnose this, I go and look again at the items I expected to see in the results. I discover that writers who work in a more specific genre of writing (say poetry, or journalism, or academic writing) don’t have the occupation “writer”, but some narrower, more specific occupation. Thus I need to make the query more inclusive, by including all “subclasses” of writer in the occupation filter. There are two ways to do this. I can turn occupation into another variable by adding another line, like I did with the coordinates earlier. For example:

SELECT ?item ?itemLabel ?placeofbirth ?coord ?dob 
WHERE {
?item wdt:P27 wd:Q948.
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
?item wdt:P106 ?occupation.
?occupation wdt:P279 wd:Q36180.
OPTIONAL {?item wdt:P19 ?placeofbirth.
?placeofbirth wdt:P625 ?coord }
OPTIONAL {?item wdt:P569 ?dob.}
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}
#defaultView:Map

See the result

But that is kind of complicated. Instead, I am going to combine the two properties in one line with a / :

SELECT ?item ?itemLabel ?placeofbirth ?coord ?dob 
WHERE {
?item wdt:P27 wd:Q948.
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
?item wdt:P106/wdt:P279 wd:Q36180.
OPTIONAL {?item wdt:P19 ?placeofbirth.
?placeofbirth wdt:P625 ?coord }
OPTIONAL {?item wdt:P569 ?dob.}
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}
#defaultView:Map

See the result

But those queries only give me results if folks have an occupation that is a direct subclass of writer. What if the class is several subclasses down the hierarchy of items? This is where you add an *, which tells the software to keep going down that chain of logic and include all subsequent subclasses of each of the subclasses of writer (the * also matches zero steps, so people whose occupation is “writer” itself stay in the results):

SELECT ?item ?itemLabel ?placeofbirth ?coord ?dob 
WHERE {
?item wdt:P27 wd:Q948.
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
?item wdt:P106/wdt:P279* wd:Q36180.
OPTIONAL {?item wdt:P19 ?placeofbirth.
?placeofbirth wdt:P625 ?coord }
OPTIONAL {?item wdt:P569 ?dob.}
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}
#defaultView:Map

See the result

Press run! Notice how you have many more results! Wow that’s really a cool map!
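For anyone who wants to see why the * makes such a difference, here is a rough Python sketch of following “subclass of” links zero or more times. The occupation names and the little hierarchy are invented for illustration (I am using plain words instead of real Q-numbers), and the real Query service does this its own way; the sketch only mirrors the logic of wdt:P279*.

```python
# Invented "subclass of" (P279) links: each occupation points to its parent classes.
SUBCLASS_OF = {
    "poet": ["writer"],
    "novelist": ["writer"],
    "haiku_poet": ["poet"],    # two steps away from "writer"
    "singer": ["musician"],
}


def reaches(occupation, target, subclass_of):
    """True if `target` is reachable from `occupation` in zero or more subclass steps."""
    seen = set()
    stack = [occupation]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue  # already explored; also protects against loops in the data
        seen.add(current)
        stack.extend(subclass_of.get(current, []))
    return False


for job in ["writer", "poet", "haiku_poet", "singer"]:
    print(job, reaches(job, "writer", SUBCLASS_OF))
```

Without the *, only the one-step cases (poet, novelist) would match; with it, the zero-step case (writer) and the deeper cases (haiku_poet) match too.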

There is one problem when you look at the results as a table: the query returns several lines for each writer who has multiple writing professions. I am going to add the special keyword DISTINCT after SELECT to make sure that result lines don’t repeat. I also want the map to be a bit more visually engaging, so I am going to add a property for the author’s image, in an OPTIONAL field, so that each red dot on the map has an image of the author pop up if it’s available. The query will look something like:

SELECT DISTINCT ?item ?itemLabel ?image ?placeofbirth ?coord ?dob 
WHERE {
?item wdt:P27 wd:Q948.
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
?item wdt:P106/wdt:P279* wd:Q36180.
OPTIONAL {?item wdt:P19 ?placeofbirth.
?placeofbirth wdt:P625 ?coord. }
OPTIONAL {?item wdt:P18 ?image.}
OPTIONAL {?item wdt:P569 ?dob.}
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}
#defaultView:Map

See the result

If you press run, you can find all the women writers with Tunisian nationality! Very cool!

The results once I add the images and DISTINCT to the query. Notice how engaging the map is! Pop-up bubbles with images!
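A quick way to picture what DISTINCT does: the query produces one row per matching occupation, and DISTINCT then collapses rows that are identical in every selected column. Here is a minimal Python sketch of that idea, with invented IDs and names; it is an analogy, not the Query service’s actual implementation.

```python
# Invented result rows: Q_A matched twice (say, as both poet and novelist),
# but the occupation is not among the selected columns, so the rows look identical.
rows = [
    ("Q_A", "Person A"),
    ("Q_A", "Person A"),
    ("Q_B", "Person B"),
]


def distinct(result_rows):
    """Drop rows that repeat an earlier row exactly, keeping the original order."""
    return list(dict.fromkeys(result_rows))


unique_rows = distinct(rows)
print(unique_rows)
```

Note that rows only collapse when every selected value matches: if an item has two different images, for example, it will still appear on two lines.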

Step 7

Now that I have a basic query set up, I want to ask a slightly larger question: what about the rest of North Africa? I do this by modifying the nationality line of the query so that it uses a variable ?nationality, and adding a line that requires ?nationality to be one of the parts of North Africa (Q27381) via the property P527 (has part). Here is the query:

SELECT DISTINCT ?item ?itemLabel ?image ?placeofbirth ?coord ?dob 
WHERE {
?item wdt:P27 ?nationality.
wd:Q27381 wdt:P527 ?nationality . 
?item wdt:P21 wd:Q6581072.
?item wdt:P31 wd:Q5.
?item wdt:P106/wdt:P279* wd:Q36180.
OPTIONAL {?item wdt:P19 ?placeofbirth.
?placeofbirth wdt:P625 ?coord. }
OPTIONAL {?item wdt:P18 ?image.}
OPTIONAL {?item wdt:P569 ?dob.}
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}
#defaultView:Map

See the result

If you press run again, it creates a very full map! That’s a list of hundreds of women writers from North Africa! Very cool! Now, by making small changes to components of the query, like the nationality, region, or profession, we can change which region’s birthplaces we map or which professions we explore.

The final iteration of the map, with every woman writer whose nationality is from a country in North Africa! Looks like we are probably missing some people!

Activity

Want to test your ability to write a query? Write your own!

Here are some questions that you should be able to ask by modifying the above query:

  • Which of these women writers were born after 1985 that are in Wikidata?
  • What novels were written by these women that are in Wikidata?
  • Who are all of the women singers from North Africa that are in Wikidata?
  • Who are the women politicians from East Asia that are in Wikidata?

Or, try writing a new query:

  • When, and in which countries, were buildings built that had female architects?
  • Which female artists created paintings that depict women politicians? When were those paintings created?

Or create your own question!

Remember, answerable questions need to have related data within Wikidata and be broken out into components of the question that can be described with a triple. If you want to ask more complicated questions, consider using an existing query from the examples listed on Wikidata, and modifying it to do what you want.
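As one possible starting point for the first activity question, here is a sketch that keeps the Tunisian writers query from above, makes the date of birth required, and adds a FILTER on the year. It is wrapped in a little Python that only builds the request URL, without sending anything. The FILTER line and the choice to drop OPTIONAL around ?dob are my own suggestion rather than part of the workshop script, and I am assuming the public endpoint at https://query.wikidata.org/sparql with its query and format parameters.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Suggested variation on the Tunisian writers query: ?dob is now required
# (no OPTIONAL), because we filter on it. YEAR(?dob) > 1985 keeps 1986 onwards.
QUERY = """
SELECT DISTINCT ?item ?itemLabel ?dob
WHERE {
  ?item wdt:P27 wd:Q948.
  ?item wdt:P21 wd:Q6581072.
  ?item wdt:P31 wd:Q5.
  ?item wdt:P106/wdt:P279* wd:Q36180.
  ?item wdt:P569 ?dob.
  FILTER(YEAR(?dob) > 1985)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,ar,en". }
}
"""

# Assumed endpoint for the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_request_url(query, endpoint=ENDPOINT):
    """Return a GET URL for the query, asking for JSON results (nothing is sent)."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


url = build_request_url(QUERY)
print(url)
```

You can also just paste the text between the triple quotes into the Query service editor and press run; the URL part is only there for folks who want to fetch results from a script.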

Finding help

There are several help channels available for folks wanting to learn about queries, including:

What next?

Writing a query is just the beginning of different ways you can use the query to arrange or organize knowledge in Wikimedia projects. My colleague Sandra Fauconnier recently published a blog post describing various actions you can take with this data on the Wikimedia blog: https://blog.wikimedia.org/2018/03/29/increasing-visibility-women-with-wikidata/