Question Answering over Wikidata
After 3 years of research, I would like to share with the Wikidata community the results achieved during my PhD thesis. Wikidata is a great project and I’m still astonished by it. I often find information that I would never have believed to be contained in Wikidata. Moreover, it is really up to date. I want to thank the whole community for the great work you do, and I hope I can also contribute a bit to it. So let me describe what I did in the last 3 years and why I think it can help the community. If you want to try it out directly before or while reading this blog entry, go to www.wdaqua.eu/qa.
What was the aim?
As a Wikidata editor, you have inserted facts here and there. I do it myself and it is just cool! Sharing our knowledge with other people and constructing a structured model of the world together: what a great idea.
But have you never felt that it would be even cooler if everyone could profit from it? I mean everyone, including people who do not know what an entity, a statement, or a qualifier is.
Exactly this was the aim of my PhD: making the data in Wikidata accessible through natural language. An example? Imagine someone asks: “What is the capital of Brazil?” (yes, there are people who do not know that, and yes, there are people interested in an answer). You know that this information is available in Wikidata, but how do you find it and display it to the user?
I have tried to answer exactly this question over the last 3 years. I do not have the perfect solution, but I have a solution. Here is what it looks like:
The research field that tries to address this question is called Question Answering.
Why is this so important? Every day the Wikidata community creates a lot of valuable information, keeps it up to date and maintains it. Making it accessible to end users is, in my eyes, a very important goal.
What questions can be answered?
This is not a simple question. Mostly it depends on what knowledge is encoded in Wikidata. So the first important restriction: if the information is not contained in Wikidata, we are not able to answer the corresponding question.
But the converse does not hold: we cannot answer every question whose answer is contained in Wikidata. Why? Because it is not a simple task!
We can answer questions that correspond to only one statement in Wikidata, for example: “Who is the wife of Barack Obama?”. What matters here is that the label of the relation contains “wife” and the label of the entity is “Barack Obama”.
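To make "one statement" concrete, here is a minimal sketch of the kind of SPARQL query such a question boils down to. It is only an illustration, not the code of our system: the helper name `single_statement_query` is hypothetical, and I assume the entity and the relation have already been matched to their Wikidata identifiers (Q76 for Barack Obama, P26 for "spouse").

```python
# Standard Wikidata prefixes, spelled out so the query is self-contained.
PREFIXES = (
    "PREFIX wd: <http://www.wikidata.org/entity/>\n"
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
)


def single_statement_query(entity_id: str, property_id: str) -> str:
    """Build a query that reads one statement: (entity, property, ?answer).

    Hypothetical helper for illustration; the identifiers are assumed to
    have been found already by matching labels in the question.
    """
    return (
        f"{PREFIXES}"
        f"SELECT ?answer WHERE {{ wd:{entity_id} wdt:{property_id} ?answer . }}"
    )


# "Who is the wife of Barack Obama?"
# Q76 = Barack Obama, P26 = spouse
wife_of_obama = single_statement_query("Q76", "P26")
print(wife_of_obama)
```

The hard part is of course not building the string but finding the right identifiers from the words in the question.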
We can also combine multiple statements to answer questions like: “What is the party of the mayor of Berlin?” or “Give me museums in Lyon.” If too many statements need to be combined to find an answer, we will probably fail.
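Combining statements can be pictured as chaining or intersecting triple patterns. The sketch below is again only an illustration under assumptions: `chained_query` is a hypothetical helper, the identifiers (Q64 Berlin, P6 head of government, P102 member of political party, Q33506 museum, P31 instance of, P131 located in the administrative territorial entity, Q456 Lyon) are assumed to be already resolved, and the queries our system actually generates may look different.

```python
# Standard Wikidata prefixes, spelled out so the queries are self-contained.
PREFIXES = (
    "PREFIX wd: <http://www.wikidata.org/entity/>\n"
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
)


def chained_query(entity_id: str, property_ids: list) -> str:
    """Follow several properties in a row using a SPARQL property path.

    Hypothetical helper for illustration only.
    """
    path = "/".join(f"wdt:{pid}" for pid in property_ids)
    return f"{PREFIXES}SELECT ?answer WHERE {{ wd:{entity_id} {path} ?answer . }}"


# "What is the party of the mayor of Berlin?"
# Berlin -> head of government -> member of political party
party_of_mayor = chained_query("Q64", ["P6", "P102"])

# "Give me museums in Lyon."
# Two statements about the same unknown item: it is a museum AND it is in Lyon.
museums_in_lyon = (
    PREFIXES
    + "SELECT ?answer WHERE { ?answer wdt:P31 wd:Q33506 . ?answer wdt:P131 wd:Q456 . }"
)

print(party_of_mayor)
print(museums_in_lyon)
```

Note that the museum query as written only matches items typed directly as "museum" and located directly in Lyon; a more robust query would also follow subclasses and nested locations.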
We handle both natural language questions like: “What is the formula of water?” and keyword questions like: “water formula”. So formulate your question as you prefer.
Moreover, we support multiple languages, which fits well because Wikidata itself offers its knowledge in many languages. Currently we support English, German, French, Italian, Spanish, Portuguese and Chinese. If you would like to implement support for a specific language, let me know!
How can it be helpful for the community?
Besides the fact that Question Answering makes Wikidata knowledge more easily accessible, I think it can be a useful tool when you access and edit Wikidata. I see mainly two cases.
The first is about data incompleteness. Imagine you search just for: “Louvre”. You will directly see some important information about the Louvre, like external links (homepage, Twitter link, Facebook link), an abstract, a map and an image. If you search for your preferred entity and some information that you expect is not displayed, then it is missing in Wikidata. While it is very easy to see in a Question Answering user interface that information is missing, I have found it quite difficult in Wikidata itself.
Another example. Imagine you ask the question: “Who are the actors in the Lord of the Rings?”. Maybe you know many of the actors, but checking whether they are all listed as actors of the Lord of the Rings is not easy. With the list of actors in front of you, you will find the missing ones much faster. If one of the actors you had in mind is missing, you can go directly to Wikidata and edit the corresponding entry.
The second case is about writing SPARQL queries. If you have never written a SPARQL query, it may look scary. If you know SPARQL, you may still find it time-consuming to write such a query. In both cases you can try asking our Question Answering system in natural language. Under the hood we generate multiple SPARQL queries. They are available under the “Q” button. If you click on it, you can find the SPARQL queries that we generated, ranked according to confidence. So we generate a SPARQL query for you in a very short time.
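Once you have copied a generated query, you can also run it yourself. Here is a minimal sketch that sends a query to the public Wikidata Query Service endpoint using only the Python standard library. This is not part of our system: the function names and the User-Agent string are hypothetical, and I assume the query selects a variable called `?answer`.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_request_url(query: str) -> str:
    """Encode the SPARQL query as a GET request asking for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def run_query(query: str) -> list:
    """Send the query and return the values bound to ?answer (needs network)."""
    request = urllib.request.Request(
        build_request_url(query),
        # Hypothetical User-Agent; replace it with one describing your own script.
        headers={"User-Agent": "qa-blog-example/0.1 (illustrative script)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [row["answer"]["value"] for row in data["results"]["bindings"]]


# "What is the capital of Brazil?"  Q155 = Brazil, P36 = capital.
# The wd: and wdt: prefixes are predefined by the Wikidata Query Service.
CAPITAL_OF_BRAZIL = "SELECT ?answer WHERE { wd:Q155 wdt:P36 ?answer . }"

url = build_request_url(CAPITAL_OF_BRAZIL)
print(url)
# answers = run_query(CAPITAL_OF_BRAZIL)  # uncomment when you have network access
```

The returned values are entity URIs; to get readable names you would additionally ask for labels in the query.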
Try it out
Do you want to try it yourself? Go to www.wdaqua.eu/qa and try it out! On the homepage you will find some example questions for inspiration.
That’s it. If you have any feedback, let me know in the comment section below.
Remember that the system is not perfect, but I try to improve it over time. So “Keep asking!”