Thoughts and Theory

Hierarchical Classification of Expected Answer Type in Knowledge Graph Question Answering

How does the machine understand what the user is asking?

Aleksandr Perevalov
TDS Archive

--

TL;DR

One of the important steps people take when searching for an answer to a question is understanding what type of answer would suit best [1]. For example, for the question, “What time is it?” we expect to hear an answer of the “time” type, and for the question, “Where was Ivan Petrov born?” an answer of the type “city” or “country”.

The same is true for Question Answering (QA) systems based on Knowledge Graphs, which aim to find answers to factoid questions. This article presents a module for determining the Expected Answer Type (EAT), which is capable of predicting not only a single class but also an entire hierarchy of classes as the predicted value. The module is provided both as a web interface (UI) and as a RESTful API. This functionality allows end users to get answer type predictions for 104 languages, see the confidence of the prediction, and leave feedback. In addition, the API allows researchers and developers to integrate the EAT classification module into their own systems.

Understanding what a person is asking via a question is one of the first steps that humans use to find the corresponding answer.

The Web UI of the expected answer type classifier (Image by Author)

Knowledge Graph Question Answering Systems

There are two paradigms for developing question answering systems: (1) systems based on unstructured data (IR-based), whose goal is to find the most relevant paragraph in a set of text documents, and (2) systems based on structured data and knowledge (KBQA), which translate a natural language question into a formalized query (SQL, SPARQL, etc.) [2]. Separately, we should mention knowledge graph question answering systems (KGQA), which are a subset of KBQA and have recently been growing in popularity.

Paradigms for developing Question Answering systems (Image by Author)

As the name implies, KGQA systems are powered by knowledge graphs, often stored using the Resource Description Framework (RDF), which in turn allows access to data via SPARQL queries. In other words, the goal of a KGQA system is to convert a natural language question into a SPARQL query, in order to simplify data access for the end user.

The questions posed to the KGQA systems are fact-based. For example, when we ask the question “In which city was Angela Merkel born?” we expect to see an answer of the type “city”, in this example Hamburg. In this case, “city” is the Expected Answer Type. Such types are often organized into hierarchical taxonomies or ontologies (e.g., DBpedia Ontology), depending on the particular knowledge graph used in the KGQA system. Considering the question “In which city was Angela Merkel born?” the expected answer type hierarchy (based on classes from DBpedia Ontology) would look as follows.

Expected Answer Type hierarchy for the question: “In which city was Angela Merkel born?” given the DBpedia Ontology (Image by Author)

In this hierarchy, the first type is the most specific, while the last is the most general.

Why do KGQA systems need to know the expected answer type? The reason is simple: knowing the type drastically reduces the search space for answers. This can be shown with a simple example (see the code snippets below) using the familiar Angela Merkel question.

# without EAT prediction
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT (COUNT(DISTINCT ?obj) AS ?count)
WHERE {
  dbr:Angela_Merkel ?p ?obj .
}
# ?count = 861

As seen in the code snippet, this SPARQL query counts the possible answer candidates for Angela Merkel’s resource in DBpedia. The result is huge: 861 candidates. Let’s try to narrow down the search space with the predicted EAT.

# with EAT prediction
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT (COUNT(DISTINCT ?obj) AS ?count)
WHERE {
  dbr:Angela_Merkel ?p ?obj .
  ?obj rdf:type ?type .
  FILTER(?type = dbo:City)
}
# ?count = 6

Now, as we restricted the set of answer candidates to the type “City”, there are only 6 possible candidates. That is really impressive, as it is much easier to find the correct answer among 6 candidates than among 861. In the next section, the EAT classifier’s architecture is presented.

Expected Answer Type Classifier’s Architecture

There are three approaches to hierarchical classification [3]: flat, local, and global. The flat approach ignores the hierarchy (one might even say it flattens it), reducing the task to multi-label classification. The local approach uses a separate classifier for each level (node) of the hierarchy, whereas the global approach predicts the whole hierarchy in a single call.

In this article we use the local approach (see the architecture figure) to hierarchical classification of the EAT. The solution is based on multilingual BERT models [4], with a fully connected layer of n neurons appended to the [CLS] token output, where n is the number of classes to predict at a particular hierarchy level (node).

Architecture of the Hierarchical EAT classifier (Image by Author)

The figure shows three models: a category classifier, a literal classifier, and a resource classifier. There are three category classes: boolean, literal and resource. There are also three literal types: number, date and string. Things are more complicated with resource, because it requires a full hierarchical classification (see the example in the introduction). In our solution, the resource classifier predicts the most specific answer type (e.g., dbo:City), and we then fetch the rest of the hierarchy, up to the top level, from DBpedia using a SPARQL query as shown below.

Source code for fetching DBpedia Ontology hierarchy given the most specific ontology type
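Such a hierarchy-fetching query can be sketched as follows. This is a minimal sketch, not necessarily the exact query from the repository: it assumes the class hierarchy is traversed via rdfs:subClassOf with a property path, starting from the most specific predicted type (here dbo:City).

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Walk rdfs:subClassOf edges from the most specific predicted type
# up to the top of the DBpedia Ontology.
SELECT DISTINCT ?superclass
WHERE {
  dbo:City rdfs:subClassOf* ?superclass .
  # keep only classes from the DBpedia Ontology namespace
  FILTER(STRSTARTS(STR(?superclass), "http://dbpedia.org/ontology/"))
}
```

The query returns the class itself plus all of its superclasses in the dbo: namespace, which are then ordered from most specific to most general to form the hierarchy.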

The following code was used to create the BERT-based EAT classifiers. The complete source code can be found in our GitHub repository.

Source code for creating a multiclass classifier with transformers library
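A per-node classifier of this kind can be sketched with the transformers library as follows. The class name and the tiny random configuration are illustrative only; in practice the encoder weights would come from bert-base-multilingual-cased.

```python
# Sketch of one per-node classifier: a multilingual BERT encoder with a
# fully connected layer of n output neurons on top of the [CLS] token.
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class EATNodeClassifier(nn.Module):  # illustrative name, not the repository's
    def __init__(self, encoder: BertModel, n_classes: int):
        super().__init__()
        self.encoder = encoder
        # n_classes = number of answer types at this hierarchy node
        self.head = nn.Linear(encoder.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0, :]  # [CLS] token representation
        return self.head(cls)                 # logits over the node's classes

# A tiny random config so the sketch runs offline; in practice use
# BertModel.from_pretrained("bert-base-multilingual-cased").
config = BertConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=128)
model = EATNodeClassifier(BertModel(config), n_classes=3)
logits = model(torch.randint(0, 1000, (1, 16)))  # batch of 1, 16 tokens
print(tuple(logits.shape))  # → (1, 3)
```

Training then proceeds as ordinary multiclass classification with a cross-entropy loss over these logits.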

The example of the classifier’s output is presented below.

[
  {
    "id": "dbpedia_1",
    "question": "Who are the gymnasts coached by Amanda Reddin?",
    "category": "resource",
    "type": ["dbo:Gymnast", "dbo:Athlete", "dbo:Person", "dbo:Agent"]
  },
  {
    "id": "dbpedia_2",
    "question": "How many superpowers does wonder woman have?",
    "category": "literal",
    "type": ["number"]
  },
  {
    "id": "dbpedia_3",
    "question": "When did Margaret Mead marry Gregory Bateson?",
    "category": "literal",
    "type": ["date"]
  },
  {
    "id": "dbpedia_4",
    "question": "Is Azerbaijan a member of European Go Federation?",
    "category": "boolean",
    "type": ["boolean"]
  }
]
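The two-stage routing that produces such outputs can be sketched as a simple dispatch. The three classifier functions below are keyword-based stand-ins for the trained BERT models, included only to make the control flow concrete:

```python
# Local hierarchical classification as a dispatch: first the category,
# then the category-specific classifier. All three "classifiers" here
# are stubs standing in for the BERT models.
def classify_category(question: str) -> str:
    q = question.lower()
    if q.startswith(("is ", "are ", "does ", "did ")):
        return "boolean"
    if q.startswith(("how many", "when ")):
        return "literal"
    return "resource"

def classify_literal(question: str) -> str:
    return "number"  # stub: would choose among number / date / string

def classify_resource(question: str) -> str:
    return "dbo:City"  # stub: would predict the most specific dbo: class

def predict_eat(question: str):
    category = classify_category(question)              # level 1
    if category == "boolean":
        return category, ["boolean"]
    if category == "literal":
        return category, [classify_literal(question)]   # level 2, literal node
    return category, [classify_resource(question)]      # level 2, resource node

print(predict_eat("How many superpowers does wonder woman have?"))
# → ('literal', ['number'])
print(predict_eat("Is Azerbaijan a member of European Go Federation?"))
# → ('boolean', ['boolean'])
```

For the resource category, the predicted class would then be expanded into the full hierarchy via the DBpedia Ontology, as described above.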

The quality of the category classifier was measured by the Accuracy metric, while the other classifiers were evaluated using the NDCG@5 and NDCG@10 metrics, which are designed to evaluate ranked lists. After running the evaluation script, we obtained the following results: Accuracy: 98%, NDCG@5: 76%, NDCG@10: 73%. These results can also be found on the public leaderboard of the SeMantic AnsweR Type prediction task 2020 at: https://smart-task.github.io/2020.
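For reference, NDCG@k over a ranked prediction list can be computed as follows. The binary relevance labels in the example are illustrative; the shared task defines its own graded gains for type predictions.

```python
# Normalized Discounted Cumulative Gain at cutoff k, the metric used to
# evaluate ranked lists of predicted answer types.
import math

def dcg_at_k(relevances, k):
    # gain discounted by log2 of the (1-based) rank + 1
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    # normalize by the DCG of the ideal (descending) ordering
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# e.g. a ranked type list where positions 1, 2 and 4 are relevant
print(round(ndcg_at_k([1, 1, 0, 1, 0], k=5), 3))  # → 0.967
```

A score of 1.0 would mean the relevant types are ranked ahead of all irrelevant ones.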

Conclusion

This short article presented a component for classifying the expected answer type, which can be used in question answering systems based on knowledge graphs. The classifier supports multilingual input and performs quite well in terms of prediction quality.

References

  1. Hao, Tianyong, et al. “Leveraging question target word features through semantic relation expansion for answer type classification.” Knowledge-Based Systems 133 (2017): 43–52.
  2. Jurafsky, Daniel, and James H. Martin. “Speech and language processing (draft).” Available from: https://web.stanford.edu/~jurafsky/slp3 (2021).
  3. Silla, Carlos N., and Alex A. Freitas. “A survey of hierarchical classification across different application domains.” Data Mining and Knowledge Discovery 22.1 (2011): 31–72.
  4. Devlin, Jacob, et al. “BERT: Pre-training of deep bidirectional transformers for language understanding.” arXiv preprint arXiv:1810.04805 (2018).

Acknowledgments

I would like to thank my supervisor, Prof. Dr. Andreas Both, who gave me the chance to work on my PhD dissertation. I would also like to thank the Anhalt University of Applied Sciences for the support. Finally, I would like to thank Prof. Dr. Axel-Cyrille Ngonga Ngomo, who agreed to co-supervise my PhD dissertation.
