A Semantic Parsing-based Relation Linking approach for Knowledge Base Question Answering

Salim Roukos
3 min read · Dec 3, 2020


Nandana Mihindukulasooriya and Ibrahim Abdelaziz

Knowledge base question answering (KBQA) has emerged as an important Natural Language Processing task for many real-world applications. Relationship Extraction and Linking (REL) is a crucial sub-task of KBQA that involves identifying the relations in a natural language question and linking them to their equivalent relations in the underlying knowledge base. These relations are then used to construct a query that retrieves the question’s answers. For example, given the natural language question “Who is starring in Spanish movies produced by Benicio del Toro?”, REL should identify three relations when using DBpedia as the underlying KB: “dbo:producer”, “dbo:starring”, and “dbo:country”. REL, however, faces several challenges: 1) the large number of candidate relations in KBs such as DBpedia or Wikidata; 2) the extensive lexical gap between the surface relations in text and their KB equivalents (for instance, the question above contains no explicit reference to the relationship “dbo:country”); and 3) the presence of multiple relations with both known and unknown entities as subjects and objects.
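
To make the example concrete, the three linked relations can be assembled into a SPARQL query over DBpedia. The short Python sketch below runs one such query against the public DBpedia endpoint; the entity URIs (dbr:Spain, dbr:Benicio_del_Toro) and the exact query shape are illustrative assumptions, not the output of a particular KBQA system.

```python
from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

# Illustrative query built from the three linked relations
# (dbo:producer, dbo:country, dbo:starring); the entity URIs and the
# query shape are assumptions made for this example.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?actor WHERE {
  ?film dbo:producer dbr:Benicio_del_Toro .
  ?film dbo:country  dbr:Spain .
  ?film dbo:starring ?actor .
}
"""

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["actor"]["value"])
```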

To address these challenges, we propose SLING (Semantic LINkinG), a distant-supervision-based approach that leverages semantic parsing, in particular Abstract Meaning Representation (AMR), for relation extraction and linking. SLING is a generic framework that integrates different approaches to REL based on statistical predicate alignment, word embeddings, and neural networks.

An overview of SLING is shown in the figure below, with (a) giving a process-oriented view and (b) illustrating the pipeline with an example. The input to SLING is a question in natural language along with its corresponding AMR representation. The output is a ranked list of relations corresponding to every subject-object pair in the sentence. The input is processed by the components in Question Metadata Generation to extract AMR triples (subject-object pairs and their AMR predicates) and generate metadata corresponding to each of them. Each module in Relationship Linking then produces a ranked list of KB relations with scores for a metadata-enriched AMR triple. These lists are aggregated to produce the required output.
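
The aggregation step at the end of this pipeline can be pictured with a small, purely illustrative sketch: each linking module scores candidate KB relations for an AMR triple, and the per-module scores are combined into one ranked list per subject-object pair. The module names, toy scores, and the score-summing scheme below are assumptions for illustration, not SLING's actual API.

```python
from collections import defaultdict

# Purely illustrative sketch of the aggregation step: each linking module
# returns scored candidate KB relations for an AMR triple, and the scores
# are combined into one ranked list per subject-object pair.
def aggregate(amr_triples, linking_modules):
    ranked = {}
    for triple in amr_triples:                       # (subject, amr_predicate, object)
        scores = defaultdict(float)
        for module in linking_modules:               # e.g. statistical alignment,
            for relation, score in module(triple):   # embedding- or neural-based ranker
                scores[relation] += score            # summing scores is an assumption
        ranked[triple] = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked

# Toy demo with hand-made modules and one AMR triple from the example question.
triple = ("amr-unknown", "star-01", "movie")
alignment_module = lambda t: [("dbo:starring", 0.9), ("dbo:writer", 0.1)]
embedding_module = lambda t: [("dbo:starring", 0.7), ("dbo:producer", 0.2)]

print(aggregate([triple], [alignment_module, embedding_module]))
# {('amr-unknown', 'star-01', 'movie'): [('dbo:starring', 1.6), ...]}
```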

SLING differs from previous approaches in several aspects. It is the first approach to harness AMR semantic parsing of text for REL in KBQA. Existing approaches have used only syntactic parses, such as dependency parses of the question. Semantic parsing with AMR brings several advantages: (1) AMR detects named entities and maps them to predefined (normalized) entity types, which form the arguments of the relations that have to be mapped to the KB; (2) AMR not only identifies relations in text but also normalizes them using PropBank frames; (3) it reduces the ambiguity of natural language by converting relation phrases to their corresponding senses; and (4) for questions, a special node, amr-unknown, is used as a placeholder for the answer to the question. These characteristics help alleviate the lexical gap by reducing different phrasings of a relation to a common set of predicates. They also help to automatically determine the relationship structure of an input question and extract all the relationships needed to form a SPARQL query, hence addressing the challenge of extracting multiple relationships from the question text.
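
To see these properties on the running example, here is a hand-written, approximate AMR for the question, decoded with the penman library. A real parser's output may differ in detail, but the sketch shows the PropBank frames (star-01, produce-01), the normalized named entities, and the amr-unknown placeholder standing in for the answer ("Who").

```python
import penman  # pip install penman

# Hand-written, approximate AMR for "Who is starring in Spanish movies
# produced by Benicio del Toro?" -- a real parser's output may differ.
# amr-unknown marks the answer slot; star-01 and produce-01 are PropBank
# frames; the country and person nodes are normalized named entities.
graph = penman.decode("""
(s / star-01
   :ARG0 (a / amr-unknown)
   :ARG1 (m / movie
      :mod (c / country
         :name (n / name :op1 "Spain"))
      :ARG1-of (p / produce-01
         :ARG0 (p2 / person
            :name (n2 / name :op1 "Benicio" :op2 "del" :op3 "Toro")))))
""")

for source, role, target in graph.triples:
    print(source, role, target)  # e.g. ('s', ':ARG0', 'a'), ('s', ':instance', 'star-01'), ...
```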

SLING combines both rule-based and deep learning-based modules to capture complementary signals such as linguistic cues, rich semantic representations, and information from the knowledge base. It uses a distantly supervised technique to generate training data from a given knowledge base and a text corpus, without requiring any task-specific training data. This training data feeds multiple modules. First, it is used to generate mappings between text, AMR, and KB relations; from these mappings, alignment statistics between the PropBank predicates used in AMR and the KB relations are computed. The same distant supervision data is also used to train deep learning-based relation classification models.
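
As a rough illustration of the statistical alignment idea, the sketch below counts how often an AMR (PropBank) predicate co-occurs with a KB relation in distantly labeled examples and turns the counts into conditional scores for ranking candidate relations. The toy data and the simple conditional-probability scoring are assumptions; the paper describes the actual alignment statistics used by SLING.

```python
from collections import Counter, defaultdict

# Illustrative sketch of predicate-to-relation alignment from distantly
# supervised data: each record pairs the AMR predicate observed in a
# sentence with the KB relation holding between the entities it mentions.
# The toy data and the conditional-probability scoring are assumptions.
distant_data = [
    ("produce-01", "dbo:producer"),
    ("produce-01", "dbo:producer"),
    ("produce-01", "dbo:director"),
    ("star-01",    "dbo:starring"),
]

counts = defaultdict(Counter)
for amr_predicate, kb_relation in distant_data:
    counts[amr_predicate][kb_relation] += 1

def alignment_scores(amr_predicate):
    # Estimate P(KB relation | AMR predicate) and rank candidates by it.
    c = counts[amr_predicate]
    total = sum(c.values())
    return sorted(((rel, n / total) for rel, n in c.items()),
                  key=lambda kv: kv[1], reverse=True)

print(alignment_scores("produce-01"))
# [('dbo:producer', 0.666...), ('dbo:director', 0.333...)]
```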

These novel methods achieved state-of-the-art performance on two KBQA benchmarks: Question Answering over Linked Data (QALD-7 and QALD-9) and the Large-Scale Complex Question Answering Dataset (LC-QuAD 1.0). In particular, SLING improved F1 score by 5–24% compared to existing approaches. For more details on SLING and the full experimental evaluation, see [1].

[1] Mihindukulasooriya, N., Rossiello, G., Kapanipathi, P., Abdelaziz, I., Ravishankar, S., Yu, M., Gliozzo, A., Roukos, S. and Gray, A. Leveraging Semantic Parsing for Relation Linking over Knowledge Bases. International Semantic Web Conference (ISWC), 2020.


Salim Roukos

IBM Fellow, working on multilingual NLP using Machine (and Deep) Learning models for language translation, information extraction, and language understanding.