Developments, challenges and trends in question answering systems

Question answering systems[1] are computer-based systems that aim to provide information that meets user needs more exactly than the lists of often irrelevant results returned by search engines. They do this through natural language processing. Knowledge base question answering[2] aims to answer natural language questions using real-world facts stored in an existing, large-scale database.

An example of a question answering system is START, which was developed by researchers from the MIT Computer Science and Artificial Intelligence Laboratory.

A recent paper[3] presents the latest research on question answering systems, identified through a review of conferences and journals on information retrieval, knowledge management, artificial intelligence, web intelligence, natural language processing, and the semantic web. The authors asked six research questions:

RQ1: Are researchers gaining or losing interest in question answering systems?
RQ2: What are the characteristics of question answering systems being given the most attention?
RQ3: What are the topics of the research being given the most attention?
RQ4: What are the challenges faced by researchers in this area?
RQ5: What kinds of solutions are being proposed?
RQ6: What are the trends of research in this area?

Findings

RQ1: Are researchers gaining or losing interest in question answering systems?

The number of publications is growing, indicating increased interest in this area from the research community.

RQ2: What are the characteristics of question answering systems being given the most attention?

This question can be answered from three different points of view: domain type, question type, and system type.

From the domain type point of view, closed domain question answering systems accept questions only from a specific domain, while open domain question answering systems do not have this limitation. Open domain question answering systems are the most popular and receive the most attention. This is justified by the need for modern systems to be extensive and to include all areas of information and knowledge.

From the question type point of view, factoid question answering systems are the most popular and receive the most attention. Factoid questions are questions that can be answered with simple facts expressed in short text answers. However, the researchers noticed a growing number of contributions, especially in 2016, on non-factoid question answering systems. This suggests a growing interest in the research community in this kind of question answering system and a possible trend towards systems that are more intelligent and closer to humans.

From the system type point of view, the original question answering systems were closed, encyclopedia-like systems that relied on their own knowledge to answer questions. Some modern question answering systems, such as Quora, are community-based: users rely on expertise from the community to get an answer to their question. Non-community question answering systems are the most popular and have the largest number of contributions. However, the researchers noticed that a great deal of research is being done on community question answering systems, and the difference in the number of publications between the two types of system is not very big. This reflects the increasing role that social networking and online communities have in the acquisition of knowledge.

RQ3: What are the topics of the research being given the most attention?

Most of the research is being done on issues regarding question processing. This is justified by the need to understand user questions better in order to provide a more accurate answer. It is worth mentioning that a considerable amount of research is being done on issues involving the entire answering process, from information source organization to question analysis and answer generation.

RQ4: What are the challenges faced by researchers in this area?

The most prominent challenge is the lexical gap. In knowledge base question answering systems, it is evident in the difference between questions expressed in natural language and the semantically structured information of the knowledge base. The lexical gap is also present in community question answering systems, both between user questions that ask for the same thing using different words and between a question and its answer, which can sometimes differ considerably from a lexical point of view. Another prominent challenge for knowledge base question answering systems is question entity identification, especially in questions involving multiple entities. The lexical gap can have a negative effect on this problem and increase the difficulty of entity identification.
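The lexical gap between user questions can be illustrated with a minimal Python sketch. It is not taken from the review: the three questions are hypothetical, and Jaccard word overlap is used here only as a simple stand-in for purely lexical matching. Under those assumptions, two questions that ask for the same thing can share no words at all, while two questions with opposite intent can share most of them.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between the word sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical example questions (assumptions, not from the review).
q1 = "How can I lose weight quickly?"
q2 = "What is the fastest way to shed pounds?"  # same intent as q1
q3 = "How can I gain weight quickly?"           # opposite intent to q1

paraphrase_score = jaccard(q1, q2)        # no shared words
different_intent_score = jaccard(q1, q3)  # 5 shared words out of 7

print(f"paraphrase overlap:       {paraphrase_score:.2f}")
print(f"different intent overlap: {different_intent_score:.2f}")
```

A matcher based only on word overlap would rank the opposite-intent question above the paraphrase, which is one way to see why the lexical gap makes question matching difficult.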

RQ5: What kinds of solutions are being proposed?

The solutions being applied to the various issues of the answering process are natural language processing and machine learning methods implemented with neural networks, algebraic models, and probabilistic models, with probabilistic models having the largest number of contributions.

RQ6: What are the trends of research in this area?

The researchers noticed a growing number of contributions on multiple knowledge base question answering systems, with 75% of them published in 2016. This indicates increased research interest in this type of system and a future research trend, justified by the need to create more flexible systems that obtain and validate answers from multiple and possibly external sources when a single knowledge base is not enough to answer the question. The researchers also noticed a constant increase in the number of contributions on non-factoid question answering systems. They identify this as an increased research interest and a future research trend towards systems that are more intelligent and closer to humans.

The researchers also found a new type of information source for question answering systems that has been researched during the last year. This is image-based information retrieval, where the information source for finding the answer is either composed entirely of an image database or is a text and image hybrid. They identify this as a research trend motivated by the need to create question answering systems that go beyond the traditional boundaries of text-based systems towards a more complete artificial intelligence.

References:

  1. Jurafsky, D., & Martin, J.H. (2017). Speech and Language Processing. Draft of August 7, 2017.
  2. Yih, W. T., & Ma, H. (2016, July). Question Answering with Knowledge Base, Web and Beyond. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval (pp. 1219–1221). ACM.
  3. Kodra, L., & Meçe, E. K. (2017). Question Answering Systems: A Review on Present Developments, Challenges and Trends. International Journal of Advanced Computer Science and Applications, 8(9).

Originally published at RealKM.