Building a Question Answering System

Tristan Ratz · Published in The Startup · Sep 4, 2020


Over the course of three months, we had the chance to design and implement a question answering project with Serviceware SE.
Question answering is a common task in natural language processing (NLP), a subfield of machine learning, in which the system processes a question and retrieves an answer for the user.

The idea is that the user does not need to search for the answer in a long text or catalogue of articles (the "dataset"), as they would with a normal search engine, but instead receives a preferably short answer to their question. This is a great opportunity for customers as well as companies: customers save time by serving themselves instead of contacting call centres, and companies can focus their efforts on more sophisticated requests.

This field of NLP is subject to extensive research, and over the past few years massive progress has been made. Recent models from Google, like BERT, exceed human-level precision in answering questions when trained properly.
But still: the task is far from solved. For example, a shortcoming of Google's solution is that it only works on single paragraphs. The model can only answer a question if a paragraph likely containing the answer is already known.

To extend this approach, we built a question answering system that can answer a question not just on a single paragraph, but on a whole dataset. Because it is meant to work only on a specific domain, for example a documentation database, it is called a closed-domain question answering system.

How does it work?

But how does it work? We divided our application into three parts:

1. First, we compare the question to each and every document in the dataset and determine the 20 documents most likely to contain the answer.
2. Second, we pool all the paragraphs of these 20 documents, compare them with the question as well, and determine the 20 paragraphs most likely to contain the answer.
3. Lastly, we take BERT (the model by Google), try to extract an answer from every paragraph, and select the answer with the highest confidence score: the answer where the model is most certain to have found the right one.

The first two steps are called retrieval, the last one reading.
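
In code, the pipeline looks roughly like the following sketch. This is only a skeleton of the description above: the function names are made up for illustration, and the trivial word-overlap scorer stands in for the cosine-similarity retriever and the BERT reader explained below.

```python
# Structural sketch of the three stages. The scorer is a placeholder:
# the real system scores with cosine similarity, and the final step
# uses BERT to extract an answer span from each paragraph.

def score(question: str, text: str) -> float:
    """Stand-in relevance score: fraction of question words found in the text."""
    q_words = set(question.lower().split())
    return len(q_words & set(text.lower().split())) / max(len(q_words), 1)

def top_k(question: str, texts: list[str], k: int = 20) -> list[str]:
    """Return the k texts most likely to contain the answer."""
    return sorted(texts, key=lambda t: score(question, t), reverse=True)[:k]

def answer(question: str, documents: list[str]) -> str:
    # Step 1: keep the 20 documents most likely to contain the answer.
    docs = top_k(question, documents, k=20)
    # Step 2: pool their paragraphs and keep the 20 most likely paragraphs.
    paragraphs = [p for d in docs for p in d.split("\n\n")]
    candidates = top_k(question, paragraphs, k=20)
    # Step 3 (reading): extract an answer from each candidate and keep the
    # most confident one. This stand-in just returns the best paragraph.
    return max(candidates, key=lambda p: score(question, p))
```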

For comparing documents or paragraphs to our question, we build a vector for the question as well as for every paragraph and document. Then we place all these vectors in a large vector space and determine the similarity between each vector and the question vector by calculating the angle between them via the cosine similarity.
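
Here is a minimal sketch of this step with scikit-learn, using TF-IDF as one common way to build such vectors (an illustrative assumption; the post does not specify which vectorization the system actually uses):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "BERT is a transformer model for natural language processing.",
    "Baseball is a bat-and-ball sport played between two teams.",
    "Question answering systems retrieve answers from text.",
]
question = "How do question answering systems work?"

# Build one vector per document and one for the question in the same space.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
question_vector = vectorizer.transform([question])

# Cosine similarity = cosine of the angle between the vectors.
similarities = cosine_similarity(question_vector, doc_vectors)[0]

# Keep the most similar documents for the next stage (top 2 here, 20 in the article).
for i in similarities.argsort()[::-1][:2]:
    print(f"{similarities[i]:.2f}  {documents[i]}")
```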

But why does this work?

Because these vectors and their directions represent different features: they can indicate which words a given text contains, or what its meaning and content are. For example, one direction can stand for baseball and another for natural language processing. "Machine learning", for instance, would definitely be more similar to "NLP" than to "baseball".
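
Sentence embeddings behave exactly this way. The sketch below uses the sentence-transformers library and a publicly available multilingual model; both are illustrative assumptions, not components of our system:

```python
from sentence_transformers import SentenceTransformer, util

# Publicly available multilingual embedding model (illustrative choice).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

terms = ["machine learning", "natural language processing", "baseball"]
embeddings = model.encode(terms, convert_to_tensor=True)

# "machine learning" points in a much more similar direction to "NLP"
# than to "baseball".
print(util.cos_sim(embeddings[0], embeddings[1]).item())  # high similarity
print(util.cos_sim(embeddings[0], embeddings[2]).item())  # low similarity
```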

We choose the documents and paragraphs with the highest similarity for the next step.

For the reading part, we took a pre-trained multilingual model by Google and trained it to answer questions on the English Wikipedia. As previous studies have shown, the cross-lingual learning effects to other languages are significant, so we can answer questions in multiple languages.
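
The reading step can be sketched with the Hugging Face transformers library. The checkpoint below is a publicly available multilingual QA model that stands in for our fine-tuned one; the library and model name are assumptions for illustration:

```python
from transformers import pipeline

# Publicly available multilingual extractive QA model (illustrative
# stand-in; not the model we actually fine-tuned).
reader = pipeline("question-answering", model="deepset/xlm-roberta-base-squad2")

question = "Who developed BERT?"
paragraphs = [
    "BERT was developed by researchers at Google in 2018.",
    "Baseball is a popular sport in the United States.",
]

# Run the reader on every retrieved paragraph and keep the answer
# with the highest confidence score.
results = [reader(question=question, context=p) for p in paragraphs]
best = max(results, key=lambda r: r["score"])
print(best["answer"], round(best["score"], 2))
```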

Conclusion

As a result, we can answer questions on a given database in multiple languages. The results were quite promising: the system works more accurately and is even faster than existing solutions (like cdQA by BNP Paribas).
But while delivering promising results, our solution still faces some challenges. In a customer-facing situation, one would have to reduce the number of false positives to (almost) zero, because giving a false answer to a customer could have economic consequences.

To accomplish this, there are different approaches. For example:
- internal beta testing, to see which questions get asked in support,
- pre-screening, so that the customer only sees correct answers,
- using the model monolingually,
- and, of course, training on questions specific to the given dataset.

It is an exciting development and a promising path that we, thanks to Serviceware and TU Darmstadt, were able to explore. We were astonished by how far we could get in merely three months, probably thanks to the great support of Serviceware and their amazing team.

A special thanks to Niklas and Adrian, as well as Luisa.

And of course to Ji Yune Whang, Joshua Bodemann, Marcel Nawrath, Sebastian Marcus Meier and Wladislav Miretski — we really have been a great team.
