Finding concise answers to questions in enterprise documents

J William Murdock
IBM Data Science in Practice
Dec 20, 2020

Authors: J. William Murdock, Avi Sil, Anastas Stoyanovsky, Christophe Guittet

Photo by Mick Haupt from Unsplash: A compass, used here to symbolize the task of finding

Many business applications use some sort of search to find documents or passages. Some also need to find answers within those documents or passages. In this article, we provide some examples of this task and explain its relationship to cutting-edge research technologies. We then describe the answer finding capability in IBM Watson Discovery and discuss ways of using that capability in a business application. We also talk about limitations of the technology and plans to address those limitations. Finally, we discuss the availability of the capability and ask for feedback.

The Answer Finding Task

Consider the following question:

What versions of Firefox does InfoSphere Information Server 1.3 support?

An IBM support page answers this question by saying:

If you open the InfoSphere Information Server Web Console with Internet Explorer 11, you may get the error message: IBM InfoSphere Information Server supports Mozilla Firefox (ESR 17 and 24) and Microsoft Internet Explorer (version 9.0 and 10.0) browsers.

For some applications, finding this document might be enough to be useful. For many, though, it would be better to also emphasize or highlight the exact answer: “ESR 17 and 24”.

The example above is an explicit question, but it could also be phrased as an implicit question:

InfoSphere Information Server 1.3 Firefox versions

This query is not grammatically a question, but it has roughly the same meaning, so we would expect the same answer here too.

The answer in these examples (“ESR 17 and 24”) is a literal string within the text. This is a defining characteristic of answer finding: it finds answers in the text itself. It does not create new answers by drawing inferences. For example, if you asked an answer finding system “What is 4 plus 7?” it would not be able to give you a correct answer unless that system had some text that explicitly said that 4 plus 7 equals 11. The text would not need to use those exact words, but it would need to say something to that effect.

Research in Answer Finding

Photo by Michael Longmire from Unsplash: A microscope, used here to symbolize research

Finding answers in a collection of text requires finding relevant chunks of text and then finding answers within those chunks. These two subtasks are often addressed using separate technologies. Finding relevant text is generally referred to as search and is often done using Information Retrieval capabilities such as Apache Lucene. Information Retrieval typically involves counting how many times each word in the query matches the target text, giving more weight to terms that appear infrequently in the collection of documents. The combination of how many terms matched and how infrequent each of the matching terms is determines the ranking of the search results.
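
To make that intuition concrete, below is a minimal sketch of this style of scoring in Python. It illustrates only the general term-frequency/inverse-document-frequency idea; it is not the exact formula Lucene uses (modern Lucene defaults to BM25, a refinement of the same idea), and the toy documents and query are invented for the example.

import math
from collections import Counter

def score(query_terms, doc_terms, collection):
    # Score one document: for each query term that matches, multiply
    # its count in the document by the rarity of the term overall.
    doc_counts = Counter(doc_terms)
    total = 0.0
    for term in set(query_terms):
        tf = doc_counts[term]  # matches in this document
        if tf == 0:
            continue
        df = sum(1 for d in collection if term in d)  # documents containing the term
        idf = math.log((len(collection) + 1) / (df + 1))  # rare terms weigh more
        total += tf * idf
    return total

docs = [["firefox", "esr", "versions", "supported"],
        ["internet", "explorer", "versions"],
        ["server", "installation", "guide"]]
query = ["firefox", "versions"]
ranked = sorted(docs, key=lambda d: score(query, d, docs), reverse=True)
print(ranked[0])  # the document matching both the rare and the common query term wins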

Finding an answer within a single chunk of text (e.g. a paragraph) is sometimes referred to as “machine reading comprehension”. The task resembles the reading comprehension task that is common in standardized testing of children. The system gets a passage and a question and answers the question. With that said, most “reading comprehension” tests for children involve inferring an answer, not just finding an answer. So “machine reading comprehension” is an imperfect label for a component that finds an answer in a passage. However, it is commonly used to describe technology of this sort in a variety of scientific publications.

Popular data sets for testing such systems include the Google Natural Questions data set (for English) and the TyDi QA data set (for 10 other typologically diverse languages, including Bengali, Russian, and Finnish, as well as lower-resource languages such as Swahili). You can learn more about the data sets and see which research systems are effective at those links. The leaderboards on the corresponding pages show which systems are doing best on the data at a given time. GAAMA (Go Ahead Ask Me Anything) from IBM Research, which is trained on top of a large multilingual language model called XLM-RoBERTa, is generally at or near the top of the rankings for finding short answers.

Answer Finding in IBM Watson Discovery

The answer finding capability in IBM Watson Discovery starts with search: it uses Information Retrieval technology to find documents and passages. Next, Watson Discovery calls its GAAMA model to extract answers from the passages. Finally, it returns the documents, passages, and answers.

You can see all the details of how to use the answer finding feature for IBM Watson Discovery v2 in this IBM Community blog post and the API documentation. Below, we provide just a brief example and introduction.

API Example

Here we provide an illustrative example. Consider the query:

{
  "natural_language_query": "InfoSphere Information Server 1.3 Firefox versions",
  "passages": {
    "enabled": true,
    "max_per_document": 3,
    "characters": 850,
    "fields": ["title", "content"],
    "find_answers": true,
    "max_answers_per_passage": 1
  }
}

For this query, Watson Discovery first searches for documents related to “InfoSphere Information Server 1.3 Firefox versions”. Because passages.enabled is true, it then tries to find passages for each document. Because passages.max_per_document is 3, it finds at most 3 passages for each document. Because passages.characters is 850, the passages are roughly 850 characters long. The passages come from fields named “title” or “content” based on the passages.fields parameter. Because passages.find_answers is true, Watson Discovery then tries to find answers in the passages (using its GAAMA model). Because passages.max_answers_per_passage is 1, it finds at most 1 answer in each passage.
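
For illustration, here is roughly how the same query could be issued through the ibm-watson Python SDK. This is a sketch, not definitive client code: the API key, service URL, and project ID are placeholders, and the class names (DiscoveryV2, QueryLargePassages) assume a recent version of that SDK.

from ibm_watson import DiscoveryV2
from ibm_watson.discovery_v2 import QueryLargePassages
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders: substitute the credentials and project ID for your instance.
authenticator = IAMAuthenticator("YOUR_API_KEY")
discovery = DiscoveryV2(version="2020-08-30", authenticator=authenticator)
discovery.set_service_url("YOUR_SERVICE_URL")

response = discovery.query(
    project_id="YOUR_PROJECT_ID",
    natural_language_query="InfoSphere Information Server 1.3 Firefox versions",
    passages=QueryLargePassages(
        enabled=True,
        max_per_document=3,
        characters=850,
        fields=["title", "content"],
        find_answers=True,
        max_answers_per_passage=1)).get_result()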

For this sample query, Watson Discovery returns a list of documents. Within each document, there is a list of passages. Within each passage, there is a list of answers. Here is an example of a passage within a document within this search result:

{
  "passage_text": "<em>InfoSphere</em> <em>Information</em> <em>Server</em> Web Console with Internet Explorer 11, you may get the error message: IBM <em>InfoSphere</em> <em>Information</em> <em>Server</em> supports Mozilla <em>Firefox</em> (ESR 17 and 24) and Microsoft Internet Explorer (<em>version</em> 9.0 and 10.0) browsers.",
  "start_offset": 287,
  "end_offset": 526,
  "field": "content",
  "answers": [{
    "answer_text": "(ESR 17 and 24)",
    "start_offset": 446,
    "end_offset": 461,
    "confidence": 0.6925222
  }]
}

This passage object starts with the text of the passage; in the text, keyword matches to the query are emphasized. Next are the start and end offsets of the passage. After that is the field that the passage came from (“content” in this example). Then there is a list of answers, with one answer (because we set passages.max_answers_per_passage to 1 in the query). Each answer has text, offsets, and a confidence value. The confidence ranges from 0 to 1 and is an estimate of the probability that the answer is correct.
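
Putting this together, a calling application might walk the result structure as in the following sketch. It assumes the parsed result from the earlier SDK example, with per-document passages appearing under a document_passages field as shown above.

# Walk documents -> passages -> answers in the parsed query result.
for doc in response["results"]:
    for passage in doc.get("document_passages", []):
        print("Passage:", passage["passage_text"])
        for answer in passage.get("answers", []):
            print("  Answer: %s (confidence %.2f)"
                  % (answer["answer_text"], answer["confidence"]))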

Business Applications

Photo by Sebastien Gabriel from Unsplash: Office buildings in San Francisco, symbolizing business

We envision two major classes of applications for this technology: search and document review. In a search application, a user provides a query, and that query is used to find documents and find answers within the documents. In a document review application, an end-user first finds a document that they want to review. That user then wants to get information from that document. For example, a user might first find a contract and then ask a question about that contract. To use answer finding for document review in Watson Discovery, add a filter to your query to restrict answers to the selected document.
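
As a sketch of the document review pattern, the query below restricts answer finding to one document by filtering on its ID. The document ID and question are placeholders, the filter syntax assumes the Discovery v2 query language, and discovery and QueryLargePassages are as in the earlier SDK example.

response = discovery.query(
    project_id="YOUR_PROJECT_ID",  # placeholder
    filter='document_id::"my-contract-document-id"',  # only the selected document
    natural_language_query="What is the deadline for product delivery?",
    passages=QueryLargePassages(
        enabled=True, find_answers=True, max_answers_per_passage=1)).get_result()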

In general, we expect answer finding accuracy to be higher for document review use cases. For example, consider a search query like “What is the deadline for product delivery in the 2021 wholesale contract with SampleCo?” Within this query are two separate tasks: (1) finding the 2021 wholesale contract with SampleCo and (2) finding the deadline for product delivery within that contract. It is much easier for Watson Discovery if you separate these out into two steps. The user would first find the 2021 wholesale contract with SampleCo (perhaps by searching on the query “2021 wholesale contract with SampleCo” or perhaps by scrolling through a list of contracts). The user would then ask questions about this contract such as “What is the deadline for product delivery?”. Either approach can work and get correct answers. The document review approach is likely to get correct answers more often, but the search approach may present a more convenient user experience. We recommend considering both options depending on the needs of your end users.

Once you have found answers in your application, you need to show them to users. Typically answers make more sense within the context of the passage in which they were found. So for most applications, we recommend emphasizing the answer within the passage instead of showing the answer alone. In some cases, it can be useful to present the answer first in a larger font and then show the passage with the answer in it. If your application has extremely limited room to show answers (e.g., a smart-watch app), then showing the answer text alone might make sense. However, in that case, we would recommend that you only show such answers when Watson Discovery has extremely high confidence. Providing a wrong answer without any context is often a bad and confusing user experience. Any answer finding system will get some answers wrong.
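
One simple way to emphasize an answer within its passage is sketched below. This naive version marks the first occurrence of the answer string in the passage text; a more robust approach would use the character offsets returned by the API.

def render_answer_in_context(passage_text, answer_text):
    # Wrap the first occurrence of the answer in <strong> tags for display.
    return passage_text.replace(answer_text,
                                "<strong>" + answer_text + "</strong>", 1)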

Even if you are showing the full context, we recommend discarding low confidence answers. When answer finding is enabled, Watson Discovery will almost always return answers in every passage. Often these answers have very low confidence and are almost certainly wrong. The decision of how much confidence you should have before showing an answer is very subjective and depends a lot on the details of the application. For that reason, Watson Discovery does not discard low confidence answers internally. It relies on the calling application to decide whether the confidence is high enough for a given use. For more information about how to select a confidence threshold, see this article.
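
For example, the filtering step in a calling application might look like the following sketch; the 0.3 threshold is an arbitrary placeholder that you would tune for your own application and data.

MIN_CONFIDENCE = 0.3  # placeholder; choose a threshold suited to your application

def confident_answers(response, threshold=MIN_CONFIDENCE):
    # Collect (passage, answer) pairs whose confidence clears the threshold.
    return [(passage, answer)
            for doc in response["results"]
            for passage in doc.get("document_passages", [])
            for answer in passage.get("answers", [])
            if answer["confidence"] >= threshold]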

Limitations and Future Work

Here are some key limitations of the answer finding capability in Watson Discovery:

  • As noted earlier, the answer finding capability in Watson Discovery does not create answers; it only finds them. So if the answer is not stated in the text, it cannot find that answer.
  • Answer finding is only useful for answering yes/no questions if there is some text to find that answers the question. For example, consider the question, “Should I add more insulation to my attic before winter?” If you have a document that says you should add more insulation before winter, answer finding can be useful for finding that statement. However, if you have a document that lists a set of advantages and disadvantages of doing so and does not draw a conclusion, answer finding will not help. Watson Discovery document and passage search may still be useful for finding that text, but there is not an answer in the text to find.
  • The capability is generally not useful for queries that are not explicitly or implicitly asking a question. For example, a query like “IBM support” is a navigational query, where a user is trying to find a specific page, not get an answer.
  • The answer finding capability is useful for questions where the answer is a short noun phrase (e.g., “Who won Super Bowl XII?” — “Dallas Cowboys”) or a complex phrase or clause (e.g., “Why did the Cowboys win Super Bowl XII?” — “because their defense forced many turnovers and obliterated the Broncos passing game”). However, more complex questions that require passage or document length answers are not a good fit for answer finding. So if your users only ask questions that need very long answers, answer finding may not be useful. If your users ask a variety of questions and a significant fraction of them do require shorter answers, then answer finding is more likely to help. In those cases, you can often rely on the answer confidences to know which answers to show and which to ignore.
  • Watson Discovery’s answer finding is not particularly good at finding answers within tables or complex lists. For example, if you ask “IBM’s 2014 gross revenue”, answer finding is likely to get the right answer if there is a sentence stating IBM’s gross revenue for the year. It is much less likely to get the right answer from a table showing a variety of financial results for a variety of years.

Some of these limitations may be addressed in the relatively near future. For example, Watson Discovery already has technology for analyzing tables and finding tables that are relevant to a query. So we may be able to get improved answer finding from tables soon, especially if there is a lot of customer demand.

Availability of Answer Finding in Watson Discovery

Answer finding is now generally available in IBM Watson Discovery v2 on the IBM public cloud, in the Plus and Premium pricing plans. If you have a Watson Discovery v2 instance on the IBM public cloud, we encourage you to try it out. We will also make it available soon for private clouds and non-IBM clouds via IBM Cloud Pak for Data.

If you have feedback regarding this feature, please let us know by posting in the Answer Finding discussion thread within the IBM Watson Discovery Community.


I am a computer scientist in the Watson Research Center at IBM. I work on IBM Watson cloud computing AI services. http://bill.murdocks.org/