How Large Language Models and Retrieval Augmented Generation Answer Questions over a Document — An Overview

M. Baddar
BetaFlow
Published Aug 3, 2023 · 5 min read
Image source: https://knowyourmeme.com/memes/peter-parker-reading-a-book

What is the Problem?

If you have used ChatGPT (and I am sure you have!), you have noticed that the process is quite easy: you ask an open-domain question in free-format text and you get an answer:

Figure 1 : General Question about Tesla to ChatGPT-3.5

However, for more domain-specific questions, ChatGPT can become less “confident”:

Figure 2 : Domain Specific Question to ChatGPT-3.5 about Tesla

This reveals one SERIOUS issue: Model Trustworthiness.

I am sure OpenAI uses high-quality research and documents to build ChatGPT; however, in some applications, users want answers generated from trustworthy documents (like proper technical, user, or financial documents from reputable sources). Here comes Question-Answering over Documents (Docs-QA) to solve the problem.

So how does Docs-QA solve AI trustworthiness? It partially solves it by making sure that only “trustworthy” and “relevant” references are used to generate answers. Furthermore, with a proper “citation” mechanism, it can help the user find the pieces of the document used to generate the answer.

The techniques we are going to explain in this and the following articles are all applied in the AnswerMe API for Docs-QA. It’s free, and you can start using it by creating a Rapid account.

Here is an example of answering the same question as in Figure 2 using the AnswerMe implementation of Docs-QA:

{
"question": "What is the alert APP_w207",
"answer": "The alert APP_w207 indicates that the Autosteer
feature on your vehicle is temporarily unavailable.
There are several possible reasons for this. It could be a
temporary condition caused by external factors such as
missing or faded lane markers, narrow or winding roads,
poor visibility due to weather conditions like rain, snow,
fog, extremely hot or cold temperatures, or bright light
from other vehicle headlights or direct sunlight.
\n\nThis alert may also appear if you exceeded the maximum
speed limit for Autosteer while it was active. In this case,
Autosteer will not be available for the remainder of your
current drive.\n\nTo address this alert, you should continue
driving to your destination. If Autosteer is still
unavailable when you reach your destination and remains
unavailable during your next planned drive, you can check
for potential obstructions or damage. This includes looking
for mud, ice, snow, or other environmental factors
obstructing the sensors, checking for obstructions caused by
objects mounted on the vehicle like a bike rack, inspecting
for any obstructions caused by paint or adhesive products
like wraps, stickers, or rubber coatings, and examining the
bumper for any damage or misalignment.\n\nIf there are no
obvious obstructions or damage, you can continue driving
your vehicle as it is safe to do so. However, if you do find
any obstructions or damage, it is recommended to schedule a
service appointment at your convenience.\n\nFor more
information and troubleshooting tips, you can refer to the
Autosteer section on page 100 of your vehicle's manual."
}

The answer in this snippet is generated based on the Tesla Model S Owner's Manual.

In the following part of the article, we give a quick glimpse of how Docs-QA works.

Figure 3 : Docs-QA System Components

Phase 1 — Parsing the Document

This phase is about transforming the document from its raw format into an understandable format.

Task 1.1 — Document Loading

The document can come in different formats, e.g. plain text, PDFs, text-in-images, or database entries (SQL or NoSQL databases). For each, we need the relevant document loader (e.g. in Python, it can be normal text loading, or using the pypdf package for reading PDF files as text).
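As a minimal sketch of such a loader (the helper name is our own, and we assume the pypdf package for the PDF branch), one can dispatch on the file extension:

```python
from pathlib import Path

def load_document(path: str) -> str:
    """Load a document as raw text, dispatching on the file extension."""
    p = Path(path)
    if p.suffix.lower() == ".pdf":
        # Requires: pip install pypdf
        from pypdf import PdfReader
        reader = PdfReader(str(p))
        # Concatenate the extracted text of every page
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    # Default: treat the file as plain text
    return p.read_text(encoding="utf-8")
```

A real loader would add branches for images (OCR) and database rows, but the shape is the same: every format ends up as plain text.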

Task 1.2 — Document Splitting

After loading the document into memory, the loaded data has to be “chopped into chunks” to be processed later. This chopping is called “splitting” and can be based on several, usually simple, criteria. For example, LangChain’s RecursiveCharacterTextSplitter recursively splits text on a list of separator characters until the resulting chunks are small enough. Another way is to use a well-founded NLP package, like NLTK, which has a built-in sentence splitter. More on text splitting in the following articles in this series.
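To make the idea concrete, here is a toy splitter in the spirit of LangChain’s RecursiveCharacterTextSplitter (the function name and separator list are our own simplifications; the real implementation also supports chunk overlap):

```python
def recursive_split(text, separators=("\n\n", "\n", " "), chunk_size=200):
    """Recursively split text on coarser separators first, falling back
    to finer ones, until every chunk is at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        if sep not in text:
            continue
        chunks, current = [], ""
        for part in text.split(sep):
            candidate = current + sep + part if current else part
            if len(candidate) <= chunk_size:
                current = candidate  # keep merging small pieces
            else:
                if current:
                    chunks.append(current)
                if len(part) > chunk_size:
                    # Piece is still too big: recurse with finer separators
                    chunks.extend(recursive_split(part, separators, chunk_size))
                    current = ""
                else:
                    current = part
        if current:
            chunks.append(current)
        return chunks
    # No separator present at all: hard-split by character count
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

The recursion means paragraph boundaries are preferred over line breaks, and line breaks over word breaks, so chunks stay as semantically coherent as possible.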

Phase 2 — Text Understanding

If you think about the process of “understanding” any topic or text, you can break it down into two tasks: storing, or “embedding”, it into your brain with some embedding model, and then “retrieving” this information from your mind. The same steps apply in Docs-QA.

Task 2.1 — Text Embedding

This task is about one thing :

convert text from character format to numerical-features format

The simplest embedding model is Bag-of-Words. Let’s take the following sentence as an example:

John lives in Boston

After removing the stop word “in” and stemming the sentence, it has the form

John , live, Boston

Then we can consider each word a “feature” whose numeric value is its count in the text (here, 1 for each). This can be represented as

{"John":1,"live":1,"Boston":1}
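The toy example above can be sketched in a few lines (the stop-word list and the naive strip-the-trailing-“s” stemmer are simplifications of our own; real pipelines use NLTK or a proper stemmer, so tokens come out lowercased here):

```python
def bag_of_words(sentence: str, stop_words=("in", "the", "a", "an")) -> dict:
    """Toy Bag-of-Words embedding: lowercase, drop stop words,
    naively 'stem' by stripping a trailing 's', then count occurrences."""
    tokens = [w.lower() for w in sentence.split() if w.lower() not in stop_words]
    stems = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in tokens]
    counts = {}
    for stem in stems:
        counts[stem] = counts.get(stem, 0) + 1
    return counts

print(bag_of_words("John lives in Boston"))
# → {'john': 1, 'live': 1, 'boston': 1}
```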

There are numerous, more sophisticated Embedding Models that we will detail in later articles in this series.

Phase 3 — Answer Generation

In summary, this phase is about retrieving “chunks” of text relevant to the query and generating an answer in the context of these “chunks”.

Task 3.1 — Question-Document based Retrieval

By applying the embedding to the question and to each chunk of text, we get a feature-vector representation of the query and of the chunks. Then, by applying a similarity measure like cosine similarity (typically via a K-nearest-neighbor search), we retrieve the K chunks that are most similar to the query (or question).
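A minimal sketch of this retrieval step, with cosine similarity written out in plain Python (the vector values in the usage example are made up for illustration; production systems use a vector index instead of a full sort):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

print(retrieve_top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], k=2))
# → [0, 2]
```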

Task 3.2 — Answer Generation

The main engine behind this task is a Language Model (or the more powerful new Large Language Models, LLMs). It works as a generative model that, with some magic, calculates the conditional probability of a sequence of words — the most probable answer to the question, conditioned on the set of chunks retrieved in Task 3.1. More on that in the following articles in this series.
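In practice, the conditioning is done by stitching the retrieved chunks into the LLM prompt. A sketch of such prompt assembly (the template wording is our own; the actual LLM call is provider-specific and omitted here):

```python
def build_prompt(question: str, chunks: list) -> str:
    """Assemble a grounded-QA prompt: the model is asked to answer only
    from the numbered context chunks, which also enables citations."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below, and cite the "
        "chunk numbers you used.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the chunks is what makes the “citation” mechanism mentioned earlier possible: the model can refer back to [1], [2], etc., and the application can map those back to document locations.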

Enjoyed This Story?

Subscribe for free to get notified when we publish a new story in our publication.

Also, follow us on Twitter for similar articles and news about Machine Learning, Artificial Intelligence, and Cloud-Based solutions.


AI/ML Engineer with a focus on Generative Modeling. The mission: enabling individuals and SMEs to apply this technology to solve real-life problems.