Open Domain Question Answering Part-1 [BlenderBot 2.0]

In this ODQA tutorial, we use BlenderBot 2.0. We discuss the basic idea behind it, local setup, useful tweaks, and the pros and cons of designing an open-domain QA chatbot.

Jubin Jose
Aquila Network
4 min read · Dec 31, 2021


[Figure: BlenderBot 1.0 vs 2.0]

From an engineering standpoint, in my opinion, keeping an end-to-end language model (like GPT-3) up to date with the latest knowledge is very inefficient. That’s why we need a modular design based on the cost of change. For an open-domain chatbot, the strategy is to move knowledge out of large models (costly to retrain) and into external databases (cheap to update). This also makes AI affordable to anyone: pre-trained large language models can be reused for generating text, while all the knowledge is kept in affordable infrastructure.

BlenderBot2 roughly separates its operations into three modules: a knowledgeBase (search module), a long-term memory, and a text generator. This design, proposed by BlenderBot2, can serve as a general architecture for any system that generates (hallucinates) content based on real-world knowledge.

[Figure: BlenderBot2 architecture, level 0]

KnowledgeBase

A knowledgeBase can be almost anything: an SQL database, a search engine, a knowledge graph, a bucket of documents, etc. To make our knowledgeBase compatible with the other parts of the system, we sandwich it between a query generator and a response encoder. The query generator takes a state/context as input and produces an appropriate query. Once the knowledgeBase returns a response to that query, the encoder converts it into a compressed format (e.g., a latent vector) for the next module to use.

For our purposes, BlenderBot2 uses a full-text search engine (like Bing.com) as its knowledgeBase.
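To make the sandwich concrete, here is a minimal Python sketch of that flow. Every name in it (the toy query generator, the fake search backend, the hash-based encoder) is hypothetical and only illustrates the query generator → knowledgeBase → response encoder chain; BlenderBot2’s real modules are learned neural components inside ParlAI.

def generate_query(context: str) -> str:
    # In BlenderBot2 this is a learned model; here we simply
    # treat the last turn of the conversation as the search query.
    return context.strip().split("\n")[-1]

def search_knowledgebase(query: str) -> list[str]:
    # Stand-in for any backend: SQL database, search engine,
    # knowledge graph, bucket of documents, etc.
    fake_index = {"bitcoin": ["Bitcoin is a decentralized digital currency."]}
    return fake_index.get(query.lower(), ["no results"])

def encode_response(documents: list[str]) -> list[float]:
    # Stand-in for a neural encoder that compresses documents
    # into a latent vector the generator can consume.
    return [(sum(map(ord, doc)) % 997) / 997.0 for doc in documents]

context = "Hi there\nbitcoin"
latent = encode_response(search_knowledgebase(generate_query(context)))
print(latent)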

Long-term memory

Long-term memory is a fast, in-memory vector store that supports searching through conversation history for previously collected knowledge (from either the knowledgeBase or the user).

This can be done in one of a few ways (the third option is sketched in code below):

  • Summarize the entire conversation into a single vector for a specific conversation session. Keep this updated across turns.
  • Summarize the entire conversation into a small paragraph for a specific conversation session. Keep this updated across turns. BlenderBot2 uses this method.
  • Convert each turn of a conversation into corresponding latent vectors to be k-NN searched later.
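Here is a rough sketch of the third option in Python. The embed() function is a hypothetical stand-in for a real sentence encoder (it hashes text into deterministic vectors just so the example is self-contained and runnable); BlenderBot2 itself uses the summary-paragraph method above, so this only illustrates the k-NN variant.

import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a real sentence encoder: deterministic,
    # hash-seeded random vectors keep the sketch dependency-light.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

memory = []  # one (turn_text, vector) pair per conversation turn

def remember(turn: str) -> None:
    memory.append((turn, embed(turn)))

def recall(query: str, k: int = 2) -> list[str]:
    # Cosine-similarity k-NN over stored turn vectors
    # (vectors are unit-normalized, so dot product == cosine).
    q = embed(query)
    ranked = sorted(memory, key=lambda pair: -float(pair[1] @ q))
    return [text for text, _ in ranked[:k]]

remember("User: I live in Berlin.")
remember("Bot: Berlin has great museums.")
print(recall("Where does the user live?"))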

Generator

Any general-purpose seq-to-seq NLP model can serve as this module. We typically use large pre-trained end-to-end models such as GPT (WebGPT), T5, etc. These models are trained on massive datasets like CommonCrawl in an unsupervised fashion and are later fine-tuned on conversational datasets.

In BlenderBot2, the generator takes in both the text from search results and the conversation summary to generate responses to the user. Optionally, there is a classifier (switcher) between the generator and the other two modules that decides which input to use; we’re not interested in that for now.
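As a toy illustration of this conditioning, the sketch below loads the original BlenderBot (not 2.0) 400M-distilled model from HuggingFace Transformers and simply concatenates a made-up search snippet and memory summary into the generator’s input. BlenderBot2 fuses these inputs with a dedicated architecture inside ParlAI, so treat this purely as a shape-of-the-idea demo.

from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

# Original BlenderBot, used here only because it is easy to load;
# BlenderBot2 itself runs through ParlAI (see the steps below).
name = "facebook/blenderbot-400M-distill"
tokenizer = BlenderbotTokenizer.from_pretrained(name)
model = BlenderbotForConditionalGeneration.from_pretrained(name)

# Hypothetical retrieved knowledge and long-term memory summary.
search_snippet = "Bitcoin reached a new all-time high in November 2021."
memory_summary = "The user is curious about cryptocurrency prices."
user_turn = "What happened to Bitcoin recently?"

prompt = f"{memory_summary} {search_snippet} {user_turn}"
inputs = tokenizer([prompt], return_tensors="pt", truncation=True)
reply_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])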

Let’s get started

Step 1: clone this repository: https://github.com/freakeinstein/ParlAI_SearchEngine

git clone https://github.com/freakeinstein/ParlAI_SearchEngine.git

Step 2: create virtual environment & activate:

python3 -m venv env
source env/bin/activate

Step 3: install prerequisites:

pip install -r requirements.txt

Step 4: run search server:

By default, Google search is used, which is relatively slow. See the “Useful tweaks” section below for more search engine options.

python search_server.py serve --host 0.0.0.0:8080

Step 5: run parlAI BlenderBot2 in a separate terminal (don’t forget to activate environment):

  • To use the 3B-parameter model (your laptop likely can’t run it):
python -m parlai interactive --model-file zoo:blenderbot2/blenderbot2_3B/model --search_server 0.0.0.0:8080
  • To use the 400M-parameter model (a high-end laptop can run it):
python -m parlai interactive --model-file zoo:blenderbot2/blenderbot2_400M/model --search_server 0.0.0.0:8080

That’s it. You can now start chatting with BlenderBot2 in your terminal.

Useful tweaks

parlAI command

Here are some great resources on the command-line arguments you can use to tweak BlenderBot2’s behavior: https://github.com/facebookresearch/ParlAI/tree/main/projects/blenderbot2/agents

Search Server

  • The search server uses “Google” as its default engine, which is not recommended for extended usage.
  • If you’re interested in reproducing results from the original paper (from FacebookResearch), use “Bing” instead.
  • If you’re trying to incorporate BlenderBot into your applications, use the “Aquila” custom search engine, which will restrict BlenderBot’s knowledge to the custom web pages that you bookmark.

For example, talk to a Bitcoiner’s bookmark maintained at Aquila Network:

python search_server.py serve --host 0.0.0.0:8080  --search_engine="Aquila" --subscription_key "CJJ9ZGQEcK1Jffs6Ji2cTpVW4oiYP6X3VCP9YH4KhC"

Analysis & final thoughts

  • BlenderBot2 is still at research quality. To use it in production, you might need to fine-tune it on your own training data, since the default responses don’t yet meet basic quality standards. You should also consider curating your reference documents (a custom search engine is recommended) to get better results.
  • BlenderBot2 hallucinates less than its previous versions. According to FacebookResearch, it is also better than GPT-3 at producing factual responses. OpenAI published WebGPT as a response to this; we haven’t tried it ourselves (maybe someday).
  • If you’re building a fact-critical product, we don’t recommend using BlenderBot2 (and we recommend keeping an eye out for BlenderBot3). For that case, we recommend a different, more flexible method (link coming soon).

Digging down

Wanna go down the rabbit hole? Dig here:

  • BlenderBot2 architecture: [Figure: BlenderBot2 architecture, level 1]
