Question Answering Using a Pre-Trained Model in Hugging Face

Yuan An, PhD
4 min read · Oct 25, 2023


This is a series of short tutorials about using Hugging Face. The table of contents is here.

In this lesson, we will learn how to use a pre-trained model in Hugging Face for question answering.

Question Answering

A Question Answering (QA) system is a type of artificial intelligence application that is designed to answer questions posed by humans in a natural and coherent manner.

An instance of a QA task consists of three main components: (1) one or more documents that serve as context, (2) a question, and (3) an answer. A QA system can answer the question by extracting a span of text directly from the given documents (extractive QA) or by synthesizing the information in the documents to generate an answer (generative QA). We can evaluate a QA system by comparing its predicted answer to the given reference answer. QA systems have a variety of applications, including customer service, education, healthcare, and virtual assistants in smart devices.

ChatGPT can be used as a QA system. For example, if we provide a piece of text to ChatGPT and ask it to answer a question using the text, ChatGPT will answer the question correctly most of the time. It demonstrates that ChatGPT has remarkable abilities in natural language processing.

However, a study [1] has shown that ChatGPT is sometimes inferior to specialized open-source systems for question-answering tasks. In this course, we will learn how to use the open-source transformers in Hugging Face for question-answering applications. To begin, we will apply a pre-trained model from Hugging Face directly to a question-answering dataset.

Install Transformers and Datasets from Hugging Face

! pip install -q transformers[torch] datasets

Load the SQuAD Dataset

A widely used dataset for question answering is the Stanford Question Answering Dataset (SQuAD). There are two main versions of this dataset:

  1. SQuAD 1.1: This version consists of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text from the corresponding reading passage.
  2. SQuAD 2.0: Building upon SQuAD 1.1, this version adds a set of unanswerable questions, challenging the model to determine when no answer is available based on the provided text.

We will use SQuAD 1.1 for extractive QA in this tutorial.

from datasets import load_dataset

# Load the dataset
squad = load_dataset("squad")

SQuAD 1.1 is split into a train set and a validation set. Each split has five features: id, title, context, question, and answers.

Load the Pre-Trained BERT model

For question-answering tasks, a commonly used pre-trained model is BERT (Bidirectional Encoder Representations from Transformers).

Hugging Face provides an easy-to-use interface for loading this model along with its tokenizer. Let us use the Hugging Face Transformers library to load a BERT checkpoint that has already been fine-tuned on the SQuAD dataset:

from transformers import BertTokenizer, BertForQuestionAnswering

# Load the tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

Get an Instance from the SQuAD Dataset

We will use the instance with index 20 in the SQuAD train dataset as an example.

instance = squad['train'][20]
context = instance['context']
question = instance['question']

Find the given answer and its start position in the context:

given_answer = instance['answers']['text'][0]  # Assuming the first answer is the correct one
given_answer_start = instance['answers']['answer_start'][0]
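In SQuAD, answer_start is a character offset into the context string, not a token index. The sketch below uses a hand-made record (not an actual SQuAD entry) with the same field layout to show how the offset recovers the answer span. The helper name answer_span is hypothetical.

```python
# A hand-made record that mimics the SQuAD field layout.
# Illustrative only: it is not taken from the dataset.
example = {
    "id": "example-0",
    "title": "Example",
    "context": "BERT was introduced by researchers at Google in 2018.",
    "question": "Who introduced BERT?",
    "answers": {"text": ["researchers at Google"], "answer_start": [23]},
}


def answer_span(record, i=0):
    """Slice the i-th reference answer out of the context by character offset."""
    start = record["answers"]["answer_start"][i]
    end = start + len(record["answers"]["text"][i])
    return record["context"][start:end]


recovered = answer_span(example)
print(recovered)
```

The same check can be run on a real instance to confirm that the stored offset and answer text agree.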

Tokenize the Example Data

inputs = tokenizer(question, context, return_tensors='pt', max_length=512, truncation=True)
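To make the tokenizer call less opaque, here is a rough, pure-Python sketch of how a question and a context are packed into a single BERT input. It assumes the text is already split into tokens and only shortens the context when the pair is too long; the real tokenizer also maps tokens to ids and its truncation strategy may differ. The function name pack_pair is hypothetical.

```python
def pack_pair(question_tokens, context_tokens, max_length=512):
    """Pack two token lists as [CLS] question [SEP] context [SEP]."""
    # Three slots are reserved for [CLS] and the two [SEP] markers.
    budget = max_length - len(question_tokens) - 3
    # Simplification (an assumption of this sketch): only the context is cut.
    context_tokens = context_tokens[:max(budget, 0)]
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    # Segment ids: 0 covers [CLS], the question, and the first [SEP];
    # 1 covers the context and the final [SEP].
    token_type_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    return tokens, token_type_ids


tokens, token_type_ids = pack_pair(
    "who introduced bert ?".split(),
    "bert was introduced by researchers at google".split(),
    max_length=12,
)
print(tokens)
print(token_type_ids)
```

With max_length=12, the last two context tokens are dropped so that the packed sequence fits the limit.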

Apply the BERT Model to the Example Data

output = model(**inputs)

Get the Predicted Answer

import torch

# Pick the most likely start and end token positions
start_idx = torch.argmax(output.start_logits)
end_idx = torch.argmax(output.end_logits)

predicted_answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][start_idx:end_idx + 1]))

Evaluate the Result of the Example Data

correct = (predicted_answer.lower() == given_answer.lower())
evaluation = 'Correct' if correct else f'Incorrect (Predicted: {predicted_answer}, Given: {given_answer})'

print(evaluation)

The result is correct. Great! Now, let us apply the model to a set of instances from the train dataset.

Define a Function for a Single Instance

def evaluate_instance(instance):
    context = instance['context']
    question = instance['question']
    given_answer = instance['answers']['text'][0]  # Assuming the first answer is the correct one

    # Tokenize the data
    inputs = tokenizer(question, context, return_tensors='pt', max_length=512, truncation=True)

    # Apply the BERT model
    with torch.no_grad():  # No need to calculate gradients
        output = model(**inputs)

    # Get the predicted answer
    start_idx = torch.argmax(output.start_logits)
    end_idx = torch.argmax(output.end_logits)
    predicted_answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][start_idx:end_idx + 1]))

    # Case-insensitive exact match against the given answer
    return predicted_answer.lower() == given_answer.lower()

Evaluate the BERT Model on a Set of Instances by Calling the Function

correct_count = 0
total_count = 100

from tqdm import tqdm

# Evaluate the first total_count training instances with a progress bar
for i in tqdm(range(total_count)):
    correct_count += evaluate_instance(squad['train'][i])

Evaluate the Results on the Set of Instances

# Calculate and output the accuracy
accuracy = correct_count / total_count
print(f'Accuracy: {accuracy * 100:.2f}%')

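We got an accuracy of 68% on the first 100 instances in the SQuAD train dataset. Note that this is a strict, case-insensitive string match against the first given answer, so a prediction that differs only in spacing or punctuation is counted as incorrect.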
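Strings rebuilt from BERT word pieces tend to have spaces around punctuation, so a plain string comparison can reject answers that are essentially right. A more lenient comparison, sketched below in the style of the usual SQuAD metrics, normalizes both strings before an exact-match test and also computes a token-overlap F1 score. The function names and the example strings are illustrative assumptions, not part of the tutorial's notebook.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred, gold):
    """True when the two answers are identical after normalization."""
    return normalize(pred) == normalize(gold)


def f1(pred, gold):
    """Token-level F1 between the normalized prediction and reference."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


# Made-up strings: a detokenized prediction versus a reference answer.
strict = "new york , usa".lower() == "New York, USA".lower()
lenient = exact_match("new york , usa", "New York, USA")
partial = f1("the city of new york", "New York")
print(strict, lenient, round(partial, 3))
```

Swapping the final comparison in evaluate_instance for exact_match would be one way to try this, and the resulting accuracy may differ from the figure reported above.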

In the next lesson, we will fine-tune a pre-trained model in Hugging Face on the SQuAD dataset for question answering.

The colab notebook is available here:


Yuan An, PhD

Faculty member in the College of Computing and Informatics at Drexel University; Doing research in NLP, Machine Learning, Ontology, Knowledge Graph, Embeddings