Building an Interactive Question Answering App with Streamlit, Transformers, and Langchain WikipediaAPIWrapper

Anoop Johny
7 min read · Jun 5, 2023

Introduction

In this post, we’ll look at how to use Streamlit, Transformers, and Langchain’s WikipediaAPIWrapper to create an interactive question-and-answer application. Users can ask a question and receive an answer grounded in context retrieved automatically from Wikipedia. We’ll go through the code step by step and explain what each part does.

Wikipedia Wrapper API: Seamlessly Accessing Contextual Information!

Question answering is a challenging natural language processing (NLP) task, but recent transformer-based models have greatly improved performance on it. Hugging Face’s Transformers library offers pre-trained models and tools that make question answering simple. Streamlit is a widely used Python library for building interactive web applications, while Langchain’s WikipediaAPIWrapper is a utility that retrieves context from Wikipedia based on keywords.

What’s Langchain?

Langchain is a toolkit designed to simplify language-related tasks in Python applications. It offers a variety of features and tools for language processing, analysis, and information retrieval, with the goal of making work with natural language data less complicated through a high-level interface and easy-to-use functions.

Langchain: Empowering Language Processing with Efficiency and Ease

Its WikipediaAPIWrapper enables developers to quickly retrieve context from Wikipedia using keywords or search queries. By drawing on Wikipedia’s vast knowledge base of in-depth articles, Langchain lets applications access relevant, informative content to improve a variety of language-related tasks.

The WikipediaAPIWrapper makes accessing Wikipedia data simple. It handles the complexity of API requests, response parsing, and content retrieval, freeing developers to concentrate on using the data for their particular use case. Applications can use it to extract useful material from Wikipedia for information retrieval, text summarization, question answering systems, and more.
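To make this concrete, here is a minimal sketch of the wrapper used on its own (the query string is just an example):

from langchain.utilities import WikipediaAPIWrapper

# Create the wrapper; it handles the Wikipedia API calls internally.
wikipedia = WikipediaAPIWrapper()

# run() takes a plain search string and returns the summaries of
# matching pages as a single text blob, ready to use as context.
context = wikipedia.run("Alan Turing")
print(context[:300])  # preview the retrieved text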

In addition to the WikipediaAPIWrapper, Langchain offers tools for other language-related tasks: text preprocessing and cleaning, entity and language detection, sentiment analysis, and other common NLP operations. These pre-built functions save developers time and effort on typical language processing work.

Overall, Langchain is a useful tool for developers working on language-related projects. With its WikipediaAPIWrapper and other language processing utilities, developers can unlock the potential of natural language data and build programs that comprehend text and draw conclusions from it efficiently.

Setting up the Environment

To get started, we need to install the required dependencies. You can use pip to install the necessary packages:

pip install transformers streamlit langchain wikipedia torch

The WikipediaAPIWrapper relies on the wikipedia package under the hood, and loading the Transformers model requires a backend such as PyTorch, so both are included above. Make sure you have a working Python environment with these dependencies installed.

Importing Libraries

Let’s begin by importing the necessary libraries.

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
from langchain.utilities import WikipediaAPIWrapper
import streamlit as st

From the Transformers library we import AutoModelForQuestionAnswering and AutoTokenizer to load the pre-trained question-answering model and its tokenizer, along with the pipeline function, which we use to build the question-answering pipeline. We also import WikipediaAPIWrapper from Langchain’s utilities to fetch Wikipedia context, and streamlit to build the web application.

What’s Streamlit?

Streamlit makes it easy to create interactive web apps. Thanks to its user-friendly, intuitive interface, developers can design data-driven applications quickly and effectively.

One of Streamlit’s main advantages is its simplicity. Your data analysis or machine learning models can be turned into interactive dashboards or apps with just a few lines of code. By abstracting away the challenges of web development, Streamlit frees developers to concentrate on the core features and user interface of their applications.

Streamlit’s large ecosystem is another noteworthy aspect. It provides a wide variety of built-in widgets and components that let programmers add interactive features like sliders, buttons, and dropdown menus. Additionally, thanks to its smooth integration with well-known data science libraries like Pandas, Matplotlib, and Plotly, Streamlit is a flexible tool for data analysis and visualization.
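To illustrate, here is a tiny, self-contained sketch (the labels and data are invented for the example):

import streamlit as st
import pandas as pd

st.title("Tiny Streamlit demo")

# Built-in widgets return plain Python values.
n = st.slider("Number of rows", min_value=1, max_value=10, value=5)

# Streamlit renders Pandas DataFrames and charts with single calls.
df = pd.DataFrame({"x": list(range(n)), "y": [i ** 2 for i in range(n)]})
st.dataframe(df)
st.line_chart(df.set_index("x"))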

Streamlit also supports sharing and deploying applications. It provides a simple path to publishing your apps, whether through Streamlit’s cloud-based service or by deploying them to platforms like Heroku, AWS, or Docker, making it easy for others to collaborate on and access your applications remotely.

A sample Streamlit app showing data analysis and what’s possible

In summary, Streamlit is a game-changer in the world of web application development for data science and machine learning. With its simplicity, interactivity, and deployment capabilities, Streamlit empowers developers to create engaging and impactful applications that bring data to life and enable efficient data-driven decision-making.

Creating the Streamlit App

We start by creating the Streamlit app and setting the title:

st.title("Streamlit Question Answering App 🦜 🦚")

This line sets the title of our web application to “Streamlit Question Answering App” with bird emojis.

Loading the Model and Tokenizer

Next, we load the question answering model and tokenizer:

model_name = "deepset/roberta-base-squad2"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

We specify the model name as "deepset/roberta-base-squad2", which is a pre-trained RoBERTa model fine-tuned on the SQuAD 2.0 dataset. We then create instances of AutoModelForQuestionAnswering and AutoTokenizer using the from_pretrained method.

Creating the Question Answering Pipeline

Once the model and tokenizer are loaded, we can create a question answering pipeline to handle the task seamlessly.

Seamlessly Ask Questions, Receive Answers with Precision

To simplify the process, we use the pipeline function from the Transformers library with the loaded model and tokenizer:

nlp = pipeline('question-answering', model=model, tokenizer=tokenizer)

The pipeline function provides a convenient way to perform various NLP tasks, including question answering. We pass in the model and tokenizer that we previously loaded as arguments.
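As an aside, pipeline can also load the model and tokenizer itself if you pass the model name as a string, so an equivalent one-liner (when you don’t need the objects separately) would be:

nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)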

With the pipeline in place, we can easily retrieve answers by passing the question and context information to the pipeline’s __call__ method:

res = nlp(QA_input)

The QA_input dictionary contains the question and context information required for question answering. The pipeline processes this input and returns a dictionary containing the predicted answer and additional metadata, such as the confidence score.

The generated results are accurate in the context of the queries

By creating the question answering pipeline, we enable our application to process user queries and return answers based on the retrieved context. The pipeline takes an input dictionary containing the question and the context, handles tokenization, encoding, and prediction internally, and extracts the most relevant answer span, so no extra glue code is needed.
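As a concrete illustration of the input and output shapes (the texts and score below are invented for the example):

QA_input = {
    'question': 'Where is the Eiffel Tower located?',
    'context': 'The Eiffel Tower is a wrought-iron lattice tower in Paris, France.'
}
res = nlp(QA_input)

# res is a dict with the answer span plus metadata, e.g.:
# {'score': 0.97, 'start': 52, 'end': 65, 'answer': 'Paris, France'}
# ('start'/'end' are character offsets of the answer within the context)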

User Input and Question Answering

Now, we handle the user input and perform the question answering process:

question_input = st.text_input("Question:")

if question_input:
    keywords = question_input.split()

    wikipedia = WikipediaAPIWrapper()
    context_input = wikipedia.run(' '.join(keywords))

    QA_input = {
        'question': question_input,
        'context': context_input
    }

    res = nlp(QA_input)

    st.text_area("Answer:", res['answer'])
    st.write("Score:", res['score'])

The user’s question is read with st.text_input. If a question is given, we split it into keywords and use the Langchain WikipediaAPIWrapper to retrieve context from Wikipedia based on those keywords. We then assemble the question and context into the QA_input dictionary.

We pass QA_input through the question-answering pipeline nlp to extract the answer and its confidence score. Finally, we display the answer with st.text_area and the score with st.write.
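With everything in place, save the code to a file (the name app.py below is just a convention) and launch it with Streamlit’s command-line runner:

streamlit run app.py

Streamlit starts a local server and opens the app in your browser.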

Demonstration

A demonstration of the program is shown below.

The starting UI looks like this:

The Welcome UI page

Once you enter a question in the search bar, you can see the generated answer along with its confidence score.

The result generated for the query

Final window setup

The final window view

Conclusion

In this article, we have explored how to build an interactive question answering app using Streamlit, Transformers, and Langchain WikipediaAPIWrapper. We walked through the code and explained each step involved in the process.

Byee!!

Here’s the repo: Click here

The App: Click me

With this app, users can input their questions and receive answers grounded in context retrieved from Wikipedia, making it a useful tool for information retrieval and knowledge exploration.

Hope you enjoyed reading this article and learned something!!
Thanks for reading 😊👍

