ChatGPT LangChain Example for Chatbot Q&A

ChatGPT, LangChain, and FAISS — a transformative trio that simplifies chatbot creation

Ivan Campos
Sopmac AI
6 min read · May 6, 2023


In this article, we will demonstrate how ChatGPT, LangChain, and FAISS enable developers to build intelligent, context-aware chatbots with remarkable ease.

Table of Contents

  • What is ChatGPT
  • What is LangChain
  • What is FAISS
  • Working code to save OpenAI embeddings as a FAISS index
  • Working code to load a FAISS index and begin chatting with your docs

ChatGPT

ChatGPT is an advanced language model developed by OpenAI that can generate human-like text based on given prompts, allowing for versatile applications like conversation, text summarization, and question-answering.

For the purposes of our demo, we will focus on OpenAI’s “gpt-3.5-turbo” model as it currently has the right combination of speed and pricing for chatbots.
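
As a quick, standalone illustration (separate from the LangChain demo below), here is a minimal call to gpt-3.5-turbo using the openai Python package's pre-1.0 ChatCompletion API; the prompt text and max_tokens value are placeholders:

import os, dotenv
import openai

# loads .env file with your OPENAI_API_KEY
dotenv.load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# single-turn chat completion against gpt-3.5-turbo (openai<1.0 style API)
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize what a vector store does in one sentence."}],
    temperature=0,
    max_tokens=50,
)

print(response["choices"][0]["message"]["content"])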

For an in-depth pricing analysis on the ChatGPT API, check out:

LangChain

LangChain is a library (available in Python, JavaScript, or TypeScript) that provides a set of tools and utilities for working with language models, text embeddings, and text processing tasks. It streamlines tasks such as creating chatbots, handling document retrieval, and performing question-answering operations by combining various components, like language models, vector stores, and document loaders.

We will focus on creating a Q&A chatbot with a small subset of the components available in the ever-growing LangChain library.
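
To give a feel for how these components snap together before the full demo, here is a minimal sketch (assuming the same 2023-era langchain package used below) that composes a PromptTemplate and ChatOpenAI into an LLMChain:

import dotenv
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain import PromptTemplate

# loads .env file with your OPENAI_API_KEY
dotenv.load_dotenv()

# a single-variable prompt template
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in one sentence.",
)

# compose the chat model and the prompt template into a chain
chain = LLMChain(
    llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    prompt=prompt,
)

print(chain.run(topic="vector embeddings"))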

FAISS

FAISS (Facebook AI Similarity Search) is an open-source library developed by Facebook AI Research. It is designed to efficiently search for similar items (vectors) in large collections of high-dimensional data. FAISS provides methods for indexing and searching vectors, making it easier and faster to find the most similar items within a dataset.

It is particularly useful in tasks like:

  • Recommendation systems
  • Information retrieval
  • Clustering — where finding similar items is important

FAISS is a solid vector storage choice if you have a basic chatbot and are:

  • Querying a limited dataset that can be powered by a CPU
  • Seeking a free and open source vector storage solution
  • Not looking to introduce another server or cloud API into your architecture
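
To make the similarity-search idea concrete, here is a tiny sketch using FAISS directly, without LangChain; the random vectors are placeholders standing in for real embeddings:

import faiss  # pip install faiss-cpu
import numpy as np

dimension = 128                       # size of each vector
index = faiss.IndexFlatL2(dimension)  # exact L2 (Euclidean) search

# 1,000 random vectors standing in for real embeddings
vectors = np.random.random((1000, dimension)).astype("float32")
index.add(vectors)

# find the 3 most similar vectors to a random query vector
query = np.random.random((1, dimension)).astype("float32")
distances, ids = index.search(query, 3)
print(ids, distances)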

For more on the concept of vectors and vector databases, check out:

…and now onto our Chatbot Q&A Demo with Python…

Save OpenAI Embeddings as a FAISS index

This code loads data from a CSV file, splits it into chunks, and then creates and saves a FAISS index with OpenAI Embeddings. This allows efficient similarity search and retrieval of relevant information from the dataset.

import dotenv
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders.csv_loader import CSVLoader

# loads .env file with your OPENAI_API_KEY
dotenv.load_dotenv()

# CSV from https://gist.github.com/IvanCampos/94576c9746be280cf5b64083c8ea5b4d
loader = CSVLoader("midjourney-20230505.csv", csv_args={"delimiter": ","})
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

faissIndex = FAISS.from_documents(docs, OpenAIEmbeddings())
faissIndex.save_local("faiss_midjourney_docs")

  1. Importing necessary libraries, including the dotenv library for managing environment variables, and other modules from the langchain library for handling text processing and creating a FAISS index.
  2. Loading the .env file containing the OPENAI_API_KEY using the dotenv.load_dotenv() function. If your code is to be hosted on a Git repository, add the .env file to your .gitignore.
  3. Creating a CSVLoader instance to load a CSV file named midjourney-20230505.csv (the gist URL is in the code comment above).
  4. Loading the documents from the CSV file using the loader.load() function.
  5. Creating a CharacterTextSplitter instance to split the documents into smaller chunks with a maximum size of 1000 characters each and no overlap between chunks.
  6. Splitting the documents into chunks using the text_splitter.split_documents(documents) function.
  7. Creating a FAISS index from the document chunks, using the OpenAIEmbeddings() for vector representations of the text chunks.
  8. Saving the created FAISS index to a local file named “faiss_midjourney_docs”. The index can then be reused for efficient similarity search tasks in the future…
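
As a quick sanity check (a sketch, not part of the original demo), you can reload the saved index and run a raw similarity search against it before wiring it into a chatbot:

import dotenv
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

dotenv.load_dotenv()

# reload the index saved above and fetch the single most similar chunk
faissIndex = FAISS.load_local("faiss_midjourney_docs", OpenAIEmbeddings())
results = faissIndex.similarity_search("what is --v", k=1)
print(results[0].page_content)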

Load a FAISS index & begin chatting with your docs

This code imports the necessary libraries and initializes a chatbot using LangChain, FAISS, and ChatGPT via the gpt-3.5-turbo model. It loads the pre-built FAISS index for document search and sets up a RetrievalQA chain. A prompt template is defined to request succinct responses. Finally, the chatbot is executed with a query about Midjourney's --v (model version) parameter, and the response is printed.

import os, dotenv
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain import PromptTemplate

dotenv.load_dotenv()

chatbot = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        temperature=0, model_name="gpt-3.5-turbo", max_tokens=50
    ),
    chain_type="stuff",
    retriever=FAISS.load_local("faiss_midjourney_docs", OpenAIEmbeddings())
        .as_retriever(search_type="similarity", search_kwargs={"k": 1})
)

template = """
respond as succinctly as possible. {query}?
"""

prompt = PromptTemplate(
    input_variables=["query"],
    template=template,
)

print(chatbot.run(
    prompt.format(query="what is --v")
))
# --v is a parameter used to specify a specific model version in Midjourney's AI image generation tool.

  1. Import necessary libraries and modules, including os, dotenv, OpenAIEmbeddings, FAISS, ChatOpenAI, RetrievalQA, and PromptTemplate.
  2. Load the environment variable (i.e. OPENAI_API_KEY) from your .env file using dotenv.
  3. Initialize a ChatOpenAI instance with the gpt-3.5-turbo model, a temperature of 0, a maximum of 50 tokens for responses, and the OpenAI API key. The default temperature is 0.7 — setting the value to 0 will reduce the randomness of ChatGPT completions.
  4. Load the pre-built FAISS index “faiss_midjourney_docs” using the OpenAIEmbeddings.
  5. Set up a RetrievalQA chain with the ChatOpenAI instance, the FAISS index, and the search type and parameters. It is highly recommended to set search_type and search_kwargs — not doing so would be cost inefficient, as all of the chunks in your vector store would be sent to the LLM. It's also worth noting that the chain_type is "stuff", which attempts to stuff all of the retrieved chunks into the prompt as context for your LLM (i.e. ChatGPT). A sketch of how to inspect which chunks are actually retrieved follows this list.
  6. Define a prompt template that includes the variable “query” and asks for succinct answers.
  7. Create a PromptTemplate instance using the defined template.
  8. Format the prompt with a query about Midjourney's --v parameter.
  9. Execute the chatbot with the formatted prompt question.
  10. Print the chatbot’s answer.
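
If you want to see exactly which chunk is being stuffed into the prompt (and keep an eye on token costs), one possible tweak, sketched below under the same assumptions as the demo, is to have the chain return its source documents alongside the answer:

import os, dotenv
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

dotenv.load_dotenv()

chatbot = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        temperature=0, model_name="gpt-3.5-turbo", max_tokens=50
    ),
    chain_type="stuff",
    retriever=FAISS.load_local("faiss_midjourney_docs", OpenAIEmbeddings())
        .as_retriever(search_type="similarity", search_kwargs={"k": 1}),
    # also return the chunk(s) that were stuffed into the prompt
    return_source_documents=True,
)

result = chatbot({"query": "respond as succinctly as possible. what is --v?"})
print(result["result"])            # the model's answer
print(result["source_documents"])  # the retrieved chunk(s)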

Conclusion

Using ChatGPT, LangChain, and FAISS offers several benefits:

  1. Streamlined development process: Combining these technologies makes it easier for developers to create, maintain, and optimize chatbots, saving time and money, and facilitating faster deployment.
  2. Greater adaptability: By combining these technologies, chatbots can be more easily adapted and extended to cater to new domains, languages, and use cases, increasing their versatility and value.
  3. Advanced query handling: By leveraging the strengths of ChatGPT, LangChain, and FAISS, chatbots can better understand and handle complex or ambiguous queries, leading to more accurate, relevant, and satisfying responses for users.

BONUS

If you are looking to take your chatbot to the next level, be sure to experiment with prompt engineering techniques to give your bot a persona that does not feel robotic.
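
As one purely illustrative sketch of that idea, the demo's prompt template could be given a short persona preamble (the persona name and wording here are made up):

from langchain import PromptTemplate

# illustrative persona preamble prepended to every query
template = """
You are Mido, a friendly and upbeat Midjourney power user.
Answer in a warm, conversational tone and keep it to two sentences.
{query}?
"""

persona_prompt = PromptTemplate(input_variables=["query"], template=template)

# reuse the chatbot built in the demo above, e.g.:
# print(chatbot.run(persona_prompt.format(query="what is --v")))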

