ChatGPT LangChain Example for Chatbot Q&A

ChatGPT, LangChain, and FAISS — a transformative trio that simplifies chatbot creation

Ivan Campos
Sopmac AI
6 min read · May 6, 2023


In this article, we will demonstrate how ChatGPT, LangChain, and FAISS enable developers to build intelligent, context-aware chatbots with remarkable ease.

Table of Contents

  • What is ChatGPT
  • What is LangChain
  • What is FAISS
  • Working code to save OpenAI embeddings as a FAISS index
  • Working code to load a FAISS index and begin chatting with your docs

ChatGPT

ChatGPT is an advanced language model developed by OpenAI that can generate human-like text based on given prompts, allowing for versatile applications like conversation, text summarization, and question-answering.

For the purposes of our demo, we will focus on OpenAI’s “gpt-3.5-turbo” model as it currently has the right combination of speed and pricing for chatbots.
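
As a quick, standalone illustration (separate from the LangChain demo below), here is a minimal call to gpt-3.5-turbo using the openai Python package's pre-1.0 ChatCompletion API; the prompt text and max_tokens value are placeholders:

import os, dotenv
import openai

# loads .env file with your OPENAI_API_KEY
dotenv.load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# single-turn chat completion against gpt-3.5-turbo (openai<1.0 style API)
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize what a vector store does in one sentence."}],
    temperature=0,
    max_tokens=50,
)

print(response["choices"][0]["message"]["content"])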

For an in-depth pricing analysis on the ChatGPT API, check out:

LangChain

LangChain is a library (available in Python, JavaScript, or TypeScript) that provides a set of tools and utilities for working with language models, text embeddings, and text processing tasks. It streamlines tasks such as creating chatbots, handling document retrieval, and performing question-answering operations by combining various components, like language models, vector stores, and document loaders.

We will focus on creating a Q&A chatbot with a small subset of the components available in the ever-growing LangChain library.
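
To give a feel for how these components snap together before the full demo, here is a minimal sketch (assuming the same 2023-era langchain package used below) that composes a PromptTemplate and ChatOpenAI into an LLMChain:

import dotenv
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain import PromptTemplate

# loads .env file with your OPENAI_API_KEY
dotenv.load_dotenv()

# a single-variable prompt template
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in one sentence.",
)

# compose the chat model and the prompt template into a chain
chain = LLMChain(
    llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    prompt=prompt,
)

print(chain.run(topic="vector embeddings"))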

FAISS

FAISS (Facebook AI Similarity Search) is an open-source library developed by Facebook AI Research. It is designed to efficiently search for similar items (vectors) in large collections of high-dimensional data. FAISS provides methods for indexing and searching vectors, making it easier and faster to find the most similar items within a dataset.

It is particularly useful in tasks like:

  • Recommendation systems
  • Information retrieval
  • Clustering — where finding similar items is important

FAISS is a solid vector storage choice if you have a basic chatbot and are:

  • Querying a limited dataset that can be powered by a CPU
  • Seeking a free and open source vector storage solution
  • Not looking to introduce another server or cloud API into your architecture
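
To make the similarity-search idea concrete, here is a tiny sketch using FAISS directly, without LangChain; the random vectors are placeholders standing in for real embeddings:

import faiss  # pip install faiss-cpu
import numpy as np

dimension = 128                       # size of each vector
index = faiss.IndexFlatL2(dimension)  # exact L2 (Euclidean) search

# 1,000 random vectors standing in for real embeddings
vectors = np.random.random((1000, dimension)).astype("float32")
index.add(vectors)

# find the 3 most similar vectors to a random query vector
query = np.random.random((1, dimension)).astype("float32")
distances, ids = index.search(query, 3)
print(ids, distances)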

For more on the concept of vectors and vector databases, check out:

…and now onto our Chatbot Q&A Demo with Python…

Save OpenAI Embeddings as a FAISS index

This code loads data from a CSV file, splits it into chunks, and then creates and saves a FAISS index with OpenAI Embeddings. This allows efficient similarity search and retrieval of relevant information from the dataset.

import dotenv
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders.csv_loader import CSVLoader

# loads .env file with your OPENAI_API_KEY
dotenv.load_dotenv()

# CSV from https://gist.github.com/IvanCampos/94576c9746be280cf5b64083c8ea5b4d
loader = CSVLoader("midjourney-20230505.csv", csv_args={"delimiter": ","})
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

faissIndex = FAISS.from_documents(docs, OpenAIEmbeddings())
faissIndex.save_local("faiss_midjourney_docs")

  1. Importing necessary libraries, including the dotenv library for managing environment variables, and other modules from the langchain library for handling text processing and creating a FAISS index.
  2. Loading the .env file containing the OPENAI_API_KEY using the dotenv.load_dotenv() function. If your code is to be hosted on a Git repository, add the .env file to your .gitignore.
  3. Creating a CSVLoader instance to load a CSV file named midjourney-20230505.csv (the gist URL is in the code comment above).
  4. Loading the documents from the CSV file using the loader.load() function.
  5. Creating a CharacterTextSplitter instance to split the documents into smaller chunks with a maximum size of 1000 characters each and no overlap between chunks.
  6. Splitting the documents into chunks using the text_splitter.split_documents(documents) function.
  7. Creating a FAISS index from the document chunks, using the OpenAIEmbeddings() for vector representations of the text chunks.
  8. Saving the created FAISS index to a local file named “faiss_midjourney_docs”. The index can then be reused for efficient similarity search tasks in the future…
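
As a quick sanity check (a sketch, not part of the original demo), you can reload the saved index and run a raw similarity search against it before wiring it into a chatbot:

import dotenv
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

dotenv.load_dotenv()

# reload the index saved above and fetch the single most similar chunk
faissIndex = FAISS.load_local("faiss_midjourney_docs", OpenAIEmbeddings())
results = faissIndex.similarity_search("what is --v", k=1)
print(results[0].page_content)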

Load a FAISS index & begin chatting with your docs

This code imports the necessary libraries and initializes a chatbot using LangChain, FAISS, and ChatGPT via the gpt-3.5-turbo model. It loads the pre-built FAISS index for document search and sets up a RetrievalQA chain. A prompt template is defined to request succinct responses. Finally, the chatbot is executed with a query about Midjourney's --v (model version) parameter, and the response is printed.

import os, dotenv
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain import PromptTemplate

dotenv.load_dotenv()

chatbot = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        temperature=0, model_name="gpt-3.5-turbo", max_tokens=50
    ),
    chain_type="stuff",
    retriever=FAISS.load_local("faiss_midjourney_docs", OpenAIEmbeddings())
        .as_retriever(search_type="similarity", search_kwargs={"k": 1})
)

template = """
respond as succinctly as possible. {query}?
"""

prompt = PromptTemplate(
    input_variables=["query"],
    template=template,
)

print(chatbot.run(
    prompt.format(query="what is --v")
))
# --v is a parameter used to specify a specific model version in Midjourney's AI image generation tool.

  1. Import necessary libraries and modules, including os, dotenv, OpenAIEmbeddings, FAISS, ChatOpenAI, RetrievalQA, and PromptTemplate.
  2. Load the environment variable (i.e. OPENAI_API_KEY) from your .env file using dotenv.
  3. Initialize a ChatOpenAI instance with the gpt-3.5-turbo model, a temperature of 0, a maximum of 50 tokens for responses, and the OpenAI API key. The default temperature is 0.7 — setting the value to 0 will reduce the randomness of ChatGPT completions.
  4. Load the pre-built FAISS index “faiss_midjourney_docs” using the OpenAIEmbeddings.
  5. Set up a RetrievalQA chain with the ChatOpenAI instance, the FAISS index, and the search type and parameters. It is highly recommended to set search_type and search_kwargs — not doing so would be cost inefficient, as all of the chunks in your vector store would be sent to the LLM. It's also worth noting that the chain_type is "stuff", which attempts to stuff all of the retrieved chunks into the prompt as context for your LLM (i.e. ChatGPT). A sketch of how to inspect which chunks are actually retrieved follows this list.
  6. Define a prompt template that includes the variable “query” and asks for succinct answers.
  7. Create a PromptTemplate instance using the defined template.
  8. Format the prompt with a query about Midjourney's --v parameter.
  9. Execute the chatbot with the formatted prompt question.
  10. Print the chatbot’s answer.
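
If you want to see exactly which chunk is being stuffed into the prompt (and keep an eye on token costs), one possible tweak, sketched below under the same assumptions as the demo, is to have the chain return its source documents alongside the answer:

import os, dotenv
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

dotenv.load_dotenv()

chatbot = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        temperature=0, model_name="gpt-3.5-turbo", max_tokens=50
    ),
    chain_type="stuff",
    retriever=FAISS.load_local("faiss_midjourney_docs", OpenAIEmbeddings())
        .as_retriever(search_type="similarity", search_kwargs={"k": 1}),
    # also return the chunk(s) that were stuffed into the prompt
    return_source_documents=True,
)

result = chatbot({"query": "respond as succinctly as possible. what is --v?"})
print(result["result"])            # the model's answer
print(result["source_documents"])  # the retrieved chunk(s)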

Conclusion

Using ChatGPT, LangChain, and FAISS offers several benefits:

  1. Streamlined development process: Combining these technologies makes it easier for developers to create, maintain, and optimize chatbots, saving time and money, and facilitating faster deployment.
  2. Greater adaptability: By combining these technologies, chatbots can be more easily adapted and extended to cater to new domains, languages, and use cases, increasing their versatility and value.
  3. Advanced query handling: By leveraging the strengths of ChatGPT, LangChain, and FAISS, chatbots can better understand and handle complex or ambiguous queries, leading to more accurate, relevant, and satisfying responses for users.

BONUS

If you are looking to take your chatbot to the next level, be sure to experiment with prompt engineering techniques to give your bot a persona that does not feel robotic.
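
As one purely illustrative sketch of that idea, the demo's prompt template could be given a short persona preamble (the persona name and wording here are made up):

from langchain import PromptTemplate

# illustrative persona preamble prepended to every query
template = """
You are Mido, a friendly and upbeat Midjourney power user.
Answer in a warm, conversational tone and keep it to two sentences.
{query}?
"""

persona_prompt = PromptTemplate(input_variables=["query"], template=template)

# reuse the chatbot built in the demo above, e.g.:
# print(chatbot.run(persona_prompt.format(query="what is --v")))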

