ChatGPT Integration 101: Custom Data Queries in Your AI App — Part 2

Tamilselvan Arjunan
3 min read · Feb 11, 2024

--

ChatGPT open on a laptop.
Photo by Choong Deng Xiang on Unsplash

In part 1, we talked about the use cases of LLMs, created the basic structure of our project, and laid out the blueprint of our main.py file.

In this part, we will use LangChain to play around with simple use cases, such as getting answers to questions from a PDF document.

To get started, we need to install LangChain. Activate the virtual environment and then install it.

pip install langchain

We will be writing the code for the LLM functionality in llm_processing.py.

For this demo, since we will be interacting with OpenAI models, we need a third-party wrapper library called langchain-openai to use ChatGPT via LangChain.

Install langchain-openai:

pip install langchain-openai

To interact with the OpenAI API, we require an API key. It can be generated from here. Once your API key has been created, we can start building the LLM part of our app.
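
A quick side note: hardcoding the key in a source file makes it easy to leak. Here is a minimal sketch of the alternative (ChatOpenAI automatically picks up the standard OPENAI_API_KEY environment variable; the value below is a placeholder):

import os

# In practice, export the key in your shell or load it from a .env file
# instead of writing it into the code.
os.environ["OPENAI_API_KEY"] = "your_key"

from langchain_openai import ChatOpenAI

llm = ChatOpenAI()  # reads OPENAI_API_KEY from the environment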

In llm_processing.py:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(openai_api_key="your_key")  # add your API key here
llm.invoke("How many carbon emissions were emitted in 2022?")

A few pointers:

  • ChatOpenAI refers to the ChatGPT model.
  • The invoke method is used to ask the model a question (see the snippet after this list for what it returns).
  • Currently, the model will answer incorrectly, since it was only trained on data up to Jan 2022. To get an accurate answer, we will need to feed it a custom data source.
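
For reference, invoke on a chat model returns an AIMessage object rather than a plain string; the generated text lives in its content attribute:

response = llm.invoke("How many carbon emissions were emitted in 2022?")
print(response.content)  # the answer text; metadata stays on the message object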

We have 2 options to get around this issue.

  1. We can provide the text which might contain the answer directly to the model.
  2. We can use embeddings to find the stored text most similar to the question and use it as context for the answer (a short sketch follows this list).
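
We will dig into embeddings properly later in the series, but here is a minimal sketch of the idea behind option 2, using OpenAIEmbeddings from langchain-openai. The candidate snippets and the cosine helper are illustrative, not a full retrieval pipeline:

import math
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key="your_key")  # add your API key here

# Embed the question and a couple of candidate text snippets.
question_vec = embeddings.embed_query("How many carbon emissions were emitted in 2022?")
candidates = [
    "37.15 billion metric tons CO2e were emitted in 2022",
    "LangChain is a framework for building LLM applications",
]
candidate_vecs = embeddings.embed_documents(candidates)

def cosine(a, b):
    # Cosine similarity: higher means the two texts are closer in meaning.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

scores = [cosine(question_vec, v) for v in candidate_vecs]
best_match = candidates[scores.index(max(scores))]
print(best_match)  # the snippet most relevant to the question

The best match would then be handed to the model as context, exactly as in option 1.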

Let’s explore these options one by one, starting with the first.

Option 1: Provide the text directly to the model

In llm_processing.py:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.documents import Document

llm = ChatOpenAI(openai_api_key="your_key")  # add your API key here
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)
document_chain.invoke({
    "input": "How many carbon emissions were emitted in 2022?",
    "context": [Document(page_content="37.15 billion metric tons CO2e were emitted in 2022")]
})

A few pointers here:

  • prompt is used as a template for what gets sent to the LLM. context and input are variables that are filled in when the document_chain.invoke() method is called.
  • create_stuff_documents_chain builds a chain that applies the prompt when asking the LLM a question; “stuff” refers to stuffing all provided documents into a single prompt (see the sketch after this list).
  • The document_chain.invoke() method sends a request to the OpenAI API to get a response based on the question asked and the context provided.
  • context is the information the model needs to accurately answer questions that fall outside its training data or are specific to a use case.
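
Because create_stuff_documents_chain parses the model output with a string output parser by default, invoke returns the answer text directly. A small sketch passing more than one document (the second snippet is an illustrative placeholder):

docs = [
    Document(page_content="37.15 billion metric tons CO2e were emitted in 2022"),
    Document(page_content="Emissions are usually reported in metric tons of CO2 equivalent"),  # illustrative
]
answer = document_chain.invoke({
    "input": "How many carbon emissions were emitted in 2022?",
    "context": docs,  # every document here gets stuffed into the {context} block
})
print(answer)  # a plain string containing the model's answer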

The API will now give the correct response, since we have provided the information needed to answer the question.

In the next part, we will explore how to use an entire PDF to answer our question, instead of spoon-feeding the answer to the LLM.

--

Tamilselvan Arjunan

Software developer; has published many AI/ML Python packages on PyPI. Featured in a Microsoft press release for "Pydatascraper".