ChatGPT on just my stuff

Greg Hayworth
5 min read · Jan 21, 2023

ChatGPT is cool, but I don’t want answers from the whole internet. I really just want it to answer questions about my stuff. And I don’t like it when it makes up things that sound factual but don’t really exist. Is there anything like that?

The folks from LangChain, in collaboration with Zahid Khawaja, recently released a demo that does exactly that. They have cleverly combined semantic search with specialized prompts to GPT-3 that mimic the user experience of ChatGPT, while anchoring the responses so they only answer questions about LangChain’s own technical documentation.

You can check out their blog post here.

Semantic Search

The process starts with semantic search, which involves splitting documents into chunks, then representing those document chunks with embedding vectors. These vectors can be calculated in advance for your documents and stored in a vector store. Then when a user asks a question, that question is converted into a vector using the same embedding model. A semantic similarity search returns the document chunks that are most similar to the query.
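
To make this concrete, here is a rough sketch of that pipeline using LangChain’s own building blocks (class names reflect early-2023 versions of the library; the chunk sizes, embedding model, and FAISS vector store are illustrative choices, not the demo’s exact configuration):

from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Your own content, e.g. the text of a documentation page
raw_text = "...full text of a documentation page..."

# 1. Split the document into chunks
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(raw_text)

# 2. Embed each chunk and store the vectors (requires an OpenAI API key and faiss installed)
embeddings = OpenAIEmbeddings()
store = FAISS.from_texts(chunks, embeddings)

# 3. At question time, embed the question and return the most similar chunks
question = "How do I install LangChain?"
relevant_chunks = store.similarity_search(question, k=4)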

Folks have been doing semantic search for a little while now, using models like BERT to convert documents into embedding vectors. Semantic search can return the sentences or paragraphs that most likely contain the answer the user is seeking, but it doesn’t actually answer the question. Part of what makes folks so excited about ChatGPT is that it provides a more natural-sounding answer to a question.

Large Language Model (LLM)

A large language model does something different from semantic search. It takes a prompt and produces a response based on that prompt. A prompt can be a question, but it can also include more instructions than just the question. So, if I send a large language model a set of instructions, I can get very different behaviors depending on those instructions. The same model can be used to summarize a technical article, answer questions, or translate from English into German.
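
As a quick illustration, here is a minimal sketch of sending the same GPT-3 model different instructions through the OpenAI completion API (the model name and parameters are illustrative, not taken from the demo):

import openai

def complete(prompt: str) -> str:
    # text-davinci-003 was a commonly used GPT-3 completion model at the time
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=256,
        temperature=0,
    )
    return response["choices"][0]["text"].strip()

article = "...the text of a technical article..."
print(complete(f"Summarize the following article in two sentences:\n\n{article}"))
print(complete("Translate the following sentence into German:\n\nThe library is easy to use."))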

It is with a clever prompt that LangChain bends GPT-3 to its will. The prompt LangChain uses in their demo is shown here:

You are an AI assistant for the open source library LangChain. The documentation is located at https://langchain.readthedocs.io.
You are given the following extracted parts of a long document and a question. Provide a conversational answer with a hyperlink to the documentation.
You should only use hyperlinks that are explicitly listed as a source in the context. Do NOT make up a hyperlink that is not listed.
If the question includes a request for code, provide a code block directly from the documentation.
If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer.
If the question is not about LangChain, politely inform them that you are tuned to only answer questions about LangChain.
Question: {question}
=========
{context}
=========
Answer in Markdown:

When creating this prompt, the developers are, at the most basic level, just telling GPT-3 exactly what they want it to do. By instructing the model to only use hyperlinks that are explicitly listed as a source and NOT to make up any hyperlink, they reduce the chances of hallucinations.

If you don’t know the answer, just say “Hmm, I’m not sure.” Don’t try to make up an answer.

This further reduces the chance of the confident-sounding but fabricated answers that people don’t like from ChatGPT.

The user’s question is placed where {question} appears in this prompt, and the results of the semantic search are inserted as the {context}. The construction of this prompt tells GPT-3 to only provide answers that come from the context, and those answers will only contain links that were selected from the context. Finally, the answer is formatted in Markdown so the developers can nicely display the response in their application.
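
Putting that together with the semantic search step above, the filled-in prompt might be sent to GPT-3 roughly like this. This is a sketch using LangChain’s PromptTemplate and LLMChain, not the demo’s exact code, and QA_TEMPLATE here stands in for the full prompt text shown above:

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

# The full prompt shown above, abbreviated here; note the {question} and {context} placeholders
QA_TEMPLATE = """You are an AI assistant for the open source library LangChain. ...
Question: {question}
=========
{context}
=========
Answer in Markdown:"""

qa_prompt = PromptTemplate(input_variables=["question", "context"], template=QA_TEMPLATE)

llm = OpenAI(temperature=0)  # GPT-3 via the OpenAI API
qa_chain = LLMChain(llm=llm, prompt=qa_prompt)

# context is the text of the chunks returned by the semantic search
context = "\n".join(chunk.page_content for chunk in relevant_chunks)
answer = qa_chain.run(question=question, context=context)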

Semantic search does part of the work by serving up only the sections of the technical documentation that are relevant to the question at hand. Then GPT-3 finishes the job with a conversational-style response that is explicitly limited to the context provided. This is how LangChain is able to deliver the ChatGPT experience while limiting the scope to its own documentation.

Follow-up Questions

An important and differentiating characteristic of the ChatGPT experience is the ability to ask follow-up questions: the chat history is taken into account when producing the next answer. This app delivers the follow-up experience with another clever prompt. The developers send both the chat history and the new question to GPT-3 and prompt the LLM to generate a single standalone question. That standalone question is then run through the process described above. The prompt for this request looks like:

Given the following conversation and a follow up question, rephrase 
the follow up question to be a standalone question. You should assume
that the question is related to LangChain.

The prompt tells the model exactly what we want it to do, and the LLM generates a single question that captures the intent of the new question in the context of the conversation so far. So, the final workflow developed for this demo looks like the diagram below.
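
In code, that question-condensing step might look something like the sketch below. This is not the demo’s exact implementation: only the instruction sentence comes from the demo, and the layout of the chat history and follow-up placeholders is my own assumption.

condense_prompt = PromptTemplate(
    input_variables=["chat_history", "question"],
    template=(
        "Given the following conversation and a follow up question, rephrase "
        "the follow up question to be a standalone question. You should assume "
        "that the question is related to LangChain.\n\n"
        "Chat History:\n{chat_history}\n"
        "Follow Up Input: {question}\n"
        "Standalone question:"
    ),
)
condense_chain = LLMChain(llm=llm, prompt=condense_prompt)

standalone_question = condense_chain.run(
    chat_history="Human: What is LangChain?\nAI: LangChain is a library for building LLM applications.",
    question="How do I install it?",
)
# standalone_question is then fed into the semantic search and answer steps above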

Further Consideration

Overall, this is a great demo that hits the sweet spot of delivering the capabilities people love about ChatGPT while limiting the scope to a specific set of documents. This approach could be applied to many use cases in a corporate setting. There is one concern I have with the design of the app when I think about it in a corporate context: every question my customers ask about my product, along with the corresponding information about my product, is being sent to a third party, in this case OpenAI.

At a fundamental level, it just doesn’t sound like a good idea for someone else to have a dataset of all my customers’ questions about my product. A nefarious actor could potentially mine this data to exploit my weaknesses. A potential solution is to host a fully open-source LLM (like Flan-T5 or GPT-2) on your own private cloud. Then, at each step in the diagram where we call an LLM, we are calling our own private instance of that LLM. This approach means you need enough engineering horsepower to host a large model on your cloud infrastructure, but it comes with the peace of mind that you are not sending your company data into the ether.
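
As a rough sketch of that swap, a self-hosted Flan-T5 model could be served with Hugging Face transformers and dropped in wherever the workflow calls an LLM, reusing the qa_prompt, question, and context from the earlier sketches (the model size and generation settings here are illustrative):

from transformers import pipeline

# Flan-T5 running on your own infrastructure; pick a size your hardware can handle
local_model = pipeline("text2text-generation", model="google/flan-t5-large")

def local_llm(prompt: str) -> str:
    return local_model(prompt, max_length=256)[0]["generated_text"]

# The same prompts from the demo can then be sent to the private model
answer = local_llm(qa_prompt.format(question=question, context=context))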

Conclusion

The combination of semantic search and LLMs shows a lot of promise for many business applications, and the framework developed by LangChain will make it easier to build those applications. So keep that library on your radar.
