What’s Up Doc?

Accessing Knowledge Beyond the LLM

Danny DeRuntz · Published in Duct Tape AI · May 12, 2023 · 5 min read


Part of a series on prototyping with generative AI

When you’re looking for answers from AI, there can be a couple of hurdles to cross. First, it’s very hard to know exactly where the AI is pulling the answer from. Second, AI simply doesn’t know everything. It also has a cutoff date and can’t tell you about events or information beyond that date.

You may have used Bing Chat and noticed that its responses often include links and citations from sources. It even manages to stay surprisingly current, like it’s always reading the latest news. They’re taking the clever conversation skills of ChatGPT, an AI model, and blending them with more traditional internet search. That’s where AI gets a lot more interesting.

Below, you can see a desktop app I made. You ask it questions, and the app uses a chain of AI services to find answers based on a document (and not its own knowledge or guesswork). The default setting lets you ask questions of a Sassy Terms & Conditions Document. As a bonus, you can click the title of the OpenAI Terms & Conditions and switch to a Super Relaxed Tylenol Fact Sheet (disclaimer embedded).

Occasionally the app is down due to OpenAI API limits! (I’m monitoring)

Replit is my new favorite tool. You can run the entire frontend and backend from one spot. I highly recommend you “fork” the app (make a copy) and tinker around under the hood; to do this you need an OpenAI API key. If you do, keep an eye on the console to watch the “engine” working.

This demo is all about LangChain, which lets you string together lots of different LLM calls along with other relevant services. In this instance I started with the Conversational Retrieval QA example.

The Conversational Retrieval QA Chain

Think of this as a preplanned route for our chatbot to follow, much like a recipe. It uses the text document as its main ingredient to craft responses. Here’s how it works in a more digestible manner:

  1. You ask the chatbot a question. It’s like asking a librarian for a book on a particular topic.
  2. The chatbot looks through the content in our digital library (what we call a ‘vectorstore’), finding the most relevant section to your question.
  3. The chatbot, using its instructions and the content it found, goes to the ‘LLM’ (a fancy term for our AI model) to craft a reply that includes a citation.
  4. The AI model, gpt-3.5-turbo (fast & cheap), sends a response back to the chatbot.
  5. The chatbot delivers this response to you.

Pretty straightforward. If the chatbot can’t find a quote, or can’t match it to exact text in the document in your browser, it should admit failure, labeling its reply as Speculation Without Evidence.
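
If you’re curious how that check could work, here is a minimal sketch (not the app’s actual code): it splits a reply on the |||CITATION: delimiter used in the prompt further down, and only trusts the quote if it appears verbatim in the document text. The labelReply name and the exact regex are stand-ins of my own.

// Hypothetical helper: split the model's reply on the citation delimiter
// and only keep the quote if it really appears in the source document.
function labelReply(reply, documentText) {
  const [answer, citationPart] = reply.split("|||CITATION:");
  const quote = citationPart?.match(/"([^"]+)"/)?.[1];
  if (quote && documentText.includes(quote)) {
    return { answer: answer.trim(), citation: quote, label: "Cited" };
  }
  // No quote, or the quote isn't verbatim text from the document.
  return { answer: answer.trim(), citation: null, label: "Speculation Without Evidence" };
}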

I suppose that sounds complicated, but here is a version with all the fancy UI stripped out. This version only works on the OpenAI Terms & Conditions. In the app below, ask a question and watch how the LangChain chain finds and integrates relevant information from a text doc.

If the app is down, run the code below (you’ll need your own OpenAI key) in a folder with a text doc (update the filename in the code, “terms_and_conditions.txt”, to match yours).

import { OpenAI } from "langchain/llms/openai";
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import * as fs from "fs";
import * as readline from "readline";

// Custom prompt: answer in character, then quote a citation after a ||| delimiter.
const customPromptOpenAI = `Use the following pieces of context to answer the question at the end simulating a sassy document who likes red wine and always tells it like it is. If you don't know the answer, just say that you don't know, don't try to make up an answer. After your sassy answer, quote a citation that relates to the question and answer using the following format: Sassy Answer: <your answer>\n|||CITATION:\n\n"<document quote>"\n` +
  '\n' +
  '{context}\n' +
  '\n' +
  'Question: {question}\n' +
  'Sassy Answer:';

// OPENAI CONFIGURATION
const apiKey = process.env.OPENAI_KEY;
const embeddings = new OpenAIEmbeddings({
  openAIApiKey: apiKey
});
const model = new OpenAI({
  openAIApiKey: apiKey,
  modelName: "gpt-3.5-turbo",
  temperature: 0,
  verbose: true, // watch langchain in action in the log!
  streaming: true,
  callbacks: [
    {
      // Stream each token to the console, swapping the ||| delimiter for a blank line.
      handleLLMNewToken(token) {
        if (token == "|||") {
          process.stdout.write('\n\n');
        } else {
          process.stdout.write(token);
        }
      },
    },
  ]
});

// CHAIN CONFIGURATION
// Read the doc, split it into ~1000-character chunks, and embed them into an in-memory vector store.
const text = fs.readFileSync("terms_and_conditions.txt", "utf8");
const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
const docs = await textSplitter.createDocuments([text]);
const vectorStore = await MemoryVectorStore.fromDocuments(docs, embeddings);
const chain = ConversationalRetrievalQAChain.fromLLM(
  model,
  vectorStore.asRetriever()
);
// Swap the chain's default question-answering prompt for the sassy one above.
chain.combineDocumentsChain.llmChain.prompt.template = customPromptOpenAI;

export async function getSystemReply(message) {
  const question = message || "Do I hold the copyright of text written with ChatGPT?";
  const res = await chain.call({ question, chat_history: [] });
  const border = '\n---------------------------------------------------\n';
  const reply = '\n' + border + '\n' + res.text + '\n';
  console.log(reply);
}

// Simple console loop: ask for a question, answer it, repeat.
const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});
function promptQuestion() {
  rl.question('Type a question about OpenAI Terms & Conditions: ', async (answer) => {
    await getSystemReply(answer);
    promptQuestion();
  });
}
promptQuestion();
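
To run it locally: assuming Node 18+, the file saved as an ES module (for example index.mjs, or with "type": "module" in package.json), and the langchain package installed from npm, export your key as OPENAI_KEY, drop your text doc alongside it, and start it with node index.mjs. You’ll see the streamed reply in the console, plus the chain’s inner workings thanks to verbose: true.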

Replit has changed my life (also, test out Replit’s Ghostwriter, an AI helper that can assist with bugs and with writing code). That’s all folks!

ChatGPT helped do a final revision of this post to remove insidery lingo. Helpful!

Abridged version: this post cost me $100 after a few hours due to the OpenAI usage it was enabling. I had to do quick surgery to save it:

  • OpenAI increased my API limit so I gave myself some more padding.
  • The app had a needless second pass in the chain. It was illustrative, but it was much more efficient to get as much done as possible with half the prompts (including pulling a citation). This article reflects that simplification.
  • The embedding calls were pricey: $0.09 per five minutes. Bypassing them wasn’t documented, but with my ChatGPT pal, we hacked the vector store. I saved out the raw vectors and then snuck them back in already “vectorized,” which turned 32,000 tokens per five minutes into about 10 (see the sketch after this list).
  • I caved and downgraded from gpt-4 to gpt-3.5-turbo, which turned $.25+ into about $.01. Initially the model wasn’t well aligned: it struggled to reply in my specified format and had ZERO personality. So…
  • I stated the personality at the beginning and the end, including replacing the adjective in the phrase “Helpful Answer.” For instance, “Helpful Answer” became “Sassy Answer.” When developing the latest Lotbot with GPT-3.5-turbo, I noticed that the personality would drift if there was a lot of text later in the prompt… unless you remind it right at the end. GPT-4 is better at staying on script, which let me get lazy.
  • Because of how the app streams tokens to the frontend, I couldn’t easily use a JSON response format (which 3.5 handles well). I found it hard to get back a consistent delimiter that would let me separate out the citation. In the end I had to use “cit” (one token and not a word): “Sassy Answer: <your answer>\n\ncit “<document quote>”
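
For the curious, here is roughly what that vector-store hack looks like. Treat it as a sketch: it leans on MemoryVectorStore keeping its embedded chunks in a memoryVectors array (an internal detail, not a documented API), and vectors.json is just a filename I made up for the cache.

import * as fs from "fs";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

const embeddings = new OpenAIEmbeddings({ openAIApiKey: process.env.OPENAI_KEY });
const vectorStore = new MemoryVectorStore(embeddings);

if (fs.existsSync("vectors.json")) {
  // Reuse previously computed embeddings: no document-embedding calls at all.
  vectorStore.memoryVectors = JSON.parse(fs.readFileSync("vectors.json", "utf8"));
} else {
  // First run: embed the document once, then save the raw vectors for next time.
  const text = fs.readFileSync("terms_and_conditions.txt", "utf8");
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
  const docs = await splitter.createDocuments([text]);
  await vectorStore.addDocuments(docs);
  fs.writeFileSync("vectors.json", JSON.stringify(vectorStore.memoryVectors));
}

One thing this doesn’t avoid: each incoming question still gets embedded on every call, which would explain the handful of embedding tokens that remain.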

The post and its functions now cost a matter of pennies per day (or less).
