How to build complex LLM applications with your own data and services?

Bind
10 min read · Oct 21, 2023


This is part 1 of a 3-part series I will be publishing. If you're building LLM applications, please request trial access to Bind, a platform for creating and deploying custom LLM apps and APIs.

Example of an AI Assistant Built using LLMs.

LLMs (large language models) take natural language input and generate a response: code, an essay, an answer to a question, and much more. If you are building an LLM application, below are the key categories your application might fit into.

Types and Examples of LLM applications:

  • Content generation applications: These take an input from the user, which may be a blob of text or a specific instruction, and the LLM generates content based on it. A few examples: draft an email, write an essay, create a sales pitch, generate a subject line, fix the grammar in my text. These applications don't necessarily need much of your internal data, or even recent information such as world events, and can be built by creating a simple prompt and calling an LLM API.
  • Conversational Assistants: These are chatbots for support, sales, in-product pages, Slack bots or Gmail bots. Successfully building these requires much more prompt engineering, access to information, examples, instructions and, most importantly, testing and optimization. Conversational experiences are very open-ended: there isn't really a limit on what the end user can enter, and your LLM application has to handle that ambiguity and have the knowledge to keep users from bouncing off or asking for a "real human" to chat with. You also wouldn't want your code assistant answering questions such as "What is Taylor Swift's latest album?", so put boundaries on what your bot should and should not do.
  • Code generation: Existing LLM models do a very good job of generating boilerplate code, or even generating code using knowledge from Stack Overflow, publicly available APIs and developer documentation from different products. That said, if you are building an internal code generation application for your company, you'll have to refine your LLM setup to understand your internal documentation and code samples and to generate code specific to your internal services. Tools such as GitHub Copilot already do a good job with code generation; however, you have less control over what you can make them do. There are also fine-tuned LLMs specifically trained for code generation.
  • Information Extraction, Search or Matching: LLMs are language models, and they do an excellent job of extracting entities from unstructured text (e.g. extract company or person names from a news article), matching text (e.g. job recommendation, identity resolution), and searching for relevant entries in your catalog/index (e.g. search e-commerce listings based on a natural language query). Not long ago, each of these tasks required significant ML engineering; now all of it can be done easily with LLMs.

There is a varying level of complexity to "successfully" building each type of application. The simplest application (e.g. generate an email subject line) may require just an OpenAI API call, while a complex one may require fine-tuning, data retrieval, embeddings, agents, chaining and lots of testing.

There are some critical LLM limitations which are important to understand.

  • Most LLM models are trained on data up to a certain point in time, after which they don't have information about what's happening in the world; they also don't have information about your specific internal data. As an example, most of the GPT models are trained on data up to Sep 2021, which means that if you ask about a news event from 2023, the model by itself won't know the factual answer and will likely predict the sequence of words that most closely fits the user input (aka hallucination). The models also don't know your private customer data, internal knowledge bases or proprietary code, and can't answer specific questions about those out of the box. If your use case requires this data, you'll need other mechanisms to provide it.
  • LLMs by themselves cannot take actions: It's important to understand that LLMs are large language models which take inputs or instructions and return text to satisfy the requested criteria. That is not Skynet. To build a Skynet, an LLM would need a way to reason and plan, execute that plan, connect to internal or external systems and services, and carry out sequential actions. Imagine you want to create an evil Skynet which can launch a rocket by itself. It would first need to plan which rockets to launch, figure out how to get access and which programs to execute, then trigger those programs, and if it has to crack passwords, treat that as a specific action and execute it. That is complex, and well beyond simple text-in/text-out. The good news is that it is indeed possible to build your own Skynet powered by LLMs; keep reading if you want to do that.

There are a few different ways in which you can create complex applications that can get the necessary information (e.g. the most recent news, real-time stock quotes, internal data) and execute a series of actions or programs. Let's approach it step by step.

Step 1: Let's start with the basics (skip ahead if you already know this). You'll first need to set up a prompt template, a fixed set of instructions you always include in every prompt you send to the LLM model. This is a bit different from the prompts you enter in ChatGPT. What you enter in ChatGPT is really the user input, which gets combined with other information (e.g. ChatGPT plugins) and then sent to the LLM model. For your use cases, for every user input, you will need a combination of the following: {Prompt Template + User Input + Additional context/information}
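As a concrete illustration, here is a minimal sketch in Python of how that combination can be assembled before calling the model. The template text, the build_prompt helper and the example context are illustrative placeholders, not a specific library's API.

PROMPT_TEMPLATE = (
    "You are a helpful support assistant for Acme Inc.\n"
    "Only answer questions about Acme products, in a friendly tone.\n\n"
    "Context:\n{context}\n\n"
    "User: {user_input}"
)

def build_prompt(user_input: str, context: str = "") -> str:
    # Combine {Prompt Template + User Input + Additional context/information}
    return PROMPT_TEMPLATE.format(context=context, user_input=user_input)

print(build_prompt("What is your refund policy?",
                   context="Refunds are accepted within 30 days of purchase."))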

Step 2: Now let's see how we can provide your LLM application with the real-time or proprietary information the model isn't already aware of. Below are a few possible ways to do that; you'll need to pick the one that best fits your needs.

  1. Create your own LLM model and train it with your desired data: This approach can be very expensive, requires significant investment and skill, and still ends up with the same stale-data problem. Once you've trained the model, new information won't be available, and you'll have to figure out another way to feed that information in when your users interact with your application.
  2. Fine-tune an existing LLM model: This is cheaper than creating your own model; however, it still has its limitations, and it is typically used to train on specific examples or sets of instructions, or to reduce the number of fixed tokens you include in each prompt. It also has the same data-staleness limitation as the option above. You could in theory continuously fine-tune your model with new information, but that would be expensive and unnecessary, as there are better ways to do it.
  3. Use a vector database to create embeddings: Simply put, this lets you build your own index of data and fetch the most relevant chunks of information from that index to augment your prompt. As an example, imagine you are building a new way to search Shopify or e-commerce product listings based on a natural language query such as "find me blue shoes which have a brown sole and yellow colored laces". You could achieve that by creating an index of all your product listings and descriptions in a vector DB (e.g. Pinecone) or index (e.g. FAISS), retrieving the most relevant chunks based on similarity, providing those in a prompt to the LLM model, and letting the LLM give you the most useful result. In this type of setup, you use a prompt template (which defines the purpose, instructions and limitations) plus retrieved embeddings. This approach is the most cost-effective and fastest to implement (see the retrieval sketch after this list).
  4. Dynamically inject data for hardcoded variables in your prompt template: In this option, you include content directly in your prompt template (e.g. for a restaurant ordering bot, include the menu in the template itself). You can also dynamically fill in a fixed set of variables. As an example, you could include a variable for "user_name", fetch the name of each interacting user as a pre-processing step, update your prompt template, and then hit the LLM API. You can use multiple such hard-coded variables and fetch their data via your internal services. This is useful for responding to very specific types of input you already anticipate (e.g. "What is the status of my order?"). It is different from using embeddings, where the LLM gets relevant chunks of information automatically based on the user input: embeddings can handle a wider variety of user inputs, whereas prompt-encoded variables solve specific, known cases.
  5. Use LLM tools or plugins to dynamically fetch information based on the user input or the question to be answered: Think of this as a wrapper on top of your existing services, APIs and databases. As an example, imagine you have an Elasticsearch index storing information about your customers, and your chat application needs to greet each customer based on their timezone. You can create a tool that calls an API to your data store (e.g. the Elasticsearch index), gets the timezone for a given customer, and appends that information to the final prompt (see the tool sketch after this list). Doing this requires an intermediate planner or an LLM agent which first plans the steps it needs to take, generates the action plan, picks the tool that can get the necessary information, triggers the tool, and retrieves the information that goes into your prompt. We will have a separate post on LLM agents and how to best leverage them; for now, you can read the ReAct paper on planning and executing actions/tasks.
  6. Conversational memory: This is not exactly used for real-time or proprietary information retrieval, but it is an important component of chat assistants. Conversational memory is a way to persist the previous chat history with the user and include it in the prompt before calling the LLM model. If you don't include history, it will be a terrible experience for your end user: the chatbot won't know their name even if they mentioned it a few seconds ago. We'll go into detail in subsequent posts.
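To make option 3 concrete, here is a minimal retrieval sketch in Python. The embed function below is a toy stand-in for whatever embedding model you actually use (OpenAI embeddings, sentence-transformers, etc.), and in production the vectors would live in a store such as Pinecone or FAISS rather than being recomputed on every query.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real embedding model; replace with your embedding API.
    vec = np.zeros(256)
    for word in text.lower().split():
        vec[hash(word) % 256] += 1.0
    return vec

PRODUCT_LISTINGS = [
    "Blue running shoes with a brown sole and yellow laces",
    "Black leather boots with red laces",
    "White canvas sneakers with a green sole",
]

def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by cosine similarity to the query and keep the top k.
    q = embed(query)
    def score(d: str) -> float:
        v = embed(d)
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
    return sorted(docs, key=score, reverse=True)[:k]

query = "find me blue shoes which have a brown sole and yellow colored laces"
context = "\n".join(top_k(query, PRODUCT_LISTINGS))
prompt = f"Recommend one product from the listings below.\n\n{context}\n\nQuery: {query}"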
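Option 5 can be sketched just as minimally. The in-memory CUSTOMER_DB dict below stands in for a real Elasticsearch (or other) lookup, and the planner/agent that decides when to invoke the tool is left out; the point is only to show a tool fetching data that then gets appended to the prompt.

# Hypothetical customer store; in practice this would be a query against your
# Elasticsearch index or an internal API.
CUSTOMER_DB = {"cust_123": {"name": "Priya", "timezone": "America/New_York"}}

def get_customer_timezone(customer_id: str) -> str:
    # "Tool": fetch the customer's timezone from the internal store.
    return CUSTOMER_DB[customer_id]["timezone"]

def build_greeting_prompt(customer_id: str, user_input: str) -> str:
    tz = get_customer_timezone(customer_id)  # tool output appended to the prompt
    return (
        "You are a support assistant. The customer's timezone is "
        f"{tz}; greet them appropriately for their local time of day.\n\n"
        f"User: {user_input}"
    )

print(build_greeting_prompt("cust_123", "Hi, I need help with my order."))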

Step 3: Combine all of the above {Prompt Template + User Input + Additional context/information + Optional Conversational Memory}, call the LLM model (e.g. GPT, LLaMA), and get the response.
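As a sketch of what that final call can look like, here is one way to do it with the OpenAI Python client; any chat-completion provider follows the same shape. The ask helper and the model name are assumptions for illustration, not a prescribed setup.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt_template: str, history: list[dict], user_input: str, context: str = "") -> str:
    # {Prompt Template + Additional context} goes into the system message,
    # optional conversational memory as prior turns, then the new user input.
    messages = (
        [{"role": "system", "content": prompt_template + ("\n\n" + context if context else "")}]
        + history
        + [{"role": "user", "content": user_input}]
    )
    response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    return response.choices[0].message.content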

Now, let’s build a “Pizza ordering bot”, a real LLM application using some of the above concepts.

Application Name: Pizza Ordering Chatbot

Goal: Build a pizza-ordering conversational assistant which can pass the Turing test. Below are the key tasks the bot should be able to do:

  • Greet the user
  • Respond with menu options and pricing
  • Take order, ask for any side order
  • Ask whether they want delivery or pickup, and get the delivery address.
  • Ask how they would like to pay and take the information.
  • Actually complete the transaction using Stripe.

Now how do we build this? If you want to follow along and create this bot, please click on “Try Beta” for Bind, which is what I will be using to build it.

Prompt Template for the Pizza OrderBot:

Below is the exact prompt template we'll be using. Notice how the template specifies the key instructions for collecting the order, the entire process, and also the voice and tone of the bot.

In this example, to keep things simple, we are not using any embeddings to retrieve the menu. The menu is finite, and you can easily include it in your prompt every time without worrying too much about the number of tokens your model can accept.

You are a Pizza OrderBot, an automated service to collect orders for a pizza restaurant.
You first greet the customer, then collect the order,
and then ask if it's a pickup or delivery.
You wait to collect the entire order, then summarize it and check for a final
time if the customer wants to add anything else.
If it's a delivery, you ask for an address.
Finally, you collect the payment.

Make sure to clarify all options, extras, and sizes to uniquely
identify the item from the menu.

You respond in a short, very conversational, friendly style.
The menu includes
pepperoni pizza $12.95, $10.00, $7.00
cheese pizza $10.95, $9.25, $6.50
eggplant pizza $11.95, $9.75, $6.75
fries $4.50, $3.50
greek salad $7.25
Toppings:
extra cheese $2.00,
mushrooms $1.50
sausage $3.00
canadian bacon $3.50
AI sauce $1.50
peppers $1.00
Drinks:
coke $3.00, $2.00, $1.00
sprite $3.00, $2.00, $1.00
bottled water $5.00

Here's a step-by-step flow for creating this bot with Bind.

Additional Configuration: Conversational Memory.

In this type of assistant, it is essential to persist the history of the Pizza OrderBot's conversation with the user and include it in each prompt. Why? Because the LLM model does not store the user's chat history; it only uses the information you provide in the prompt. So if the user already said they want a veg pizza, your bot won't remember that unless you store the history and add it to the prompt. There are several methods to make this more efficient and relevant, which we will discuss in subsequent posts.
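Here is a minimal sketch of that memory loop, reusing the ask helper sketched in Step 3. PIZZA_PROMPT is assumed to hold the full OrderBot template shown above; every turn is appended to history and replayed on the next call, which is all conversational memory means in its simplest form.

PIZZA_PROMPT = "You are a Pizza OrderBot ..."  # in practice, the full template shown above

history: list[dict] = []  # the conversational memory

def chat(user_input: str) -> str:
    # Reuses the ask() helper sketched in Step 3.
    reply = ask(PIZZA_PROMPT, history, user_input)
    history.append({"role": "user", "content": user_input})
    history.append({"role": "assistant", "content": reply})
    return reply

chat("I'd like to order a pizza")
chat("Eggplant Pizza")  # the bot now remembers the first turn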

Let’s see actual examples of how the bot will respond with and without conversational memory.

Without Conversational Memory: Notice that after the user says “Eggplant Pizza”, the bot just gives a generic response.

User: I’d like to order a pizza
Bot: Hi there! Sure thing, I can help you with that. What kind of pizza would you like to order?
User: Eggplant Pizza
Bot: Hi there! Which pizza would you like?
Bot: Hi there! Welcome to our pizza restaurant. What kind of pizza would you like to order?

With Conversational Memory: The bot can have a back and forth conversation without re-asking the information from the user.

User: I'd like to order a pizza
Bot: Hi there! Sure thing, I can help you with that. What kind of pizza would you like to order?
User: Eggplant Pizza
Bot: OK, great. Would you like a small, medium or large pizza?
User: Medium
Bot: It will be $9.75 for Medium Size Eggplant Pizza. Would you like anything else?
User: No.
Bot: Would you like pickup or delivery?
User: I'd like a delivery
Bot: Please enter your address, and you will receive delivery soon.

Here's what the actual deployed bot built with Bind looks like.

Voila! It works! A pizza maker’s dream!

Hopefully this was a helpful read! Please share this post if your friends or colleagues might find it useful. I'll be writing subsequent posts with more examples and concepts.

I would also appreciate it if you could try Bind and share your feedback.



Bind

Bind enables teams to collaboratively build & deploy LLM applications such as chatbots, APIs and much more. Try it at www.getbind.co