AI Demystified: Part 1 - LLM Basics

Tarunaug
Oct 27, 2023


ChatGPT, Bard, Bing Chat, and many more are all examples of Large Language Models (LLMs). They arrived with a bang, and people are using the online tools for everything imaginable. The next phase in business is bringing them in to change the way we work. Used correctly, LLMs can help businesses improve customer service, generate marketing content, and develop new products and services. A key challenge here is "teaching" models about your business environment. To do that, it's good to understand the basic concepts that dictate how these models behave when you deploy them yourself, which we'll cover today.

Pre-training

At their core, LLMs work by predicting the next word in a sequence, based on everything that came before it. This process is called autoregressive (or causal) language modeling.
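To make this concrete, here is a toy sketch of greedy next-word prediction in Python. The probabilities below are invented purely for illustration; a real model scores its entire vocabulary at every step.

# Toy illustration of next-word prediction (probabilities are invented)
prompt = "I love to eat"
next_word_probs = {"ice": 0.21, "pizza": 0.18, "pasta": 0.09, "broccoli": 0.02}

# Greedy decoding: pick the most likely continuation and append it
next_word = max(next_word_probs, key=next_word_probs.get)
print(prompt + " " + next_word)  # -> "I love to eat ice"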

LLMs are typically pre-trained on a massive dataset of text and code. This pre-training process allows the LLM to learn the basic patterns of language, such as grammar and syntax. It is a massively expensive exercise, costing millions of dollars, which puts it out of reach and makes it inefficient for most businesses. The best way to adapt a model, then, is to take a pre-trained, readily available model and provide it with your business context.

Tokens

The cost of running and training models is measured in tokens. LLMs process text in units called tokens: a token can be a word, a character, or a sub-word. The number of tokens a model can handle in a single call is limited by the model and the hardware it runs on. Online models typically impose a token limit, and when deploying your own, the limit can be increased but incurs an increased cost.
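As a quick sketch, OpenAI's tiktoken library lets you count the tokens in a piece of text before you send it to a model. The encoding name below is the one used by the GPT-3.5/GPT-4 family.

import tiktoken

# Load the tokenizer used by the GPT-3.5/GPT-4 family of models
encoding = tiktoken.get_encoding("cl100k_base")

text = "LLMs process text in units called tokens."
tokens = encoding.encode(text)

print(tokens)       # a list of integer token IDs
print(len(tokens))  # the number of tokens you would be billed for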

LLM Methods: Completion and Chat

There are two main ways in which we currently use LLMs: completion and chat.

Completion

In completion mode, the LLM is given a prompt and is asked to complete it. For example, you could give the LLM the prompt “I love to eat” and the LLM would complete it with something like “ice cream” or “pizza.”

Completion mode is often used for tasks like generating creative text formats, such as poems, code, scripts, musical pieces, email, letters, etc.
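Here is a minimal sketch of a completion call using the OpenAI Python library (the same legacy openai.Completion API used later in this article); the model name and parameters are illustrative.

import openai

openai.api_key = "YOUR_API_KEY"

# Ask the model to complete an open-ended prompt
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="I love to eat",
    max_tokens=20,
)

print(response.choices[0].text)  # e.g. " ice cream on a hot day"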

Chat

In chat mode, the LLM is given a prompt and is asked to respond to it in a conversational way. For example, you could give the LLM the prompt “Hello, how are you today?” and the LLM would respond with something like “I am doing well, thank you for asking.”

Chat mode is often used for tasks like customer service and technical support.
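The equivalent chat call uses the ChatCompletion endpoint from the same legacy openai library; a minimal sketch, with an illustrative model name, looks like this.

import openai

openai.api_key = "YOUR_API_KEY"

# Chat models take a list of messages rather than a single prompt string
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hello, how are you today?"},
    ],
)

print(response.choices[0].message.content)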

Prompts

The prompt is the instruction that you give to the LLM. For example:

Write me a poem.

Taking this further, you could ask:

“Write me a poem about a friendly dog in the style of Shakespeare”

Making your model relevant: Context

Behind the scenes, a model receives both a prompt and context with each call. Take the example above: if we worked at a company whose brand voice was Shakespeare and we wanted to generate poems, we'd hope to simply ask the model for a poem and have it return one in the style of Shakespeare.

Take this concept into a business and you may want to provide the context of business-specific documents and knowledge. To do this, behind every prompt you would pass the context of your business as a knowledge base. This can be done in the form of a document, a set of documents, or a database. A simple example of this in code is shown below.

import openai

# Set your OpenAI API key
openai.api_key = "YOUR_API_KEY"

# Load the document to use as context
with open("document.txt", "r") as f:
    document = f.read()

# Define the question to ask about the document
question = "What does the document say about jam?"

# The Completion endpoint has no separate context parameter, so the
# document is passed by prepending it to the prompt itself
prompt = f"Context:\n{document}\n\nQuestion: {question}\nAnswer:"

# Call the OpenAI completion endpoint with the document as context
response = openai.Completion.create(
    model="text-davinci-003",  # an instruction-following completion model
    prompt=prompt,
    max_tokens=100,
)

# Print the response
print(response.choices[0].text)

The model's response can also be passed back as context in the next call to the LLM; this is how you create a conversation flow that can relate back to the rest of the conversation.
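A minimal sketch of that flow with the chat endpoint: each reply is appended to the message history, so the next call sees the whole conversation (the model name is illustrative).

import openai

openai.api_key = "YOUR_API_KEY"

# The running message history is the conversation's context
messages = [{"role": "user", "content": "Recommend a book about dogs."}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
)
reply = response.choices[0].message.content

# Append the model's reply and the follow-up question, then call again;
# the model can now relate "it" back to the book it just recommended
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "What is it about?"})

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
)
print(response.choices[0].message.content)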

Fine Tuning

The other method of providing context is fine-tuning. This relies on a history of prompts and responses that you can train the model on. It works very well when you already have "training data" in this prompt-and-response structure, but it is hard to take much further than that. It is also a more expensive and time-consuming process than providing context, though it can result in a more accurate and informative model.
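As a sketch of what that training data looks like, OpenAI's legacy fine-tuning jobs expect a JSONL file with one prompt/completion pair per line; the pairs below are invented for illustration.

import json

# Invented examples of the prompt/response pairs used for fine-tuning
training_examples = [
    {"prompt": "Write me a poem about a friendly dog.",
     "completion": "O gentle hound, of wagging tail possess'd..."},
    {"prompt": "Write me a poem about the sea.",
     "completion": "Hark, how the brine doth churn and roll..."},
]

# Legacy OpenAI fine-tuning expects one JSON object per line (JSONL)
with open("training_data.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")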

The key challenge: providing knowledge without blowing the budget

The key limitation of all the approaches above when it comes to building a "custom model" is the number of tokens you are allowed to pass into each call. Unless you have a massive computing budget available, training your own model is off the cards. There are ways to optimise your context and prompts across a large knowledge base; the most common is RAG (Retrieval-Augmented Generation), which I'll cover in my next article.

With methods like RAG you can rely on unstructured data in the form of documents and transcripts to provide a base of knowledge that your models can query on the fly.

Conclusion

Large Language Models (LLMs) have the potential to revolutionize the way we work. By understanding the basic concepts of LLMs, such as pre-training, tokens, methods, prompts, context, and fine-tuning, businesses can start to use LLMs to improve their customer service, generate marketing content, and develop new products and services.

However, there is one key challenge: providing knowledge to LLMs without blowing the budget. LLMs are limited by the number of tokens they can process in each call, and training a custom model can be very expensive. In the next article, I will discuss how the RAG methodology can be used to overcome this challenge.

Stay tuned for more!
