N-shot Learning

Vlad Rișcuția
Aug 8, 2023


This is an excerpt from Chapter 4: Learning and Tuning from my book Large Language Models at Work. The book is now available on Amazon: a.co/d/4MiwZvX.

Large language models are huge neural networks with billions of parameters, and training one is a big, expensive undertaking. As a reminder, GPT stands for Generative Pre-trained Transformer. Note the “pre-trained”.

That said, we can teach an old model new tricks. This article is all about how a large model can learn. The learning we’ll cover is different from training as the term is used in the context of neural networks. We won’t be starting from scratch and attempting to train the billions of parameters in a model. Instead, we will start with an already trained model and see what we can do from there.

First, there is zero-shot learning. This means the model might be able to perform a task it was not specifically trained for. We prompt it to do something, and it does this by nature of its knowledge of language and the relationships between words. We’ll see a couple of examples of this and what it means.

Next, we have one-shot learning. With one-shot learning, we provide a single example. Once given this example, the model can then accomplish the task.

Sometimes the scenario is more complex, and one-shot learning is not enough. In this case, we have few-shot learning: this means providing several examples from which the model can derive (learn) what it is supposed to reply.

Zero-shot learning

Let’s start with an example: a large language model trained on a wide range of texts can perform a translation task from one language to another without ever having been explicitly trained on that particular translation pair. The model can be given a prompt in the source language, along with a description of the target language, and it will generate a translation based on its understanding of language.

{
  "messages": [
    { "role": "system", "content": "You are an English to French translator." },
    { "role": "user", "content": "Translate this to French: {{text}}" }
  ]
}

English-to-French prompt template.

We’ll be using gpt-3.5-turbo, the chat completion model. This template contains a system message, telling the model to act as a translator, and a prompt to translate text to French. The actual text is replaced at runtime.

We’ll implement a ChatTemplate class to wrap the OpenAI API calls, which we’ll use in further code samples.

import copy
import json
import openai
import os
import re

openai.api_key = os.environ.get('OPENAI_API_KEY')
if openai.api_key is None:
    raise Exception('OPENAI_API_KEY not set')


def insert_params(string, **kwargs):
    # Replace each {{param}} placeholder with the matching keyword argument
    pattern = r"{{(.*?)}}"
    matches = re.findall(pattern, string)
    for match in matches:
        replacement = kwargs.get(match.strip())
        if replacement is not None:
            string = string.replace("{{" + match + "}}", replacement)
    return string


class ChatTemplate:
    def __init__(self, template):
        self.template = template

    @staticmethod
    def from_file(template_file):
        with open(template_file, 'r') as f:
            template = json.load(f)
        return ChatTemplate(template)

    def completion(self, parameters):
        instance = copy.deepcopy(self.template)
        for item in instance['messages']:
            item['content'] = insert_params(item['content'], **parameters)
        return openai.ChatCompletion.create(
            model='gpt-3.5-turbo',
            **instance)

Here is how we can use this to make an OpenAI API call.

response = ChatTemplate.from_file(
    'translate.json').completion({'text': "Aren't large language models amazing?"})

print(response.choices[0].message.content)

English-to-French translation using chat completion.

We instantiate a ChatTemplate from the template file we defined earlier and call the completion() function, setting the text to “Aren’t large language models amazing?”. Running this code should print something like “Les grands modèles de langage ne sont-ils pas incroyables?”

So how is this zero-shot learning? Well, gpt-3.5-turbo was not specifically trained for English-to-French translation. Historically, natural language processing models designed for translation were trained for that task; in our case, we would have trained a model on English-to-French sentence pairs. gpt-3.5-turbo has not been trained for this specific task, but it can still perform it very well when prompted to do so.

Definition: zero-shot learning refers to the ability of a language model to perform a task without any explicit training data or examples for that task. Instead, the model can use its knowledge of language and the relationships between words to perform the task based on a textual description or prompt.

Another example of zero-shot learning is the ability of large language models to answer questions. These models are not trained specifically for Q&A; rather, they are trained on a corpus of data large enough that they can both understand the semantics of a question and rely on the data they have been trained on to answer it.

Note that since the models are not trained for Q&A specifically, just to predict the most likely next word and the one after that and so on, they are prone to hallucinating (making things up), especially when they don’t have an answer handy.
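One common mitigation, sketched here as a hypothetical template (not one used elsewhere in this chapter), is to tell the model explicitly in the system message that it may decline to answer:

```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant. If you are not sure of an answer, reply with \"I don't know\" rather than guessing." },
    { "role": "user", "content": "{{question}}" }
  ]
}
```

This doesn’t eliminate hallucinations, but it gives the model a sanctioned way out when it lacks an answer.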

Zero-shot learning is impressive from the perspective of what models can achieve: without us having to do anything beyond asking, large language models exhibit some powerful capabilities. That said, there are limits to what can be done with zero-shot learning. At some point, we need to show some examples. Enter one-shot learning.

One-shot learning

With one-shot learning, we provide the large language model with a single example, which helps it better understand our prompt.

Definition: One-shot learning refers to the ability of a large language model to learn a new task or solve a new problem from only a single example.

In other words, instead of requiring a large dataset of labeled examples to train on, a language model with one-shot learning capabilities can quickly adapt to a new task or context with minimal training data.

This is done through transfer learning, where the model leverages its prior knowledge and pre-trained representations to quickly learn new concepts or patterns. By doing so, the model can generalize its understanding to new situations, even when faced with limited data.

Of course, there is not a lot we can teach with a single example. On the other hand, sometimes it is easier to show an example rather than trying to describe what we want. For example, here’s a prompt for rewriting text in a different style.

{
  "messages": [
    { "role": "user", "content": "Here is some sample text: {{sample}} Rewrite the following text in the same style as the sample: {{text}}" }
  ]
}

Text rewriting template.

In this example, we provide some sample text and ask the model to rewrite another text in the same style as the sample. Let’s save this as rewrite.json. We could have provided a detailed description of the style instead, but that would likely require more prompt tuning, as we ourselves would have to understand and explain the subtle elements of what a writing style entails. Rather than doing that, we simply show the text in the style we want and let the large language model figure out what it means.

Let’s put this to work.

response = ChatTemplate.from_file('rewrite.json').completion({
    'sample': '''
Deep in the shady sadness of a vale
Far sunken from the healthy breath of morn,
Far from the fiery noon, and eve's one star,
Sat gray-hair'd Saturn, quiet as a stone,
Still as the silence round about his lair;
Forest on forest hung about his head
Like cloud on cloud. No stir of air was there,
Not so much life as on a summer's day
Robs not one light seed from the feather'd grass,
But where the dead leaf fell, there did it rest.
A stream went voiceless by, still deadened more
By reason of his fallen divinity
Spreading a shade: the Naiad 'mid her reeds
Press'd her cold finger closer to her lips.''',
    'text': "Let's start with an example: a large language model trained \
on a wide range of texts can perform a translation task from one language to \
another without ever having been explicitly trained on that particular \
translation pair. The model can be given a prompt in the source language, \
along with a description of the target language, and it will generate a \
translation based on its understanding of language."})

print(response.choices[0].message.content)

Rewriting chat completion.

We provide as the sample a few lines from John Keats’ Hyperion. The text to rewrite is a paragraph from earlier in this chapter. Let’s see how Keats would explain zero-shot language translation. Here is a possible reply:

Deep in the vast expanse of knowledge,
Far removed from the familiar words of men,
Far from the common phrases, and dialects of foreign lands,
Resides the mighty language model, still as a statue,
Silent, amidst the stillness of its lair;
Knowledge upon knowledge stored within its head,
A library of texts that stretch beyond the sky.
No disturbance from the outer world can penetrate,
The silence, unbroken, like a winter's night,
But when prompted in one tongue, and guided to another,
It will weave language, a master of communication,
Based on its vast understanding of the spoken word.

One-shot rewrite by large language model.

Admittedly, Keats is a famous poet, so we could’ve simply asked the model to rewrite in the style of John Keats by name, but this demonstrates how we can show some sample text and have the model adopt its style from a single example.

Another good use-case for one-shot learning is specifying the format of the output. If we have some very specific formatting requirements, showing an example rather than describing the format makes for simpler prompts.

Here’s an example of us wanting an XML reply when prompting the model for facts about some entity.

{
  "messages": [
    { "role": "user", "content": "Give me some facts about Saturn" },
    { "role": "assistant", "content": "<FACTS entity=\"Saturn\"><FACT>Named after the Roman god of agriculture and harvest</FACT><FACT>At least 82 known moons</FACT></FACTS>" },
    { "role": "user", "content": "Give me some facts about {{entity}}" }
  ]
}

One-shot formatting template.

We expect the answer to be contained within the <FACTS> element. This element contains an entity attribute specifying the target entity. We then want each fact to show up inside <FACT> elements. In our prompt, we provide an example of this as part of the chat conversation – we show an example question (facts about Saturn) and the expected response for that question. We then repeat the question but for a different entity.

We can use this to get formatted facts from a large language model.

response = ChatTemplate.from_file(
    'xml.json').completion({'entity': 'Elon Musk'})

print(response.choices[0].message.content)

Formatted output chat completion.

In this case we are asking for facts about Elon Musk. Running this code, I got the reply in the following listing (your mileage may vary as models are non-deterministic):

<FACTS entity="Elon Musk"><FACT>CEO of SpaceX, Tesla, Neuralink, and The Boring
Company</FACT><FACT>Net worth of over $200 billion as of
2021</FACT><FACT>Founded PayPal</FACT><FACT>Has a vision to colonize Mars and
make humanity a multi-planetary species</FACT><FACT>Has publicly stated concerns
about the potential dangers of artificial intelligence</FACT></FACTS>

One-shot formatting by large language model.

The model was able to understand the format we wanted, and the response conforms to the schema.
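A nice side effect of asking for a structured format is that we can process the reply with standard tooling. Here is a minimal sketch using Python’s built-in XML parser; the response string is hardcoded for illustration, and a production version should handle replies that aren’t well-formed XML, since the model only usually conforms to the schema:

```python
import xml.etree.ElementTree as ET

# A reply in the format our one-shot example established (hardcoded here)
response_text = (
    '<FACTS entity="Saturn">'
    '<FACT>Named after the Roman god of agriculture and harvest</FACT>'
    '<FACT>At least 82 known moons</FACT>'
    '</FACTS>')


def parse_facts(xml_text):
    """Parse a <FACTS> reply into the entity name and a list of fact strings."""
    root = ET.fromstring(xml_text)
    entity = root.get('entity')
    facts = [fact.text for fact in root.findall('FACT')]
    return entity, facts


entity, facts = parse_facts(response_text)
print(entity)      # Saturn
print(len(facts))  # 2
```

This is what makes one-shot formatting practical: the model’s output plugs directly into the rest of our code.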

In general, consider using one-shot learning when it’s easier to show than tell. If it takes fewer tokens to give an example or the example is clearer than a description, the large language model should be able to easily understand it. We saw it rewrite text in the style of John Keats from a few lines of poetry, and we saw it output formatted XML from a single example of a similar question/answer.

Of course, there is only so much we can convey with one example. Sometimes, we need to provide a set of examples for the model to understand our goals.

Few-shot learning

You can think of few-shot learning as an extended version of one-shot learning — instead of giving a single example to the large language model, we provide several examples.

Definition: Few-shot learning refers to the ability of a large language model to learn a new task or solve a new problem from a small set of examples.

In some cases, a single example might not provide enough information to give us the desired result. We might end up here as we tune our prompt — we first attempt a zero-shot prompt, but we don’t get back quite what we were expecting. We then try a one-shot prompt, but even that misses the mark. In that case, we can try providing multiple examples and hopefully the model can better infer what we want.

The following figure illustrates few-shot learning.

Few-shot learning.

We take the user input and combine it with a set of examples to compose the prompt we send to the large language model.

Text adventure

An example of when we would want to use few-shot learning is when we want to “teach” the model to act as a text adventure game. We’ll show the model a couple of user actions and responses (e.g. for the user action Look around, the response is You are in a room with a table and a door). Given a few such examples, the model should be able to “play” with us.

We can implement a simple command-line interactive chat to act as our adventure game by pre-seeding the chat history with a few-shot learning examples of how we want the model to reply.

chat = ChatTemplate(
    {'messages': [
        {'role': 'system', 'content': 'You are a text adventure game.'},
        {'role': 'user', 'content': 'Look around'},
        {'role': 'assistant', 'content': 'You are in a room with a table and a door.'},
        {'role': 'user', 'content': 'Open door'},
        {'role': 'assistant', 'content': 'The door is locked.'},
        {'role': 'user', 'content': 'Check inventory'},
        {'role': 'assistant', 'content': 'You have a sandwich.'}]})

for message in chat.template['messages']:
    if message['role'] == 'assistant':
        print(f'{message["content"]}')

while True:
    prompt = input('> ')
    if prompt == 'exit':
        break

    chat.template['messages'].append({'role': 'user', 'content': prompt})
    message = chat.completion({}).choices[0].message
    print(message.content)
    chat.template['messages'].append(
        {'role': message.role, 'content': message.content})

Large language model-based text adventure.

Our chat history is pre-seeded with several examples of user actions and responses. Before we enter the chat loop, we print the assistant messages to give some context to the user.

If we run this code we can continue “playing” and getting realistic responses from the model. Below are a few actions and replies we could get by running the code:

You are in a room with a table and a door.
The door is locked.
You have a sandwich.
> What is on the table?
You see a key lying on the table.
> I use the key to open the door
You use the key to unlock the door and open it. You find yourself in a dark hallway. There are doors on either side of the hallway and a staircase leading up.
> I eat the sandwich
You eat the sandwich and feel slightly less hungry.

Example actions and responses.

The few-shot examples in this case are the user and assistant messages we pre-seeded in the history. Note we’re also starting with a system message providing some context to the large language model. This is pretty impressive! When text adventures were popular, in the early days of PCs, developers had to hand-craft each and every interaction. With the help of large language models, we can now achieve similar results with just a few examples.

This was an example of a generative scenario. We prime the model to generate the type of responses we would expect from a text adventure game. Let’s look at another example in another domain: sentiment analysis.

Sentiment analysis

Large language models have extremely powerful language skills, so we can leverage these for sentiment analysis. Say we want to evaluate the sentiment of a restaurant review, on a scale from 1 to 5 (and also get some explanation of why the model assigned the score it did). We can use few-shot learning for this as shown below (we can save this as sentiment.json):

{
  "messages": [
    { "role": "system", "content": "You are an assistant designed to analyze sentiment of customer reviews. For a given review, you will respond with an assessment of the review. Rate on a scale of 1-5 (5 being highest). Explain why this rating was given." },
    { "role": "user", "content": "We've been here several times, and every time it wows us. Everything I've ordered I would order again." },
    { "role": "assistant", "content": "5: Short review in which the customer praises the food and mentions being a return customer." },
    { "role": "user", "content": "Love the food and have constantly enjoyed their sushi this last year. However, it's been too often that we've waited 45-60 minutes for our food. They take so many to go orders that they neglect their in-house patrons." },
    { "role": "assistant", "content": "2: The customer really enjoys the food but is dissatisfied by the long wait. The fact that the restaurant prioritizes to-go orders makes them feel neglected." },
    { "role": "user", "content": "I loooove the sushi here! I've ordered from here multiple times and have dined in as well. Both are always great experiences and the sushi is always so yummy." },
    { "role": "assistant", "content": "5: The customer both dined in and ordered to go multiple times, had a great experience each time, and really enjoys the food." },
    { "role": "user", "content": "{{review}}" }
  ]
}

Few-shot sentiment analysis template.

We first use the system message to tell the model how we want it to reply. We then provide a set of examples, where the user message contains the review text and the assistant message contains the response, consisting of a score and an explanation derived from the review text.

We can use this template with our interactive chat:

chat = ChatTemplate.from_file('sentiment.json')

while True:
    prompt = input('> ')
    if prompt == 'exit':
        break
    message = chat.completion({'review': prompt}).choices[0].message
    print(message.content)

Sentiment analysis for restaurant reviews.

For this particular scenario we don’t need chat history, as we expect to score reviews one by one as they are provided. We simply load the template from sentiment.json and on each call replace the {{review}} parameter with the actual review. We print the response we get back. Below is an example of this interaction:

> The food was fantastic but the kitchen was slowwwwwwwww and super disorganized. It seemed like they prioritized delivery orders over diners and it was really disappointing. But for the sushi and the prices this place was excellent!!
3: The customer really enjoyed the food and the prices, but is disappointed by the slow and disorganized kitchen. It seems like delivery orders are prioritized over diners, which is frustrating.

Example sentiment analysis.

We can see the model provides an accurate assessment of the review, understanding the pros and cons and estimating the sentiment value.
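Because the few-shot examples fixed the reply format to “score: explanation”, we can split the response into structured data. Here is a minimal sketch (the helper name is ours, and the reply is hardcoded for illustration); since the model isn’t guaranteed to follow the format, the parser falls back gracefully:

```python
import re


def parse_assessment(reply):
    """Split a 'score: explanation' reply into an int score and the explanation.

    Returns (None, reply) if the reply doesn't follow the expected format.
    """
    match = re.match(r'\s*([1-5])\s*:\s*(.*)', reply, re.DOTALL)
    if match is None:
        return None, reply.strip()
    return int(match.group(1)), match.group(2).strip()


score, explanation = parse_assessment(
    '3: The customer really enjoyed the food and the prices, '
    'but is disappointed by the slow and disorganized kitchen.')
print(score)  # 3
```

With the score as an integer, we could aggregate ratings across many reviews or flag low-scoring ones for follow-up.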

Using few-shot learning, we can enable many scenarios across different domains. Feeding a small set of examples to the large language model helps it understand precisely how we want to use it and how it should format its response.

For more complex scenarios, we can combine few-shot learning with prompt selection. Depending on the ask, we can first apply prompt selection to identify which examples we want to inject into the prompt and pick these from a broader range of possible few-shot examples. The following figure shows how this would work.

Prompt selection and few-shot learning.

  1. We have a selection prompt template and use the large language model to process the original user ask and determine which template we should use.
  2. In this case, the templates we are selecting from are few-shot learning templates containing sets of examples for specific scenarios. We pick the right template based on the selection prompt, then use that combined with the user ask to generate the final prompt we send to the model.
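As an illustration of step 1 (the template contents and category names here are hypothetical), a selection prompt template might ask the model to classify the ask into one of our known scenarios:

```json
{
  "messages": [
    { "role": "system", "content": "Classify the user request as one of: adventure, sentiment, other. Reply with only the category name." },
    { "role": "user", "content": "{{ask}}" }
  ]
}
```

The reply can then be mapped to the matching few-shot template file (e.g. sentiment.json), which is used to generate the final prompt.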

We won’t go over an implementation, as we already have all the building blocks and putting them together would take too much space here.

Recap

We covered zero-shot, one-shot, and few-shot learning:

  • Zero-shot learning pretty much means the model can perform a task it was not specifically trained for. You can start here when engineering your prompt — maybe the pre-trained model has enough knowledge to provide a good answer to your prompt without additional examples. One example we looked at was translation between two languages.
  • One-shot learning means providing one example to the prompt. This comes in handy when it is easier to show than to tell — it can be used to both add context to the input prompt (our example was giving some Keats verses and asking the model to rewrite some text in that style) or to describe the output (we got the model to output in a custom XML format just by showing an example).
  • Few-shot learning comes into play when a single example is not sufficient. We provide the model with a set of examples that help it refine its reply.

The full chapter includes a discussion on fine-tuning as another example of “teaching” pretrained models on new data. The book is now available on Amazon: a.co/d/4MiwZvX.
