A brief introduction to prompt engineering

Eva Thompson
10 min read · Sep 14, 2023


Large language models (LLMs) have fundamentally changed the way we solve problems with computers. Previously, if you wanted a computer to do a task (like reformatting a document or categorizing a sentence), you had to write a program: a set of step-by-step instructions in a programming language the computer can execute. With LLMs, you no longer need a program for many of these problems. All you need is some text explaining what you want the computer to do. For example, you can have an LLM reformat a document just by giving it instructions in plain English or another language. You could say something like “Please reformat this document for me” and it would do it right away, without any complex programming. LLMs have made it much simpler to get computers to help us with all kinds of tasks.

Prompt engineering is the study of how to structure the instructions or questions (called prompts) we give to language models so that they perform tasks as well as possible. While there are many techniques, this overview focuses on the basics of how prompting works and some key methods: zero/few-shot learning and instruction prompting. Zero/few-shot learning means giving the model no examples of the task, or only a handful, directly in the prompt; instruction prompting means clearly telling the model what to do in words. As we work through these strategies, we’ll pick up practical tips we can use right away to become better at guiding LLMs. This overview won’t explain the history or inner workings of language models, since it’s all about prompting. To really understand language models before learning about prompting, I recommend checking out other introductions I’ve written; a strong background in how language models work is important for fully grasping how to prompt them effectively.

With all the hype around LLMs, we might wonder what really makes them so powerful. While there are many factors (like their large size, huge training data, feedback from people, etc.), a big strength is that LLMs are experts at predicting the next word or token in text. This means many different tasks can be solved by tuning how we provide input to and get output from the models!

To solve a task, all we need to do is 1) give the model input text containing the relevant information and 2) read the answer out of the text the model generates. This same approach can be used for translation, summarization, question answering, classification, and more. But the story isn’t quite that simple: the exact wording and structure of the prompt (the input text) given to the LLM can hugely affect the quality of its output. In other words, how we phrase the prompt, or prompt engineering, is extremely important.
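To make the pattern concrete, here is a minimal sketch in Python. The `complete` function is a hypothetical stand-in for whatever LLM client you use (an API call or a local model); everything interesting happens in the prompt string.

```python
# Minimal sketch of the pattern above: pack the task and the data into
# text, send it to a model, and read the answer out of the generated text.
# `complete` is a hypothetical placeholder; swap in a real client
# (an API call or a local model) to run it against an actual LLM.

def complete(prompt: str) -> str:
    print("--- prompt sent to the model ---")
    print(prompt)
    return "<model output would appear here>"

# The same call shape covers translation, summarization, QA, and more;
# only the prompt text changes.
answer = complete(
    "Translate the following sentence to French:\n"
    "I love learning about language models."
)
print(answer)
```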

What is prompt engineering?

“Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use LMs for a wide variety of applications and research topics.” (Prompt Engineering Guide)

Since getting the contents of the prompt right is crucial for getting helpful results from an LLM, prompt engineering has become very popular recently. It is, however, an empirical practice based on trial and error: figuring out the best prompts usually involves educated guessing and testing. We can improve our prompts by keeping track of different versions over time and trying new ideas to see what works best. By experimenting with prompts and comparing the outcomes, we can discover prompts that perform well without fully understanding why. It’s learning through experience rather than strict formulas.

There are different options for building a prompt. But most prompts include some common parts:

Input — This is the actual data the LLM will analyze, like a sentence to translate or document to summarize.

Examples — One good way to show the LLM what to do is by giving a few specific examples of inputs and their correct outputs in the prompt.

Instructions — Instead of examples, the prompt could just clearly tell the LLM what to do in words.

Markers — It helps if the prompt has a predictable structure, so we might separate parts using markers to make it clear.

Context — In addition to the above, sometimes we want to provide more general “context” or background information to the LLM in the prompt as well.

The key parts are usually the input data, examples or instructions, and optionally markers or context. But prompts are flexible — there are different ways to construct one.
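As a sketch of how these parts can fit together, here is one way to assemble them in Python. The part names, ordering, and markers are my own convention for illustration, not a standard any model requires.

```python
# Assemble a prompt from the common parts: context, instruction,
# optional examples, and the input, separated by simple markers.

def build_prompt(instruction, input_text, examples=(), context=None):
    parts = []
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Instruction: {instruction}")
    for example_input, example_output in examples:  # optional demonstrations
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    parts.append(f"Input: {input_text}\nOutput:")   # trailing marker cues the answer
    return "\n\n".join(parts)

print(build_prompt(
    instruction="Classify the sentiment of the sentence as positive or negative.",
    examples=[("The food was great!", "positive")],
    input_text="The service was painfully slow.",
))
```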

Keep it simple: Start with a basic prompt, then slowly make small changes and check the results.

Be direct: If you want the LLM to do something specific like use a certain format, say it clearly up front. Explicitly stating your goal gets the message across best.

Be specific: Vagueness hurts prompts. Give details, but don’t overload the prompt with too much information: models have a limited context window, so there’s a cap on how long a prompt can be.

Examples are powerful: If it’s hard to describe what you want, concrete samples of correct outputs for different inputs can show the LLM what to do.

The details vary by task and model, but in general: start basic, be straightforward about your goal, provide details without being vague or piling on too much, and use examples to reinforce the message when words alone don’t work. Focus on simplicity, clarity, and specificity.
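To see the difference these tips make, compare a vague prompt with a direct, specific one for the same task (both prompts are invented for illustration):

```python
# Vague: the model has to guess the format, length, and audience.
vague_prompt = "Tell me about this function:\ndef mean(xs): return sum(xs) / len(xs)"

# Direct and specific: the goal, format, length, and audience are all stated.
specific_prompt = (
    "Explain what the following Python function does in exactly two sentences, "
    "written for a complete beginner.\n\n"
    "def mean(xs):\n"
    "    return sum(xs) / len(xs)"
)
```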

While LLMs have become really popular recently thanks to models like ChatGPT, the idea of prompting has been around for a long time. Originally, models like GPT were fine-tuned separately for each task. Then with GPT-2, researchers started using zero-shot prompting to solve many different problems with a single base model. Finally, GPT-3 showed that very large models become very good at solving tasks when given just a few examples in the prompt, called few-shot learning. In this section, we’ll go through zero-shot and few-shot learning to better understand how they work, and we’ll also cover some more advanced prompting methods. Seeing the progression from GPT to GPT-2 to GPT-3 helps explain how prompting has evolved as language models have grown larger and more capable.

Zero-Shot Learning

The concept behind zero-shot learning is quite straightforward: we provide the LLM with a description of the task along with the relevant input data, then have it generate an output. Because LLMs are pretrained on such huge amounts of text, they can often solve tasks this way without any fine-tuning or examples; essentially, they tap into what they already know. Even something as relatively small as GPT-3.5 is capable of zero-shot learning on many problems when simply given a prompt describing the task.
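For instance, a zero-shot prompt for sentiment classification might look like the sketch below; there is nothing in it beyond a task description and the input itself.

```python
# A zero-shot prompt: a task description plus the input, no examples at all.
zero_shot_prompt = (
    "Classify the sentiment of the following review as positive, negative, or neutral.\n\n"
    "Review: The battery died after two days and support never replied.\n"
    "Sentiment:"
)
# A capable LLM will typically continue this with "negative",
# despite never having been shown a labeled example in the prompt.
```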

Zero-shot learning was explored heavily with models like GPT-2, and it works well some of the time. But what if it doesn’t solve our task? Often we can hugely boost how well an LLM does the job by giving it more direct and specific guidance. In particular, adding samples of the desired outputs to the prompt lets the model learn patterns from examples shown right in the prompt. If zero-shot learning fails, providing examples for the LLM to mimic is a good next step: seeing cases of correct responses helps the model replicate the right behavior, leading to better performance than descriptive text alone.

Few-Shot Learning

Instead of relying solely on a task description, we can improve our prompt by including clear examples of real inputs and their ideal outputs. This idea underpins few-shot learning, which aims to boost how well an LLM does a job by showing it concrete samples of accurate responses. When done right and paired with a powerful model, few-shot learning is extremely helpful, as evidenced by GPT-3, where just a few examples in the prompt were enough to solve complex problems. By demonstrating the correct behavior, few-shot learning regularly achieves better results than descriptions alone, and it was a major factor in recent advances in large language model capabilities.
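Here is the same sentiment task rewritten as a few-shot prompt: a handful of labeled demonstrations precede the real input, all inside the prompt text (the reviews are invented for illustration).

```python
# A few-shot prompt: in-prompt demonstrations establish the input/output
# pattern, and the model continues it for the final, unlabeled input.
few_shot_prompt = (
    "Review: The screen is gorgeous and setup took two minutes.\n"
    "Sentiment: positive\n\n"
    "Review: It arrived scratched and the speaker crackles.\n"
    "Sentiment: negative\n\n"
    "Review: Does what it says, nothing more.\n"
    "Sentiment: neutral\n\n"
    "Review: The battery died after two days and support never replied.\n"
    "Sentiment:"
)
```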

But using few-shot learning with LLMs effectively can be complicated. Which examples should we provide? Is there a proper way to structure the prompt? Do small changes significantly impact the model?

Most LLMs are sensitive to how the prompt is constructed, so prompt engineering is important but difficult. While newer models like GPT-4 seem less sensitive to minor variations, research has offered some tips for few-shot learning that are still useful to know:

The order of examples matters — changing the sequence can greatly change performance. Adding more examples doesn’t solve this.

The distribution of labels in the examples should match the real data. Surprisingly, the labels on individual examples don’t all need to be correct.

LLMs tend to repeat the label of the final example due to recency bias.

Examples in the prompt should be varied and in random order.

Carefully selecting diverse examples and avoiding biases like repeating the final one can improve few-shot learning results. Prompt construction is critical.
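A small sketch of how you might act on these tips: draw a label-balanced set of demonstrations and shuffle their order so no label always appears last. The example pool here is made up for illustration.

```python
import random

# Hypothetical pool of demonstrations, grouped by label.
POOL = {
    "positive": [("Great value for the price.", "positive"),
                 ("Five stars, would buy again.", "positive")],
    "negative": [("Broke within a week.", "negative"),
                 ("Customer service was useless.", "negative")],
}

def sample_demonstrations(seed=None):
    rng = random.Random(seed)
    # One example per label keeps the label distribution balanced.
    demos = [rng.choice(examples) for examples in POOL.values()]
    # Shuffling guards against order effects and recency bias.
    rng.shuffle(demos)
    return demos

print(sample_demonstrations(seed=0))
```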

few-shot learning vs. fine-tuning. Before moving forward, I want to clear up a common misunderstanding: few-shot learning is not fine-tuning. With few-shot learning, examples are included right in the prompt as context, but the model itself is not changed at all; this is called “in-context learning,” because only the prompt is modified, never the model’s internal weights. Fine-tuning, by contrast, explicitly retrains the model on a specific dataset, altering its parameters through backpropagation. The key distinction is that few-shot learning never changes the actual model, only its input.

Instruction Prompting

While few-shot learning is very powerful, it has a downside: examples use up a lot of the LLM’s limited context window. So we may want methods that are less token-heavy. Is it possible to just explain the right behavior in words instead of showing samples? The short answer is yes! Giving written directions in the prompt, called instruction prompting, works well, and it’s especially effective with a certain type of LLM.

A lot of recent LLM development has focused on getting better at following what users instruct them to do. Pre-trained models aren’t naturally good at this. But teaching models to follow instructions makes them much better at doing what people want. LLMs that are good at instruction following enable useful applications like conversational assistants that answer questions (like ChatGPT) and programming helpers (like Codex). Making models pay attention to instructions improves how aligned they are with human intent.

As discussed before, the first step in creating an LLM is pre-training the model to predict the next word on large text datasets. This gives the model knowledge but doesn’t guarantee that its answers will be engaging or its solutions useful, and pre-trained models also struggle to follow complex requests. To get those behaviors, we need to go beyond basic pre-training.

There are a couple of ways to teach an LLM to follow instructions. For example, we can do “instruction tuning,” which fine-tunes the model on examples of instruction-following dialogs; several notable models used this approach, including variants of LLaMA, the FLAN models, OPT-IML, and more. Alternatively, the three-step process of supervised fine-tuning, reward model training, and reinforcement learning from human feedback (RLHF) leads to remarkable models like ChatGPT, GPT-4, Sparrow, and others. These methods help models learn to act on users’ guidance rather than just on language patterns.

Here are some tips for writing effective instructions when prompting an instruction-following LLM:

  • Be very specific and detailed in the instruction.
  • Avoid saying what not to do — instead focus on clearly stating what you want the LLM to accomplish.
  • Using markers or indicators in the prompt structure helps distinguish the instruction from other content like examples or input data. This makes the instruction clearer.

The general idea is to precisely yet positively convey what task or outcome you need from the LLM. Being direct and labeling the instruction portion properly helps the model understand and carry out its directions successfully. With an LLM trained for instruction following, taking care with your guidance enables it to help in useful ways.
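Putting these tips together, an instruction prompt might look like the sketch below. The ### markers are just one common convention for separating the instruction from the data; no model requires them.

```python
# A marked-up instruction prompt: the instruction is specific, stated
# positively, and clearly separated from the input data.
instruction_prompt = (
    "### Instruction ###\n"
    "Summarize the article below in exactly three bullet points, "
    "each under 15 words, written for a general audience.\n\n"
    "### Article ###\n"
    "<article text goes here>"
)
```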

role prompting. With role prompting, you assign the LLM a specific “role” or persona by including a short description in the prompt. For example:

  • You are a famous mathematician
  • You are a doctor
  • You are an expert in music

The text snippet that introduces the role comes at the beginning of the prompt. Perhaps surprisingly, newer LLMs can then take on that role and stay in character throughout a conversation.

So instead of just responding based on its general training, the LLM takes the perspective of being a certain type of person. This provides context that could influence its answers. Role prompting allows tailoring the LLM’s mindset and approach through the use of prompt-assigned identities. It’s a technique related to instruction prompting that offers another way to shape model behavior.
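In practice, a role prompt simply prepends the persona to the request, as in this invented example (with chat-style APIs, the role text usually goes in the system message):

```python
# A role prompt: a persona description followed by the actual request.
role_prompt = (
    "You are an expert music teacher who explains concepts with "
    "simple, everyday analogies.\n\n"
    "Explain what a chord progression is."
)
```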

Conclusion

Zero-shot learning allows performing tasks just by providing a description, but has limits. Few-shot learning improves it by inserting examples directly into the prompt.

Instruction-following LLMs enable a more compact method called instruction prompting, which describes the desired behavior in words rather than with examples that consume the context window. But models need special training to follow instructions well.

Prompt engineering skills develop over time as new models arrive, but some approaches consistently help. Start simple and add complexity slowly. Be specific without being long-winded. And when plain prompts fall short, techniques like few-shot learning, instruction prompting, or more advanced methods usually work best.

The ability to leverage LLMs depends a lot on crafting the right context for them through prompting. While optimal methods vary, principles like starting basic, avoiding vagueness, and leveraging examples or instructions can maximize success across different tasks and models. Prompt design is key to fulfilling the potential of these versatile tools.


Eva Thompson

Senior Product Marketing Manager at SymeCloud Limited. Passionate about product marketing, she leads marketing strategy for the company’s product experiences.