LLM Bootcamp Notes — Part 1: Prompt Engineering
The LLM Bootcamp series by FullStackDeepLearning offers great insights into the world of Generative AI by taking a very structured approach to the topic. The series goes from introducing LLMs all the way up to approaches and recommendations for production-grade LLM & Generative AI solutions.
I have consolidated my learning notes into a series of 4 articles, one for each of the 4 core focus areas for LLMs from the Bootcamp series. This Part 1 article is on Prompt Engineering.
What is Prompt Engineering?
Prompt Engineering is the art of designing the text that goes into the LLM. It can also be thought of as a way of programming these LLMs to do what the user instructs.
Prompts are Magic Spells
On a very high level, LLMs are statistical models of the data they are trained on. They act as auto-regressive models that condition on their own previous output to predict what comes next, for example, predicting the next word in a sentence. A simple pattern matcher is one kind of generic statistical model, but thinking of an LLM as a statistical pattern matcher gives very bad intuitions and poor next-word predictions.
Probabilistic programs are comparatively better than plain statistical models, in that they can put some thought in before answering a question: each candidate thought is assigned a probability, and the output is the one with the highest probability. However, these types of programs are arcane for now.
In reality, an LLM can be considered a probabilistic model of text that has access to its base reference documents, or more generally, a corpus from which its results are generated. Here, prompting is a way of conditioning the model: it weights the most relevant documents more heavily so that the results are based on them.
Prompting can also be thought of as a subtractive technique: each prompt narrows down the space of possible results by weighting the important documents higher and the less important documents lower.
Instruction tuning is also a way of asking the model to perform a few tasks and answer the question. The model is given specific instructions on behaviour, response style, and so on, ranging anywhere from "be fair and unbiased" to "be funny".
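As a minimal sketch, a behavioural instruction can be prepended to the user's question before it is sent to an instruction-tuned model. The `build_prompt` helper and the instruction wording below are hypothetical, not a specific provider's API:

```python
def build_prompt(instruction: str, question: str) -> str:
    """Prepend a behavioural instruction to the user's question,
    mimicking how instruction-tuned models are typically steered."""
    return f"{instruction}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt(
    instruction="You are a fair and unbiased assistant. Keep answers concise.",
    question="What is prompt engineering?",
)
print(prompt)
```

The same question with a different instruction (say, "be funny") would condition the model toward a very different response style.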
There are a few rules that can be followed to extract the best value from prompts — The genie’s rules are:
- Use low-level patterns: Instead of using instructions that require further explanation, give instructions through low-level patterns. E.g. rather than "Create a question paper for such-and-such topic", use phrases like "what", "why", and "where" in regards to the context.
- Itemise instructions: Turn descriptive attributes into bulleted lists. Also, turn negation statements into assertion statements. E.g. "Don't be biased" becomes "Be unbiased".
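The itemising rule can be sketched as a small preprocessing step. The `NEGATION_REWRITES` table and `itemise` helper are hypothetical names for illustration; a real pipeline might rewrite negations by hand or with another model call:

```python
# Hypothetical sketch of the genie's rules: itemise descriptive
# attributes into a bulleted list and rewrite known negations
# as assertions before building the final prompt.
NEGATION_REWRITES = {
    "don't be biased": "be unbiased",
    "don't be rude": "be polite",
}

def itemise(attributes: list[str]) -> str:
    """Turn attributes into a bulleted list, rewriting negations."""
    rewritten = [NEGATION_REWRITES.get(a.lower(), a) for a in attributes]
    return "\n".join(f"- {a}" for a in rewritten)

print(itemise(["Don't be biased", "Be funny"]))
```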
Limitations:
- Simulating something that does not exist may not give the best results, e.g. "Simulate a super-intelligent AI". This is because the model may have no ground reference for the simulation task at hand.
- If the LLM is asked to simulate a human thinking for a few seconds, or a Reddit moderator, it does this well. But if asked to simulate a human's thinking flow over hours, or a Python kernel running a program, it tends to do poorly. In such cases, a purpose-built tool for the task is a better choice than an LLM.
Prompting Techniques
Things to watch out for:
- Few-shot learning might be a bad idea in most cases: a well-crafted zero-shot prompt can match the effect of multiple examples
- Tokenisation can be tricky
- Models struggle to move away from their base training, i.e. if a counter-example is given, the model may ignore the example and fall back to its training
- Models don't see words, they only see tokens. Therefore even gibberish text inputs can sometimes give results.
The prompting playbook:
- Operate on structured text: this gives the model easier access to the data
- Automate the process of asking follow-up questions with self-ask prompting, e.g. asking the model to be self-critical of the answer it has provided
- Reasoning by few-shot prompting with chain-of-thought, e.g. including worked examples whose answers spell out the intermediate reasoning steps
- Alternatively, reasoning by "just asking for it", e.g. posing the question followed by the instruction "think about it step by step" (zero-shot chain-of-thought)
- Extend self-criticism: not only ask the model to be critical of its answer, but also ask it to fix that answer
- Use an ensembling technique, e.g. take the output from 50 samples for the same question and do a majority vote on the final answers
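The ensembling step above (often called self-consistency) can be sketched as follows. `sample_answer` here is a deterministic stub standing in for a real model call; in practice each sample would be a fresh completion drawn at a non-zero temperature:

```python
import random
from collections import Counter

def sample_answer(question: str, rng: random.Random) -> str:
    """Stand-in for one stochastic LLM sample. A real setup would call
    the model with temperature > 0 and extract the final answer."""
    return rng.choice(["42", "42", "42", "42", "41"])  # mostly-right stub

def self_consistency(question: str, n_samples: int = 50, seed: int = 0) -> str:
    """Sample the model n times and majority-vote on the final answers."""
    rng = random.Random(seed)
    answers = [sample_answer(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))
```

The vote is over the final answers themselves, so occasional wrong samples get outvoted as long as the model is right more often than not.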
The cost factor
All of these techniques come at a cost in latency, compute, or both. A simple rule of thumb: the more requests to the model, the higher the cost and latency. Techniques like self-criticism and ensembling are therefore going to rack up costs. Be mindful of these choices to keep the cost under control.
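The rule of thumb can be made concrete with a back-of-envelope calculation. The per-token price below is an assumed illustrative figure, not any real provider's rate:

```python
# Back-of-envelope cost model: cost grows linearly with model calls.
PRICE_PER_1K_TOKENS = 0.002  # assumed USD price; check your provider

def estimate_cost(calls_per_query: int, tokens_per_call: int,
                  queries: int) -> float:
    """Total cost for a workload, in USD under the assumed price."""
    total_tokens = calls_per_query * tokens_per_call * queries
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

single = estimate_cost(calls_per_query=1, tokens_per_call=500, queries=1000)
ensembled = estimate_cost(calls_per_query=50, tokens_per_call=500, queries=1000)
print(single, ensembled)  # 50-sample ensembling costs 50x the single call
```

Whatever the actual price, a 50-sample ensemble multiplies the bill (and the latency, unless samples run in parallel) by 50.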
Footnotes
To read the part 2 article of this series — Click here
To read the part 3 article of this series — Click here
To read the part 4 article of this series — Click here
Note — This article is a distilled consolidation of my understanding of the topic. If you find any conceptual errors, please leave feedback so that I can fix them. Cheers!
References: