LLM Bootcamp Notes — Part 1: Prompt Engineering

Anirudh Gokulaprasad
4 min read · Sep 16, 2023


The LLM Bootcamp series by FullStackDeepLearning offers great insights into the world of Generative AI by taking a very structured approach to the topic. The series goes from introducing LLMs all the way up to approaches and recommendations for production-grade LLM and Generative AI solutions.

I have consolidated my learning notes into a series of 4 articles, each distilling one of the 4 core focus areas for LLMs from the Bootcamp series. This Part 1 article is on Prompt Engineering.

What is Prompt Engineering?

Prompt Engineering is the art of designing the text that goes into the LLM. It can also be thought of as a way to program these LLMs to do what the user instructs.

Prompts are Magic Spells

On a very high level, LLMs are statistical models of the data they are trained on. They act as auto-regressive models that predict from their own previous output, for example predicting the next word in a sentence. A generic statistical model can be pictured as a pattern matcher, but the pattern-matcher framing gives very bad intuitions and poor next-word predictions.
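
To make the "statistical model of text" idea concrete, here is a minimal toy sketch of autoregressive next-word prediction built from bigram counts. The corpus and function names are invented for illustration; a real LLM models tokens with a neural network trained on vastly more data.

```python
from collections import Counter, defaultdict

# Toy corpus; a real LLM is trained on vastly more text and models
# tokens with a neural network rather than raw counts.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams: how often each word follows each preceding word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word_distribution(word):
    """Return the probability of each possible next word, given the previous word."""
    counts = bigrams[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(next_word_distribution("cat"))  # {'sat': 0.5, 'ate': 0.5}
```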

Probabilistic programs are a comparatively better mental model than plain statistical pattern matchers, in that they can put some "thought" before answering a question: each candidate thought is assigned a probability, and the output is the one with the highest probability. However, these types of programs are arcane and rarely used in practice.

In reality, an LLM can be considered a probabilistic model of text that has access to base reference documents, or more generally a corpus, from which its outputs are generated. Here, prompting is a way of conditioning the model: it applies weights to these documents so that the most relevant ones carry the most influence on the result.

Prompting can also be thought of as a subtractive technique: each prompt narrows down the list of possibilities for the result by weighting the important documents higher and the less important documents lower.
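
One way to picture this conditioning is as re-weighting a set of reference documents by how well they match the prompt. The sketch below is purely illustrative: the documents and the word-overlap scoring are stand-ins for whatever the model actually does internally.

```python
def relevance(prompt, document):
    """Crude stand-in for relevance: fraction of prompt words found in the document."""
    prompt_words = set(prompt.lower().split())
    doc_words = set(document.lower().split())
    return len(prompt_words & doc_words) / len(prompt_words)

documents = [
    "Recipe for a classic sponge cake with vanilla frosting",
    "Quarterly revenue report for a software company",
    "Troubleshooting guide for a home wifi router",
]

prompt = "my wifi router keeps dropping the connection"

# Each new piece of prompt text narrows the field: documents that match
# get a higher weight, the rest are effectively subtracted away.
weights = {doc: relevance(prompt, doc) for doc in documents}
for doc, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{w:.2f}  {doc}")
```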

Instruction tuning is also a way of asking the model to perform certain tasks and answer the question. The model is given a specific instruction about behaviour, response style and so on, which can range from "be fair and unbiased" to "be funny".
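
In practice, this kind of behavioural instruction is usually passed as a system message. A minimal sketch, assuming the OpenAI Python SDK (v1+); the model name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat model works
    messages=[
        # The system message carries the behavioural instruction:
        # response style, tone, constraints.
        {"role": "system", "content": "Be fair, unbiased, and concise. Answer in bullet points."},
        {"role": "user", "content": "Summarise the pros and cons of remote work."},
    ],
)
print(response.choices[0].message.content)
```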

There are a few rules that can be followed to extract the best value from prompts. The genie's rules are:

  • Use low-level patterns: Instead of instructions that require further explanation, give instructions using low-level patterns. Eg: instead of "Create a question paper for such-and-such topic", use phrases like what, why and where in regard to the context.
  • Itemise instructions: Turn descriptive attributes into bulleted lists, and turn negation statements into assertions. Eg: "Don't be biased" becomes "Be unbiased". (A before/after sketch of both rules follows this list.)
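
Here is the before/after sketch promised above, applying both rules to the same request; the prompt strings are invented for illustration.

```python
# Before: high-level, descriptive, with a negation.
vague_prompt = (
    "Create a question paper for the photosynthesis topic. "
    "Don't be biased towards any single sub-topic."
)

# After: low-level question patterns, itemised, negation turned into an assertion.
itemised_prompt = """Write exam questions about photosynthesis.
- What is the role of chlorophyll?
- Why do plants need sunlight?
- Where in the cell does photosynthesis happen?
Rules:
- Be unbiased: cover light reactions, the Calvin cycle, and limiting factors equally.
- Keep each question to one sentence."""
```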

Limitations:

  • Simulating something that does not exist may not give the best results. Eg: "Simulate a super-intelligent AI"; the model may have no ground reference for the simulation task at hand.
  • If the LLM is asked to simulate a human thinking for a few seconds, or a Reddit moderator, it is good at it. But ask it to simulate hours of human thinking, or to simulate a Python kernel compiling and running a program, and it may not do well; in such cases a purpose-built tool is a better choice than an LLM.

PROMPTING TECHNIQUES

Things to watch out for:

  • Few-shot learning might be a bad idea in most cases: a well-crafted zero-shot prompt can match the effect of multiple examples.
  • Tokenisation can be tricky (see the sketch after this list).
  • Models struggle to move away from their base training, i.e. if an example that contradicts the training data is given, the model may ignore the example and fall back to its training.
  • Models don't see words, they only see tokens; therefore even gibberish text inputs can sometimes produce results.
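
To see why tokenisation can surprise you, here is a small sketch using the tiktoken library (my choice for illustration; any tokenizer shows the same effect):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by several OpenAI chat models

for text in ["hello world", "Hello World", " indivisible", "asdkjh qwe!!"]:
    tokens = enc.encode(text)
    print(f"{text!r:20} -> {len(tokens)} tokens: {[enc.decode([t]) for t in tokens]}")

# Capitalisation, leading spaces, and even gibberish all map to valid
# token sequences, which is why the model happily "sees" them.
```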

The prompting playbook:

  • Operate on structured text: this gives the model easier access to the data.
  • Automate follow-up questions with self-ask examples. Eg: ask the model to be self-critical of the answer it has provided.
  • Reasoning by few-shot prompting with Chain-of-thought. Eg: include worked examples whose answers spell out the intermediate reasoning steps before the final answer.
  • Alternatively, reasoning by "just asking for it". Eg: follow the question with the instruction "think about it step by step".
  • Zero-shot Chain-of-thought plus self-criticism: don't just ask the model to be critical of its answer, also ask it to fix it.
  • Use ensembling techniques. Eg: take outputs from, say, 50 runs (or 50 different models) of the same question and take a majority vote over the answers. (A combined sketch of chain-of-thought and majority-vote ensembling follows this list.)
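
To make the last few items concrete, here is a hedged sketch that combines a few-shot chain-of-thought prompt with majority-vote ensembling (often called self-consistency). The ask_llm helper is a placeholder for whichever model call you use; the prompt text and sample count are illustrative.

```python
from collections import Counter

COT_PROMPT = """Q: A shop sells pens at 3 for $2. How much do 12 pens cost?
A: Let's think step by step. 12 pens is 4 groups of 3 pens, and each group costs $2,
so the total is 4 * 2 = $8. The answer is 8.

Q: {question}
A: Let's think step by step."""

def ask_llm(prompt: str) -> str:
    """Placeholder: call your model of choice and return its final answer."""
    raise NotImplementedError

def self_consistency(question: str, n_samples: int = 5) -> str:
    """Sample several chain-of-thought answers and return the majority vote."""
    answers = [ask_llm(COT_PROMPT.format(question=question)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```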

The cost factor

All of these techniques come at the cost of either latency or compute. A simple rule of thumb: the more requests made to the model, the higher the cost and latency. Techniques like self-criticism and ensembling will therefore rack up costs quickly, so be mindful of these choices to keep the cost under control.
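
A rough back-of-the-envelope sketch; the price and token counts are made-up placeholders, so substitute your provider's actual rates:

```python
# Hypothetical numbers purely for illustration.
price_per_1k_tokens = 0.002      # $ per 1,000 tokens (placeholder rate)
tokens_per_request = 800         # prompt + completion for one call

single_call = tokens_per_request / 1000 * price_per_1k_tokens
ensembled = single_call * 50     # 50-sample ensembling multiplies cost ~50x
with_critic = single_call * 2    # one extra self-criticism round roughly doubles it

print(f"single call:     ${single_call:.4f}")
print(f"50-way ensemble: ${ensembled:.4f}")
print(f"self-criticism:  ${with_critic:.4f}")
```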

Footnotes

To read the part 2 article of this series — Click here

To read the part 3 article of this series — Click here

To read the part 4 article of this series — Click here

Note: This article is a distilled consolidation of my understanding of the topic. If you find any conceptual errors, please leave feedback so that I can fix them. Cheers!
