Vectorising David
This article is part of the original draft I wrote for the book Prompt with my friend David Boyle. It touches lightly on the transformer architecture and how tokenisers work.
A very brief intro in case the whole article is too long to read:
Think of words as bytes, a bunch of 0s and 1s. If you get the impression that AI is all about probabilities, you are not wrong. ChatGPT takes as input a bunch of 0s and 1s, encoded from the words in your sentences, together with a huge vocabulary dictionary. It doesn't generate a whole block of text for you in one go; it "thinks" about the best next word as it generates each word. In a nutshell, this is how GPT-like models produce output in the simplest form:
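To make the "words as 0s and 1s" idea concrete, here is a toy sketch. The tiny vocabulary and the `encode` helper below are invented for illustration; real tokenisers (such as GPT's byte-pair encoding) have vocabularies of tens of thousands of subword pieces, not whole words.

```python
# Toy illustration: each word maps to an integer token ID drawn from a
# small, made-up vocabulary. Those IDs are what the model actually
# consumes, and each ID is ultimately just bits, i.e. 0s and 1s.

vocab = {"this": 0, "book": 1, "talks": 2, "about": 3,
         "artificial": 4, "intelligence": 5, "in": 6, "marketing": 7}

def encode(sentence):
    """Map each word in the sentence to its integer ID in the vocabulary."""
    return [vocab[word] for word in sentence.lower().split()]

ids = encode("This book talks about artificial intelligence")
print(ids)                                # the token IDs
print([format(i, "08b") for i in ids])    # the same IDs written as bits
```

Running this prints `[0, 1, 2, 3, 4, 5]` followed by the same numbers as 8-bit binary strings, which is all the "words" the model ever sees.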
- This (the first word; now, what is the most probable next word?)
- This book (high probability, as it fits closely with the prompt)
- This book talks (high probability that a verb follows a noun)
- This book talks about
- This book talks about artificial
- This book talks about artificial intelligence
- This book talks about artificial intelligence in
- This book talks about artificial intelligence in marketing
- This book talks about artificial intelligence in marketing.
Every time it generates a word, it looks for the most probable next word to pair with it…
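The step-by-step generation above can be sketched as a simple loop. To keep it self-contained, the probability tables below are entirely made up; a real model computes these probabilities over its whole vocabulary at every step, and this sketch uses greedy decoding (always picking the single most probable word), which is only one of several sampling strategies.

```python
# Toy next-word model: for each word, a made-up table of probabilities
# for the word that follows it. Repeatedly picking the most probable
# next word (greedy decoding) reproduces the walkthrough above.

next_word_probs = {
    "this":         {"book": 0.6, "article": 0.4},
    "book":         {"talks": 0.5, "is": 0.3, "covers": 0.2},
    "talks":        {"about": 0.9, "to": 0.1},
    "about":        {"artificial": 0.4, "marketing": 0.3, "data": 0.3},
    "artificial":   {"intelligence": 0.95, "flavours": 0.05},
    "intelligence": {"in": 0.6, ".": 0.4},
    "in":           {"marketing": 0.6, "business": 0.4},
    "marketing":    {".": 1.0},
}

def generate(prompt, max_words=12):
    """Extend the prompt one word at a time, greedily."""
    words = prompt.lower().split()
    while len(words) < max_words:
        candidates = next_word_probs.get(words[-1])
        if not candidates:
            break
        # pick the single most probable next word (greedy decoding)
        best = max(candidates, key=candidates.get)
        words.append(best)
        if best == ".":
            break
    return " ".join(words)

print(generate("This"))
# → this book talks about artificial intelligence in marketing .
```

Starting from "This", the loop walks exactly the path shown in the bullet list: at each step it only ever asks one question, "given the last word, which next word has the highest probability?"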