LLMs [Part 1] — Basics
Terminology:
Prompt — the text that you pass to an LLM
Context window — the space or memory available to the prompt; its size differs from model to model
Completion — the output of the model. [e.g. the prompt is passed to the model, the model then predicts the next words, and if the prompt contained a question, the model generates an answer]
Inference — the act of using the model to generate text.
The completion comprises the text of the original prompt, followed by the generated text.
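These terms can be made concrete with a toy next-word predictor. This is a sketch only: the "model" is a hypothetical bigram table, and all names here are made up for illustration, not a real LLM API.

```python
# Toy illustration of prompt / context window / completion / inference.
# The "model" is a hardcoded bigram table, standing in for a real LLM.

BIGRAMS = {
    "the": "sky",
    "sky": "is",
    "is": "blue",
}

CONTEXT_WINDOW = 4  # max number of tokens the model can see at once

def infer(prompt, max_new_tokens=3):
    """Inference: repeatedly predict the next word from the context."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        context = tokens[-CONTEXT_WINDOW:]    # older tokens fall outside the window
        next_word = BIGRAMS.get(context[-1])  # next-word prediction
        if next_word is None:
            break
        tokens.append(next_word)
    # The completion comprises the original prompt plus the generated text.
    return " ".join(tokens)

print(infer("the"))  # -> "the sky is blue"
```

Note how the returned completion contains the prompt itself, and how the context window caps how much of the running text the model can condition on.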
LLM Use Cases and Tasks:
Next word prediction is the base concept behind a number of different capabilities of LLMs.
A few of them are:
1. Chatbots
2. Writing an essay based on a prompt
3. Summarizing conversations
4. Translation tasks — traditional translation between two different languages, such as French and German, or English and Spanish
5. Translating natural language to machine code
6. Information retrieval — e.g. named entity recognition (NER), identifying the people, places, and organizations mentioned in a text
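All of the tasks above reduce to next-word prediction over a suitably framed prompt. As a minimal sketch (the template wording is hypothetical, not any model's required format), one generic completion interface can serve several tasks just by changing the prompt:

```python
# Different tasks, one mechanism: frame the task as text, then let the
# model complete it. The templates below are illustrative only.

def build_prompt(task, text):
    templates = {
        "summarize": "Summarize the following conversation:\n{text}\nSummary:",
        "translate": "Translate to German:\n{text}\nGerman:",
        "code":      "Write Python code that does the following:\n{text}\nCode:",
        "ner":       "List the named entities in:\n{text}\nEntities:",
    }
    return templates[task].format(text=text)

print(build_prompt("translate", "Good morning"))
```

The model never switches algorithms; it keeps predicting the next word, and the prompt steers what those words are.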
How was text generated before transformers?
Previous generations of language models made use of an architecture called recurrent neural networks or RNNs.
→ RNNs, while powerful for their time, were limited by the amount of compute and memory needed to perform well at generative tasks.
→ With only a small window of preceding words to condition on, the model often fails at next-word prediction
→ Can't handle complexities in language, like homonyms (is "bank" a riverbank or a financial institution?) or syntactic ambiguity
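To see where these limits come from, here is a minimal Elman-style recurrence in NumPy (toy random weights, not a trained model): each hidden state depends on the previous one, so tokens must be processed strictly in order, and the entire history is squeezed into one fixed-size vector.

```python
import numpy as np

# Minimal RNN step, to show why RNNs are hard to scale.

rng = np.random.default_rng(0)
hidden_size, embed_size = 8, 4
W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1
W_x = rng.normal(size=(hidden_size, embed_size)) * 0.1

def run_rnn(token_embeddings):
    h = np.zeros(hidden_size)            # all prior context lives in h
    for x in token_embeddings:           # strictly sequential: no parallelism
        h = np.tanh(W_h @ h + W_x @ x)   # new state from old state + input
    return h                             # fixed size no matter how long the input

sentence = rng.normal(size=(20, embed_size))  # 20 "tokens"
h_final = run_rnn(sentence)
print(h_final.shape)  # (8,) — 20 tokens compressed into 8 numbers
```

The loop cannot be parallelized across tokens, and distant words survive only as whatever trace they leave in that single 8-number state.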
Attention is All You Need
The transformer architecture was introduced in this 2017 paper by Google and the University of Toronto.
Some features that this architecture introduced to generative AI were:
→ It can be scaled efficiently to use multi-core GPUs,
→ it can parallel process input data, making use of much larger training datasets, and
→ it’s able to learn to pay attention to the meaning of the words it’s processing.
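These properties can be seen in the paper's core operation, scaled dot-product attention. A plain-NumPy sketch on toy data: every token attends to every other token through a single pair of matrix multiplies, which is what makes the computation so parallelizable.

```python
import numpy as np

# Scaled dot-product attention on toy data (random vectors, no training).

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise relevance of tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights                     # weighted mix of the values

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))  # 5 token embeddings
out, w = attention(X, X, X)              # self-attention on toy data

print(out.shape)       # (5, 8): one context-aware vector per token
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

All positions are handled in one batched operation rather than a token-by-token loop, and each row of weights shows how much one word "pays attention to" every other word.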
In the next part [Part 2] of this series, we shall look into the details of the transformer architecture.