LLMs [Part 1] — Basics

Kamna Sinha
Published in Data At The Core!
2 min read · Dec 24, 2023
Image: wisecube.ai

Terminologies :

Prompt — the text that you pass to an LLM

Context window — the space or memory available for the prompt; its size differs from model to model

Completion — the output of the model. [e.g. the prompt is passed to the model, the model then predicts the next words, and if the prompt contained a question, the model generates an answer]

Inference — the act of using the model to generate text.

The completion comprises the text of the original prompt, followed by the generated text.
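The terms above can be tied together in a minimal sketch. This is not a real tokenizer or model — the `tokenize`, `CONTEXT_WINDOW`, and toy "model" below are illustrative assumptions — but it shows how the prompt is truncated to the context window and how the completion is the prompt followed by generated text:

```python
# Hypothetical sketch of prompt -> inference -> completion (toy code, not a real LLM).

def tokenize(text):
    # Real models use subword tokenizers; whitespace splitting is a stand-in.
    return text.split()

CONTEXT_WINDOW = 8  # assumed size; real models range from thousands to millions of tokens

def run_inference(prompt, model_predict):
    tokens = tokenize(prompt)[-CONTEXT_WINDOW:]  # only the window fits in "memory"
    generated = model_predict(tokens)            # next-word prediction
    return " ".join(tokens + generated)          # completion = prompt + generated text

# Toy "model" that always answers the same way, purely for illustration.
completion = run_inference("What is an LLM ?", lambda toks: ["A", "large", "language", "model."])
print(completion)  # → "What is an LLM ? A large language model."
```

Everything outside the context window is simply dropped, which is why window size matters when prompts get long.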

LLM Usecases and Tasks :

Next word prediction is the base concept behind a number of different capabilities of LLMs.

A few of them are:

1. Chatbots
2. Writing an essay based on a prompt
3. Summarizing conversations
4. Translation tasks — traditional translation between two different languages, such as French and German, or English and Spanish
5. Translating natural language to machine code
6. Information retrieval — [e.g. Named Entity Recognition (NER)]
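To make "next word prediction is the base concept" concrete, here is a toy bigram model with made-up counts (an assumption for illustration, nothing like a real LLM): it repeatedly picks the most likely next word given the previous one.

```python
# Toy bigram "language model": counts are invented for illustration.
bigram_counts = {
    "the": {"model": 3, "prompt": 1},
    "model": {"predicts": 4},
    "predicts": {"the": 2},
}

def generate(prompt_words, steps):
    words = list(prompt_words)
    for _ in range(steps):
        options = bigram_counts.get(words[-1])
        if not options:
            break
        # Greedy decoding: always take the highest-count next word.
        words.append(max(options, key=options.get))
    return words

print(generate(["the"], 4))  # → ['the', 'model', 'predicts', 'the', 'model']
```

Chat, summarization, translation, and code generation all reduce to this same loop, just with a far richer model of "what word comes next".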

How was text generated before transformers?

Previous generations of language models used an architecture called recurrent neural networks, or RNNs.

→ RNNs, while powerful for their time, were limited by the amount of compute and memory needed to perform well at generative tasks.
→ They often fail at next-word prediction when the answer depends on context seen many words earlier.
→ They can't handle complexities in language such as homonyms or syntactic ambiguity.
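The memory limitation can be seen in a minimal RNN-cell sketch (random weights, toy sizes — assumptions for illustration): the entire input history is squeezed into one fixed-size hidden vector, so the trace of early inputs fades as new ones arrive.

```python
import math
import random

random.seed(0)
HIDDEN = 4  # assumed hidden-state size for this toy cell

# Random weights stand in for learned ones.
W_h = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
W_x = [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)]

def rnn_step(h, x):
    # h_new[i] = tanh( sum_j W_h[i][j] * h[j] + W_x[i] * x )
    return [math.tanh(sum(W_h[i][j] * h[j] for j in range(HIDDEN)) + W_x[i] * x)
            for i in range(HIDDEN)]

h = [0.0] * HIDDEN
for x in [1.0, 0.0, 0.0, 0.0, 0.0]:  # one early "important" input, then silence
    h = rnn_step(h, x)
print(h)  # whatever the first input contributed is now diluted through 5 updates
```

A fixed-size state is exactly why scaling RNNs to long, ambiguous text demanded so much compute and memory, and why attention (next section) was a breakthrough.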

Attention is All You Need

The transformer architecture was introduced in this 2017 paper by Google and the University of Toronto.

https://magazine.sebastianraschka.com/p/understanding-large-language-models

Some features that this architecture introduced to generative AI were:
→ It can be scaled efficiently to use multi-core GPUs,
→ it can process input data in parallel, making use of much larger training datasets, and
→ it's able to learn to pay attention to the meaning of the words it's processing.
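That last point — learning to pay attention — can be sketched with the paper's scaled dot-product attention in NumPy (toy shapes, random inputs; a minimal illustration, not a full transformer): each position weighs every other position by query-key similarity and takes a weighted mix of values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # attention-weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                  # 3 "words", 4-dim embeddings (toy sizes)
out = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
print(out.shape)  # → (3, 4)
```

Because every position attends to every other position at once, this computation parallelizes across the whole sequence — unlike an RNN's step-by-step recurrence.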

In the next part [ part 2] of this series, we shall look into the details of the transformer architecture.
