The No BS Guide to LLMs: A Primer for ML Enthusiasts

Shivansh Kaushik · Published in Primastat · Jul 18, 2023

The recent AI revolution has brought about groundbreaking advancements in Natural Language Processing (NLP) and Machine Learning (ML). Language models, specifically Large Language Models (LLMs), have played a pivotal role in this revolution. If you’re working in the data science/ML industry or considering a transition due to the AI revolution, this no-nonsense guide will equip you with the knowledge and resources to understand LLMs and get started in the field.

Understanding Large Language Models

At its core, a language model is a statistical model that learns patterns and structures within a given language. It aims to understand the relationships between words, phrases, and sentences in order to generate coherent and contextually appropriate text. Language models can be trained on vast amounts of text data, allowing them to acquire a deep understanding of the language’s nuances, grammar, and semantics.
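
To make the idea of a statistical model over text concrete, here is a minimal sketch (assuming the Hugging Face transformers library and the small GPT-2 checkpoint, neither of which is part of the original walkthrough below) that inspects the probabilities a pretrained language model assigns to possible next words:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small pretrained language model and its tokenizer (gpt2 is used purely for illustration).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Ask the model for a distribution over the next token given a prompt.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocabulary_size)

# A softmax over the last position turns logits into next-token probabilities.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}: {prob.item():.3f}")

The model is never told any facts explicitly; the high probability it typically assigns to a word like “Paris” here falls out of the statistical patterns it picked up during training.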

Language models serve as a foundational pillar in NLP by enabling a wide range of applications, including:

  • Text Generation
  • Machine Translation
  • Speech Recognition and Synthesis
  • Sentiment Analysis
  • Question-Answering Systems

Transformers

Transformers form the bedrock of modern language models, including Large Language Models (LLMs). They have revolutionized NLP by addressing the limitations of traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs) in capturing long-range dependencies and handling sequential data efficiently.

The origins of transformers can be traced back to the seminal paper titled “Attention Is All You Need,” published by Vaswani et al. in 2017. The transformer architecture introduced in this paper has since become a cornerstone in the field of NLP.

Credit: https://jalammar.github.io/illustrated-transformer/

“The Illustrated Transformer” by Jay Alammar is a brilliant article for anyone who wants an in-depth understanding of transformers and their architecture.
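
Before moving on, it may help to see the core computation behind attention spelled out. The following is a minimal sketch of scaled dot-product attention from “Attention Is All You Need”, written in plain PyTorch for illustration (the tensor shapes are arbitrary assumptions):

import math
import torch

def scaled_dot_product_attention(query, key, value):
    # Similarity between every query and every key, scaled by the key dimension.
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
    # Softmax turns the scores into attention weights over the sequence.
    weights = torch.softmax(scores, dim=-1)
    # Each output position is a weighted sum of the value vectors.
    return weights @ value, weights

# Toy self-attention example: one sequence of 4 tokens with 8-dimensional embeddings.
x = torch.randn(1, 4, 8)
output, weights = scaled_dot_product_attention(x, x, x)
print(output.shape, weights.shape)  # torch.Size([1, 4, 8]) torch.Size([1, 4, 4])

Transformers apply this operation many times in parallel (multi-head attention), which is what lets them relate tokens that sit far apart in the input.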

Key Terms used in the LLM space

In the realm of LLMs, there are several key terms that are commonly used. Understanding these terms is essential for effectively working with LLMs and grasping their functionality. Here are a few key terms:

  1. Prompts: Prompts are initial text inputs provided to the LLM to guide its generation or completion task. They serve as a starting point or context for the model to generate relevant responses or continue the text based on the given input.
  2. Fine-tuning: Fine-tuning refers to the process of taking a pre-trained LLM and training it further on a specific task or dataset. By fine-tuning, the model can adapt its knowledge and parameters to better perform on the target task, improving its output quality.
  3. Tokenization: Tokenization involves breaking down text into smaller units called tokens. Tokens can be words, subwords, or characters. Tokenization is a crucial step in preparing text data for LLMs, as it allows the model to process and understand the input at a granular level (see the short sketch after this list).
  4. Attention Mechanism: The attention mechanism is a fundamental component of transformers. It enables the model to assign different weights or attention scores to different parts of the input sequence, highlighting the important information for generating relevant and contextually informed responses.
  5. Beam Search: Beam search is a decoding technique used to generate multiple candidate outputs from the LLM. It explores different paths by considering a set of top-k probable tokens at each step, allowing the model to produce diverse and high-quality output sequences.
  6. Perplexity: Perplexity is a metric used to measure the performance of language models. It quantifies how well a language model predicts a given sequence of tokens. A lower perplexity value indicates that the model has a better understanding of the data it has been trained on.
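
To make tokenization concrete, here is a minimal sketch using the same t5-base tokenizer that the hands-on section below relies on (the example sentence is purely illustrative; on Colab you may also need to install sentencepiece):

from transformers import AutoTokenizer

# Load the tokenizer that pairs with the t5-base model used later in this article.
tokenizer = AutoTokenizer.from_pretrained("t5-base")

text = "Large language models learn patterns from text."
tokens = tokenizer.tokenize(text)  # the subword pieces the model sees
ids = tokenizer.encode(text)       # the integer IDs actually fed to the model
print(tokens)
print(ids)

Rare or unusual words are split into several subword pieces, which is how a fixed-size vocabulary can still cover arbitrary text.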

Hands-on Implementation

Now, we will utilize an LLM for a machine translation task. We will use Google’s T5, an open-source encoder-decoder model, through Hugging Face’s transformers library and run it on Google Colab.

Setting Up Environment with Google Colab:

Before we dive into utilizing Hugging Face’s pipeline to leverage the T5 model, let’s set up our environment using Google Colab. Follow these steps:

  • Install the transformers library:
!pip install transformers
  • Import the necessary libraries:
from transformers import pipeline

Using Hugging Face’s Pipeline for T5 Model:

Hugging Face’s pipeline is a powerful tool that simplifies the process of utilizing various language models, including T5. It provides a high-level interface to perform a wide range of tasks, such as text generation, translation, summarization, and question-answering. Let's explore how to use the T5 model through the Hugging Face pipeline.

  • Instantiate the pipeline for T5 text generation:
generator = pipeline("text2text-generation", model="t5-base", tokenizer="t5-base")

The text2text-generation task specifies that we want to generate text using the T5 model.

We specify the model and tokenizer as t5-base to use the base T5 model and its corresponding tokenizer.

  • Generate text using the T5 model:
input_text = "Translate this English text to French."
output_text = generator(input_text, max_length=50, num_return_sequences=1)[0]["generated_text"]

Set the input_text variable to the T5 task prefix ("translate English to French: ") followed by the English text you want translated.

Call the generator pipeline to produce the translation with the T5 model.

The max_length parameter specifies the maximum length of the generated text.

The num_return_sequences parameter determines the number of output sequences to generate.

Retrieve the generated text using indexing and the key "generated_text".
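
Putting the pieces together, here is a slightly extended sketch that also exercises beam search from the key terms above. The num_beams value and the example sentence are illustrative assumptions rather than part of the original walkthrough:

from transformers import pipeline

# Same pipeline as above: the T5 base model with its matching tokenizer.
generator = pipeline("text2text-generation", model="t5-base", tokenizer="t5-base")

# T5 expects a task prefix; everything after the prefix is what gets translated.
input_text = "translate English to French: The weather is nice today."

# num_beams turns on beam search; num_return_sequences must not exceed num_beams.
candidates = generator(
    input_text,
    max_length=50,
    num_beams=4,
    num_return_sequences=3,
)

for i, candidate in enumerate(candidates, start=1):
    print(f"{i}. {candidate['generated_text']}")

As a convenient alternative for this particular use case, the transformers library also exposes a dedicated translation_en_to_fr pipeline task, which can take care of the T5 task prefix for you.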

Recommended Reading Material and Resources:

There are a huge number of resources available for free to help you get started and excel in this space. Here are some of my recommendations:

Courses

Research Papers

  • Attention is All You Need: The 2017 paper “Attention Is All You Need” introduced the Transformer architecture, revolutionizing Natural Language Processing (NLP). By replacing recurrent connections with self-attention mechanisms, it demonstrated the ability to capture long-range dependencies. This paper laid the groundwork for modern language models and their exceptional performance in NLP tasks.
  • Training language models to follow instructions with human feedback:
    This paper, published by OpenAI in March 2022, introduces InstructGPT, the approach that later underpinned ChatGPT. It challenges the notion that simply making models larger makes them better at following user intent, and shows how a pre-trained model can be fine-tuned to align with human preferences using the Reinforcement Learning from Human Feedback (RLHF) paradigm building on Christiano et al. (2017).
  • BLOOM: BLOOM is a decoder-only transformer model developed through the BigScience research workshop with the goal of democratizing Large Language Models (LLMs). To foster accessibility, the collaboration released it as an open-source model in 2022. It was trained on the ROOTS corpus, a comprehensive dataset covering 46 natural languages and 13 programming languages.
  • BERT: BERT was introduced by Google AI in 2018, well before OpenAI introduced GPT-3. BERT employed a masked language modeling task and next sentence prediction to pretrain a bidirectional transformer architecture. It achieved state-of-the-art performance on various NLP benchmarks, revolutionizing the field of natural language processing.
  • Sparks of AGI: The paper presents an investigation into an early version of OpenAI’s GPT-4, highlighting its advanced capabilities in various domains without specific prompts. GPT-4 demonstrates near-human performance in tasks spanning mathematics, coding, vision, medicine, law, and psychology. The study discusses implications, limitations, and the path towards more comprehensive artificial general intelligence systems.

TL;DR

The article provides an overview of understanding Large Language Models (LLMs), the significance of transformers, key terms in the LLM space, hands-on implementation using Hugging Face’s pipeline and Google Colab, and recommended reading materials and research papers. It also discusses the impact of revolutionary LLMs like BERT and GPT-4, highlighting their advancements and implications for natural language processing and artificial general intelligence.

References

  1. The Illustrated Transformer
  2. HuggingFace Transformers Documentation

Shivansh Kaushik
Primastat

ML Engineer and innovator, on a mission to create a positive impact in the world using the powers of AI.