Intro to LLMs

Ilias Papachristos
Google for Developers Europe
5 min read · May 16, 2024

What a week! OpenAI and Google announced so many new things!

Let’s touch on Large Language Models (LLMs) at a high level.

LLMs are a subset of Deep Learning (DL). LLMs refer to large, general-purpose language models that can be pre-trained and then fine-tuned for specific purposes.

The “large” in LLMs refers to two key aspects:

  1. Large Training Dataset: LLMs are trained on massive datasets of text and code, sometimes reaching the petabyte scale. This vast amount of data allows the model to learn complex language patterns and generalise effectively to new situations.
  2. Large Number of Parameters: During training, LLMs develop parameters that represent their internal knowledge and understanding of the data. These parameters act as the model’s “memory”, the learning acquired from the training process. For example, BERT, a well-known LLM, has 110 million parameters in its base version, while PaLM 2 is reported to have around 340 billion. The increased number of parameters allows LLMs to handle more complex tasks and generate more nuanced outputs.

LLMs are designed for general-purpose applicability. This means they can be adapted to address a wide range of common language processing tasks without requiring extensive modification. This versatility stems from their pre-training on massive datasets.

Pre-training and Fine-tuning:

  • Pre-training: LLMs undergo an initial training phase on massive datasets of text and code. This pre-training process equips the model with a fundamental understanding of language structure, semantics, and relationships between words.
  • Fine-tuning: Following pre-training, LLMs can be further fine-tuned for specific tasks. This focuses the model’s already acquired knowledge on a particular problem domain, typically using a smaller dataset tailored to the desired task (a minimal code sketch follows below).
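
To make the pre-train/fine-tune workflow concrete, here is a minimal sketch using the Hugging Face transformers and datasets libraries (my choice for illustration; the model, dataset, and hyperparameters are placeholders, not a prescription):

```python
# Minimal fine-tuning sketch with Hugging Face transformers + datasets.
# Model, dataset, and hyperparameters are illustrative choices.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Start from a general-purpose pre-trained model...
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# ...and a small task-specific dataset (binary sentiment labels).
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

# Fine-tune: focus the model's pre-trained knowledge on this one task,
# using only a small slice of the data to keep the sketch cheap to run.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-imdb", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()
```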

LLMs are the backbone of many Generative AI (Gen AI) applications. These powerful AI models are trained on massive datasets of text and code, allowing them to grasp the nuances of human language with remarkable depth. Imagine an LLM as a vast library that has not only read countless books and articles but has also understood the underlying patterns and relationships within the text.

This empowers LLMs to perform a wide range of tasks, including the following (a combined code sketch follows the list):

  • Text Classification: Categorising text data into predefined classes (e.g., sentiment analysis, spam detection).
    👨‍💻 Courses I suggest 👩‍💻
    💡 Text Classification with TensorFlow: This TensorFlow tutorial specifically focuses on building a text classification model with TensorFlow.
    💡 Natural Language Processing with Deep Learning Specialisation: Offered by deeplearning.ai (may have paid options), this specialisation delves into deep learning techniques for NLP tasks, including text classification. While there might be paid tracks, the course offers valuable free modules to get you started.
  • Question Answering: Extracting relevant answers to user queries from a given context.
    👩‍💻 Courses I suggest 👨‍💻
    💡 Natural Language Processing with Stanford’s CoreNLP: This Coursera course by Stanford University introduces core NLP concepts and explores question-answering techniques using Stanford’s CoreNLP library. There’s a financial aid option available for this course.
    💡 Building Chatbots with Rasa: This course focuses on building chatbots, which heavily rely on the question-answering capabilities of LLMs. While some parts might be geared towards Rasa’s platform, the underlying concepts are valuable.
  • Document Summarisation: Condensing lengthy documents into concise summaries that capture the key points.
    👨‍💻 Courses I suggest 👩‍💻
    💡 Natural Language Processing with Python: This Coursera specialisation by the University of Washington covers various NLP tasks, including document summarisation techniques. It explores tools like NLTK and spaCy. While there’s a financial aid option, some parts might require paid access.
    💡 Text Summarisation with TensorFlow: This TensorFlow tutorial demonstrates how to build a text summarisation model with TensorFlow.
  • Text Generation: Creating new text formats, like poems, code snippets, scripts, or emails.
    👩‍💻 Courses I suggest 👨‍💻
    💡 Text Generation with Transformers: Hugging Face, a leading NLP platform, offers this course introducing text generation with transformer-based models, a key architecture used in LLMs.
    💡 Generative AI for Everyone: This course by fast.ai explores Generative AI concepts, including text generation techniques. It might require some prior knowledge of machine learning, but it offers valuable insights into the world of Gen AI for text creation.
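
All four task types can be tried in a few lines with the Hugging Face pipeline API. Here is a quick sketch (the library downloads a default model for each task, and the inputs are made up for illustration):

```python
# Quick tour of the four task types via Hugging Face pipelines.
# Each pipeline picks a default model; inputs are illustrative.
from transformers import pipeline

# Text classification (sentiment analysis)
classifier = pipeline("sentiment-analysis")
print(classifier("I loved this introduction to LLMs!"))

# Question answering over a given context
qa = pipeline("question-answering")
print(qa(question="How many parameters does BERT have?",
         context="BERT, a well-known LLM, has 110 million parameters."))

# Document summarisation
summariser = pipeline("summarization")
article = ("Large language models are pre-trained on massive datasets of text "
           "and code, which lets them learn language structure and semantics. "
           "They can then be fine-tuned on smaller datasets for specific tasks.")
print(summariser(article, max_length=30, min_length=10))

# Text generation
generator = pipeline("text-generation")
print(generator("Large language models are", max_new_tokens=20))
```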

By leveraging their pre-trained knowledge and the power of fine-tuning, LLMs can effectively address a diverse range of NLP challenges.

Prompt tuning is the cornerstone of successful Gen AI applications and a crucial skill in the field. It’s akin to giving instructions to a skilled artist: the clearer and more specific the instructions, the better the final artwork.

Here’s how it works:

  • The Prompt: This is the starting point for Gen AI models. It can be a textual instruction, a question, or even a combination of text and other data types (like images or code), depending on the model.
  • Tuning the Prompt: The art of prompt tuning lies in crafting the prompt to elicit the desired response from the Gen AI model (see the sketch after this list). This may involve:
    - Clarity: Using clear and concise language.
    - Specificity: Providing detailed instructions about the desired outcome.
    - Examples: Including examples to illustrate the desired format or style.
    - Structure: Organising the prompt in a logical and easy-to-understand manner.
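
As an illustration of those four elements, here is a small sketch. The build_prompt helper is hypothetical (not any library’s API); it simply assembles a well-structured prompt string you could send to the LLM of your choice:

```python
# Hypothetical helper that assembles a prompt from the four elements above.
# Not a library API; it just returns a string to send to any LLM.
def build_prompt(task: str, details: str, examples: list[str], output_format: str) -> str:
    sections = [
        f"Task: {task}",                       # clarity: one clear instruction
        f"Details: {details}",                 # specificity: the desired outcome
        "Examples:\n" + "\n".join(examples),   # examples: show format and style
        f"Output format: {output_format}",     # structure: easy to follow
    ]
    return "\n\n".join(sections)

prompt = build_prompt(
    task="Summarise the customer review below in one sentence.",
    details="Keep the original sentiment and mention the product name.",
    examples=[
        "Review: 'The X100 camera is superb.' -> Summary: A positive review praising the X100 camera.",
    ],
    output_format="A single plain-text sentence.",
)
print(prompt)  # pass this string to the Gen AI model of your choice
```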

By effectively tuning the prompt, developers can maximise the potential of Gen AI models and achieve more accurate, creative, and relevant outputs.

Follow me, let’s connect on LinkedIn, and hit the clap icon if you like my article.

