Image Source: Google Open Source Blog

Exploring Gemma: Google’s Lightweight, Open-Source Large Language Model

From training to user applications

Nghi Huynh
4 min read · Mar 21, 2024


A Large Language Model (LLM) is a deep learning model that can perform a variety of natural language processing tasks.

Let’s explore it step by step!

How to train an LLM:

LLMs are trained on massive amounts of unlabeled text data, also known as a corpus. This unsupervised learning approach allows them to identify patterns and relationships within the language itself.

  • Training data: an unlabeled text corpus (e.g., web documents, mathematics, code, e-books, etc.)
  • Training process: a general guideline for training an LLM is as follows

Similar to how we learn languages throughout our lives, training an LLM can be broken down into three major stages:

  1. Foundational Learning (Pre-training): This initial stage resembles our early childhood education (kindergarten/preschool and primary education). During this stage, LLMs are exposed to vast amounts of unlabeled text data, absorbing basic language structures, grammar rules, syntax, and vocabulary. A minimal sketch of the next-token prediction objective behind this stage follows this list.
  2. Domain-Specific Tuning (Supervised Instruction Tuning): Similar to high school or secondary education, this stage focuses on specific domains or tasks. LLMs are now exposed to high-quality, labeled instruction-and-response pairs, enabling them to learn more specific tasks (e.g., question answering, translation, content generation, etc.).
  3. Human Interaction and Refinement (Reinforcement Learning from Human Feedback, RLHF): This final stage resembles our higher education (college/university). Here, human experts provide feedback on the LLM’s outputs, guiding its development and refining its capabilities. This feedback helps LLMs learn from their mistakes and improve their performance over time.
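
To make the foundational-learning stage concrete, here is a minimal sketch of the next-token prediction (causal language modeling) objective that drives pre-training. It is a toy PyTorch illustration with a tiny made-up vocabulary and random logits standing in for a real decoder's output, not an actual training loop.

```python
import torch
import torch.nn.functional as F

# Toy illustration of the pre-training objective: predict the next token.
vocab_size = 32                                # tiny vocabulary, for illustration only
tokens = torch.tensor([[5, 9, 2, 17, 3]])      # one "sentence" of token ids

# Inputs are all tokens except the last; targets are the same sequence shifted by one.
inputs, targets = tokens[:, :-1], tokens[:, 1:]

# Pretend forward pass: random logits of shape (batch, seq_len, vocab_size).
# In a real model these come from the Transformer decoder.
logits = torch.randn(inputs.shape[0], inputs.shape[1], vocab_size)

# Cross-entropy over the vocabulary at every position is the standard
# unsupervised pre-training loss minimized over billions of tokens.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())
```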

An LLM’s architecture:

An LLM’s architecture is based on the Transformer decoder. Its main building blocks are listed below, followed by a minimal code sketch:

  • Embedding layer: this layer acts as a bridge between words and numbers. It converts words from the input text into numerical representations (vectors) that the model can understand and process. These vectors capture semantic information about the words, including their meaning and relationships to other words in the vocabulary.
  • Multi-head Attention mechanism: this is the heart of any Transformer decoder. It allows the model to attend to different parts of the input sequence simultaneously, focusing on relevant information for the task at hand. Each “head” in the multi-head attention mechanism learns a different way to attend to the input.
  • Feedforward layer: this layer adds non-linearity to the model, allowing it to capture complex relationships within the data. It typically consists of one or two fully connected neural network layers with activation functions.
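
The sketch below puts those three components together in a single toy Transformer decoder block using PyTorch. The layer sizes are arbitrary illustrative values, and details such as dropout, positional encodings, and the output head are omitted; this is a minimal sketch, not Gemma’s actual architecture.

```python
import torch
import torch.nn as nn

class MiniDecoderBlock(nn.Module):
    """Toy decoder block: embedding -> masked self-attention -> feedforward."""

    def __init__(self, vocab_size=1000, d_model=64, n_heads=4):
        super().__init__()
        # Embedding layer: token ids -> dense vectors carrying semantic information.
        self.embed = nn.Embedding(vocab_size, d_model)
        # Multi-head self-attention: each head learns a different way to attend.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        # Feedforward layer: adds non-linearity on top of the attention output.
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, token_ids):
        x = self.embed(token_ids)                       # (batch, seq_len, d_model)
        # Causal mask: True entries block attention to future positions.
        seq_len = token_ids.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + attn_out)                    # residual connection
        x = self.norm2(x + self.ff(x))                  # residual connection
        return x

block = MiniDecoderBlock()
out = block(torch.randint(0, 1000, (1, 8)))             # batch of 1, 8 tokens
print(out.shape)                                         # torch.Size([1, 8, 64])
```

Real LLMs stack dozens of such blocks and add a final linear layer that maps each position back to vocabulary logits for next-token prediction.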

Overview of Google’s Large Language Models

Google is at the forefront of research and development in Large Language Models (LLMs). These are powerful AI systems trained on massive amounts of text data to understand and generate human language. Here’s an overview of Google’s LLMs, categorized based on availability:

Closed-Source Models:

  • Meena (2020): an end-to-end neural conversational model with 2.6 billion parameters that learns to respond sensibly to a given conversational context.
  • LaMDA (2021): “Language Model for Dialogue Applications,” a model that excels at open-ended conversation on virtually any topic.
  • PaLM (2022): a single 540-billion-parameter model that could generalize across domains and tasks while being highly efficient.
  • Bard (2023): an experimental conversational AI service, powered by LaMDA. It focuses on generating informative answers and creating diverse creative text formats.
  • Gemini (2023): built from the ground up for multimodality — reasoning seamlessly across text, images, audio, video, and code.

Open-Source Model:

  • Gemma (2024): Google’s first open-source LLM. Gemma is built for responsible AI development from the same research and technology used to create Gemini models.

Table 1: Gemini and Gemma comparison

Academic Benchmark Evaluation

Standard academic benchmarks compare Gemma’s performance with other open models of similar size. The evaluations are grouped by capability, and the scores within each group are averaged (Figure 1).

Figure 1: LLMs comparison based on academic benchmark evaluations

User applications:

  • Information retrieval: Finding relevant information from a large amount of data (like searching the web and finding the most pertinent websites).
  • Sentiment analysis: Understanding the emotional tone of a piece of text (positive, negative, neutral). This can be useful for businesses gauging customer feedback or analyzing social media trends.
  • Text generation: Creating different creative text formats, following instructions or prompts provided by the user. This can include poems, code, scripts, emails, letters, etc. (see the sketch after this list).
  • Code generation: Generating different types of computer code based on specific requirements or functionalities. This can involve translating natural language instructions into code, completing existing code snippets, or even suggesting potential code fixes.
  • Chatbots & Conversational AI: Powering chatbots designed to engage in conversations with users in a natural and informative way. These chatbots can answer questions, provide customer service support, or even act as virtual assistants.

Conclusion:

This exploration of Gemma highlights its potential as a game-changer in the world of Large Language Models (LLMs). Additionally, Gemma embraces open-source development, fostering collaboration and transparency in AI research.

Key Points:

  • We explored the training process of LLMs, highlighting the three stages: foundational learning, domain-specific tuning, and human interaction for refinement.
  • We explored the core architecture of LLMs, highlighting the Transformer decoder, and saw that academic benchmarks suggest Gemma is competitive with other open models of similar size.
  • Gemma’s potential applications range from information retrieval and sentiment analysis to creative text generation and code assistance.


Nghi Huynh

I’m enthusiastic about the application of deep learning in the medical domain. Here, I want to share my learning journey with you. ^^