Exploring Gemma: Google open-source AI model

Comprehensive Guide

Vishnu Sivan
The Pythoneers
10 min readMar 7, 2024

--

“Image By Author”

In recent months, Google has made waves in the AI landscape with the introduction of its Gemini models. Now, Google is once again leading the charge in innovation with the unveiling of Gemma, a transformative leap into the world of open-source AI.

Unlike its predecessors, Gemma stands out as a lightweight, smaller model crafted to assist developers worldwide in responsibly building AI solutions, aligning closely with Google’s AI principles. This landmark initiative signifies a significant moment in the democratization of AI technology, offering developers and researchers unprecedented access to cutting-edge tools.

As an open-source model, Gemma not only democratizes access to state-of-the-art AI technologies but also encourages a global community of developers, researchers, and enthusiasts to contribute to its advancement. This collaborative approach is set to accelerate innovation in AI, dismantling barriers and nurturing a culture of shared knowledge and resources.

In this article, we are going to explore the Gemma model with Keras and try out some experiments on text generation tasks, including question-answering, summarization, and fine-tuning the model.

Getting Started

Table of contents

What is Gemma

Gemma, the latest addition to Google’s AI lineup, comprises lightweight, top-tier open models derived from the same technology powering the Gemini models. These text-to-text, decoder-only large language models, available in English, offer open weights, pre-trained variants, and instruction-tuned variants. Gemma models excel in various text generation tasks like question answering, summarization, and reasoning. Their compact size facilitates deployment in resource-constrained environments such as laptops, desktops, or personal cloud infrastructure, democratizing access to cutting-edge AI models and spurring innovation for all.

Key Features:

  • Model Sizes: Google introduces Gemma models in two sizes: Gemma 2B and Gemma 7B, each offering pre-trained and instruction-tuned variants.
  • Responsible AI Toolkit: Google presents a Responsible Generative AI Toolkit to assist developers in creating safer AI applications with Gemma.
  • Toolchains for Inference and Fine-Tuning: Developers can utilize toolchains for inference and supervised fine-tuning (SFT) across major frameworks like JAX, PyTorch, and TensorFlow via native Keras 3.0.
  • Easy Deployment: Pre-trained and instruction-tuned Gemma models are deployable on laptops, workstations, or Google Cloud. They can be effortlessly deployed on Vertex AI and Google Kubernetes Engine (GKE).
  • Performance: Gemma models achieve top-tier performance for their sizes compared to other open models. They outperform significantly larger models on key benchmarks while maintaining rigorous standards for safe and responsible outputs.

Gemma vs Gemini

Google has stated that Gemma, while distinct from Gemini, shares essential technical and infrastructure components with it. This common foundation enables both Gemma 2B and Gemma 7B to achieve “best-in-class performance” relative to other open models of similar sizes.

Gemma vs Llama 2

Google compared Gemma 7B with Meta’s Llama 2 7B across various domains such as reasoning, mathematics, and code generation. Gemma outperformed Llama 2 significantly in all benchmarks. For instance, in reasoning, Gemma scored 55.1 compared to Llama 2’s 32.6 in the BBH benchmark. In mathematics, Gemma scored 46.4 in the GSM8K benchmark, while Llama 2 scored 14.6. Gemma also excelled in complex problem-solving, scoring 24.3 in the MATH 4-shot benchmark, surpassing Llama 2’s score of 2.5. Furthermore, in Python code generation, Gemma scored 32.3, outpacing Llama 2’s score of 12.8.

Image credits Gemma: Google introduces new state-of-the-art open models

Experimenting on Gemma

Gemma is readily available on Colab and Kaggle notebooks and integrates seamlessly with popular tools such as Hugging Face, NVIDIA, NeMo, MaxText, and TensorRT-LLM. Additionally, Developers can leverage Google’s toolchains for inference and supervised fine-tuning (SFT) across leading frameworks like JAX, PyTorch, and TensorFlow via Keras 3.0.

Method 1: Using Gemma with KerasNLP

KerasNLP provides convenient access to the Gemma model, allowing researchers and practitioners to readily explore and utilize its capabilities for their needs.

Enabling Gemma-7b access

Gemma-7b is a gated model, requiring users to request access.

Follow the steps to enabling the model access.

  • Login to your Kaggle account or register a new account if you don’t already have one.
  • Open the Gemma model page using the link Gemma | Kaggle.
  • On the Gemma model page, click on the “Request access” link to request access to the model.
  • Provide your First name, Last name, and Email id on the next page.
  • Click on “Accept” on the following page to accept the license agreement.

Kaggle Access key generation

To access the model, you will need a Kaggle access token as well. You can generate one by going to Settings | Kaggle, then clicking on the “Create New token” button under API to create a new access token.

Creating your first script with Gemma using KerasNLP

To run the model, Gemma needs a system with 16GB of RAM. In this section, we will use Google Colab instead of a personal machine. You can try running the same code on your machine if it meets the required specifications.

  • Open the link Welcome To Colaboratory — Colaboratory and Click on Sign in to login to your colab account or create a new account if you don’t have an account.
  • Change the Runtime to T4 GPU by Runtime → Change runtime type → T4 GPU → Save.

Setting up environment variables

To use Gemma, you must provide your Kaggle access token. Select Secrets (🔑) in the left pane and add your KAGGLE_USERNAME and KAGGLE_KEY.

Create a new colab notebook by Clicking on + New notebook button. Set environment variables for KAGGLE_USERNAME and KAGGLE_KEY.

import os
from google.colab import userdata

os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')

Installing the dependencies

Install the required python libraries to access the gemma model using the following command. Click on the play icon to execute the cell.

!pip install -q -U keras-nlp
!pip install -q -U keras>=3

Importing packages

Import Keras and KerasNLP.

import keras
import keras_nlp

Selecting a backend

Keras will work for tensorflow, jax and torch. Choose jax as the backend for this section.

import os
os.environ["KERAS_BACKEND"] = "jax"

Creating model

In this tutorial, you will create a model using GemmaCausalLM, an end-to-end Gemma model for causal language modeling. Create the model using the from_preset method. from_preset instantiates the model from a preset architecture and weights.

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

Use summary to get more info about the model:

gemma_lm.summary()

Generating text

The gemma model has a generate method that generates text based on a prompt. The optional max_length argument specifies the maximum length of the generated sequence.

gemma_lm.generate("What is the meaning of life?", max_length=64)
gemma_lm.generate("How does the brain work?", max_length=64)

You can also provide batched prompts using a list as input:

gemma_lm.generate(
["What is the meaning of life?",
"How does the brain work?"],
max_length=64)

Method 2: Using Gemma from HuggingFace

Accessing and utilizing the Gemma model is conveniently facilitated through the Hugging Face platform. The model is readily available for exploration, enabling researchers and practitioners to harness its potential.

Enabling Gemma-7b access

Gemma-7b is a gated model, requiring users to request access.

Follow the steps to enabling the model access.

  • Upon accessing the link, acknowledge the license agreement. You will then be directed to a page where you can authorize Kaggle to share your HuggingFace details.
  • Once you have acknowledged the license, proceed to authorize Kaggle to share your Hugging Face details. This step is necessary for further access.
  • After authorizing Kaggle, you will be redirected to a page displaying the license agreement. Click the “Accept” button to agree to the terms and conditions.

Hugging Face Access token generation

To access the model, you will need a HuggingFace access token as well. You can generate one by going to Settings, then Access Tokens in the left sidebar, and clicking on the “New token” button to create a new access token.

Create your first script with Gemma using HuggingFace

To run the model, Gemma needs a system with 16GB of RAM. In this section, we will use Google Colab instead of a personal machine. You can try running the same code on your machine if it meets the required specifications.

  • Open the link Welcome To Colaboratory — Colaboratory and Click on Sign in to login to your colab account or create a new account if you don’t have an account.
  • Change the Runtime to T4 GPU by Runtime → Change runtime type → T4 GPU → Save.
  • To use Gemma, you must provide your Hugging Face access token. Select Secrets (🔑) in the left pane and add your HF_TOKEN key.
  • Create a new colab notebook by Clicking on + New notebook button.

Installing dependencies

Install the required python libraries to access the gemma model using the following command. Click on the play icon to execute the cell.

!pip install transformers torch accelerate

HuggingFace login

To utilize the Gemma model, you need to authenticate your Hugging Face account. Add the provided code to a new cell for Hugging Face login. Click on the play icon to execute the cell. Input your Hugging Face access token in the designated cell to complete the authentication process.

from huggingface_hub import login
login()

Selecting model

Access the gemma-2b-it model using the following command. You can also try with any one of the gemma models. Refer the link Gemma release — a google Collection (huggingface.co) to know more about other gemma models.

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")

Generating text

  • Test the model by executing the following code snippet.
input_text = "What is Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids, max_new_tokens=1024)
print(tokenizer.decode(outputs[0]))

You will get the output as follows,

Thanks for reading this article !!

--

--

Vishnu Sivan
The Pythoneers

Try not to become a man of SUCCESS but rather try to become a man of VALUE