Exploring Gemma: Google’s Open-Source AI Model
A Comprehensive Guide
In recent months, Google has made waves in the AI landscape with the introduction of its Gemini models. Now, Google is once again leading the charge in innovation with the unveiling of Gemma, a transformative leap into the world of open-source AI.
Unlike its predecessors, Gemma stands out as a lightweight, smaller model crafted to assist developers worldwide in responsibly building AI solutions, aligning closely with Google’s AI principles. This initiative marks a significant moment in the democratization of AI technology, offering developers and researchers unprecedented access to cutting-edge tools.
As an open-source model, Gemma not only democratizes access to state-of-the-art AI technologies but also encourages a global community of developers, researchers, and enthusiasts to contribute to its advancement. This collaborative approach is set to accelerate innovation in AI, dismantling barriers and nurturing a culture of shared knowledge and resources.
In this article, we are going to explore the Gemma model with Keras and try out some experiments on text generation tasks, including question-answering, summarization, and fine-tuning the model.
Getting Started
Table of contents
- What is Gemma
- Gemma vs Gemini
- Gemma vs Llama 2
- Experimenting on Gemma
- Method 1: Using Gemma with KerasNLP
- Enabling Gemma-7b access
- Kaggle Access key generation
- Creating your first script with Gemma using KerasNLP
- Setting up environment variables
- Installing the dependencies
- Selecting a backend
- Importing packages
- Creating model
- Generating text
- Method 2: Using Gemma from HuggingFace
- Enabling Gemma-7b access
- Hugging Face Access token generation
- Create your first script with Gemma using HuggingFace
- Installing dependencies
- HuggingFace login
- Selecting model
- Generating text
What is Gemma
Gemma, the latest addition to Google’s AI lineup, comprises lightweight, top-tier open models derived from the same technology powering the Gemini models. These text-to-text, decoder-only large language models, available in English, offer open weights, pre-trained variants, and instruction-tuned variants. Gemma models excel in various text generation tasks like question answering, summarization, and reasoning. Their compact size facilitates deployment in resource-constrained environments such as laptops, desktops, or personal cloud infrastructure, democratizing access to cutting-edge AI models and spurring innovation for all.
Key Features:
- Model Sizes: Google introduces Gemma models in two sizes: Gemma 2B and Gemma 7B, each offering pre-trained and instruction-tuned variants.
- Responsible AI Toolkit: Google presents a Responsible Generative AI Toolkit to assist developers in creating safer AI applications with Gemma.
- Toolchains for Inference and Fine-Tuning: Developers can utilize toolchains for inference and supervised fine-tuning (SFT) across major frameworks like JAX, PyTorch, and TensorFlow via native Keras 3.0.
- Easy Deployment: Pre-trained and instruction-tuned Gemma models are deployable on laptops, workstations, or Google Cloud. They can be effortlessly deployed on Vertex AI and Google Kubernetes Engine (GKE).
- Performance: Gemma models achieve top-tier performance for their sizes compared to other open models. They outperform significantly larger models on key benchmarks while maintaining rigorous standards for safe and responsible outputs.
Gemma vs Gemini
Google has stated that Gemma, while distinct from Gemini, shares essential technical and infrastructure components with it. This common foundation enables both Gemma 2B and Gemma 7B to achieve “best-in-class performance” relative to other open models of similar sizes.
Gemma vs Llama 2
Google compared Gemma 7B with Meta’s Llama 2 7B across various domains such as reasoning, mathematics, and code generation. Gemma outperformed Llama 2 significantly in all benchmarks. For instance, in reasoning, Gemma scored 55.1 compared to Llama 2’s 32.6 in the BBH benchmark. In mathematics, Gemma scored 46.4 in the GSM8K benchmark, while Llama 2 scored 14.6. Gemma also excelled in complex problem-solving, scoring 24.3 in the MATH 4-shot benchmark, surpassing Llama 2’s score of 2.5. Furthermore, in Python code generation, Gemma scored 32.3, outpacing Llama 2’s score of 12.8.
Experimenting on Gemma
Gemma is readily available on Colab and Kaggle notebooks and integrates seamlessly with popular tools such as Hugging Face, NVIDIA NeMo, MaxText, and TensorRT-LLM. Additionally, developers can leverage Google’s toolchains for inference and supervised fine-tuning (SFT) across leading frameworks like JAX, PyTorch, and TensorFlow via Keras 3.0.
Method 1: Using Gemma with KerasNLP
KerasNLP provides convenient access to the Gemma model, allowing researchers and practitioners to readily explore and utilize its capabilities for their needs.
Enabling Gemma-7b access
Gemma-7b is a gated model, requiring users to request access.
Follow these steps to enable access to the model:
- Login to your Kaggle account or register a new account if you don’t already have one.
- Open the Gemma model page using the link Gemma | Kaggle.
- On the Gemma model page, click on the “Request access” link to request access to the model.
- Provide your first name, last name, and email ID on the next page.
- Click on “Accept” on the following page to accept the license agreement.
Kaggle Access key generation
To access the model, you will also need a Kaggle access token. You can generate one by going to Settings | Kaggle and clicking the “Create New Token” button under the API section.
Creating your first script with Gemma using KerasNLP
Gemma needs a system with at least 16 GB of RAM to run. In this section, we will use Google Colab instead of a personal machine. You can try running the same code on your machine if it meets the required specifications.
- Open the link Welcome To Colaboratory — Colaboratory and click on Sign in to log in to your Colab account, or create a new account if you don’t have one.
- Change the Runtime to T4 GPU by Runtime → Change runtime type → T4 GPU → Save.
Setting up environment variables
To use Gemma, you must provide your Kaggle access token. Select Secrets (🔑) in the left pane and add your KAGGLE_USERNAME and KAGGLE_KEY.
Create a new Colab notebook by clicking on the + New notebook button. Set environment variables for KAGGLE_USERNAME and KAGGLE_KEY.
import os
from google.colab import userdata

# Read the Kaggle credentials stored in Colab Secrets and expose them as environment variables
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
Installing the dependencies
Install the required Python libraries to access the Gemma model using the following commands. Click on the play icon to execute the cell.
!pip install -q -U keras-nlp
!pip install -q -U "keras>=3"
Selecting a backend
Keras 3 can run on the TensorFlow, JAX, or PyTorch backend. The backend must be chosen before Keras is imported, so set the KERAS_BACKEND environment variable first. We will use JAX for this section.
import os
os.environ["KERAS_BACKEND"] = "jax"
Importing packages
With the backend selected, import Keras and KerasNLP.
import keras
import keras_nlp
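As an optional sanity check (not part of the original steps), you can confirm that Keras 3 is installed and that the JAX backend is active before creating the model:
# Optional check: print the installed Keras version and the active backend
print(keras.__version__)
print(keras.backend.backend())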
Creating model
In this tutorial, you will create a model using GemmaCausalLM, an end-to-end Gemma model for causal language modeling. Create the model using the from_preset method, which instantiates the model from a preset architecture and weights.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
Use summary to get more info about the model:
gemma_lm.summary()
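If you want to try a different Gemma variant, such as the 7B or instruction-tuned checkpoints, you can list the available presets. This is a small sketch assuming the presets registry that KerasNLP exposes on its model classes:
# List the available Gemma presets, e.g. "gemma_2b_en", "gemma_instruct_2b_en", "gemma_7b_en", ...
print(list(keras_nlp.models.GemmaCausalLM.presets.keys()))
Pass any of these names to from_preset to load that variant instead.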
Generating text
The Gemma model has a generate method that generates text based on a prompt. The optional max_length argument specifies the maximum length of the generated sequence.
gemma_lm.generate("What is the meaning of life?", max_length=64)
gemma_lm.generate("How does the brain work?", max_length=64)
You can also provide batched prompts using a list as input:
gemma_lm.generate(
    ["What is the meaning of life?",
     "How does the brain work?"],
    max_length=64,
)
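You can also change the decoding strategy that generate uses. The following is a minimal sketch, assuming the keras_nlp.samplers API (for example TopKSampler) and the compile(sampler=...) hook described in the KerasNLP documentation:
# Recompile the model with top-k sampling instead of the default sampler, then generate again
gemma_lm.compile(sampler=keras_nlp.samplers.TopKSampler(k=5, seed=2))
gemma_lm.generate("What is the meaning of life?", max_length=64)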
Method 2: Using Gemma from HuggingFace
Accessing and utilizing the Gemma model is conveniently facilitated through the Hugging Face platform. The model is readily available for exploration, enabling researchers and practitioners to harness its potential.
Enabling Gemma-7b access
Gemma-7b is a gated model, requiring users to request access.
Follow these steps to enable access to the model:
- Login to your Hugging Face account or register a new account if you don’t already have one.
- You can visit https://huggingface.co/google/gemma-7b to request access.
- Upon accessing the link, acknowledge the license agreement. You will then be directed to a page where you can authorize Kaggle to share your Hugging Face details; this step is necessary for further access.
- After authorizing Kaggle, you will be redirected to a page displaying the license agreement. Click the “Accept” button to agree to the terms and conditions.
- With the license agreement accepted, you now have access to the Gemma-7b model.
- To confirm your access, navigate to the following link: https://huggingface.co/google/gemma-7b/resolve/main/config.json. If your access has been granted, the model’s configuration file will be displayed instead of an error; you can also check this programmatically, as in the sketch below.
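The same check can be run from code once you have an access token (generated in the next section). This is a hypothetical snippet using the plain requests library against the same config.json URL; the token value is a placeholder:
import requests

HF_TOKEN = "hf_..."  # placeholder: replace with your Hugging Face access token
url = "https://huggingface.co/google/gemma-7b/resolve/main/config.json"
response = requests.get(url, headers={"Authorization": f"Bearer {HF_TOKEN}"})
print(response.status_code)  # 200 indicates that access has been granted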
Hugging Face Access token generation
To access the model, you will also need a Hugging Face access token. You can generate one by going to Settings, then Access Tokens in the left sidebar, and clicking the “New token” button to create a new access token.
Create your first script with Gemma using HuggingFace
Gemma needs a system with at least 16 GB of RAM to run. In this section, we will use Google Colab instead of a personal machine. You can try running the same code on your machine if it meets the required specifications.
- Open the link Welcome To Colaboratory — Colaboratory and click on Sign in to log in to your Colab account, or create a new account if you don’t have one.
- Change the Runtime to T4 GPU by Runtime → Change runtime type → T4 GPU → Save.
- To use Gemma, you must provide your Hugging Face access token. Select Secrets (🔑) in the left pane and add it as a key named HF_TOKEN.
- Create a new Colab notebook by clicking on the + New notebook button.
Installing dependencies
Install the required Python libraries to access the Gemma model using the following command. Click on the play icon to execute the cell.
!pip install transformers torch accelerate
HuggingFace login
To utilize the Gemma model, you need to authenticate your Hugging Face account. Add the following code to a new cell and click the play icon to execute it. When prompted, enter your Hugging Face access token to complete the authentication process.
from huggingface_hub import login
login()
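If you stored the token in Colab Secrets under the HF_TOKEN key (as described above), you can skip the interactive prompt and pass the token directly. This sketch assumes the same google.colab userdata API used earlier:
from google.colab import userdata
from huggingface_hub import login

# Non-interactive login: read the token from Colab Secrets and pass it to login()
login(token=userdata.get('HF_TOKEN'))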
Selecting model
Access the gemma-2b-it model using the following command. You can also try any of the other Gemma models; refer to the link Gemma release — a google Collection (huggingface.co) to learn more about them.
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
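Since the runtime has a T4 GPU and accelerate is installed, you can optionally load the weights directly onto the GPU in half precision. This is a sketch assuming the standard transformers loading arguments (device_map, torch_dtype), not a step from the original tutorial:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    device_map="auto",          # place the model on the available GPU
    torch_dtype=torch.float16,  # half precision to reduce memory usage
)
If you load the model this way, move the tokenized inputs to the same device before generating, for example tokenizer(input_text, return_tensors="pt").to(model.device).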
Generating text
- Test the model by executing the following code snippet.
input_text = "What is Machine Learning?"
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids, max_new_tokens=1024)
print(tokenizer.decode(outputs[0]))
You will see the model’s generated answer printed as the output.
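Because gemma-2b-it is an instruction-tuned model, you may get better answers by formatting the prompt with the tokenizer’s chat template. This is a sketch assuming the standard apply_chat_template API from transformers:
# Build an instruction-style prompt using the model's chat template
chat = [{"role": "user", "content": "What is Machine Learning?"}]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))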
Thanks for reading this article!
If you enjoyed this article, please click on the clap button 👏 and share to help others find it!
The full source code for this tutorial can be found here.