Deploy Gemma 2B for free using UbiOps

UbiOps-tech · 6 min read · Mar 7, 2024

What can you get out of this guide?

In this guide, we explain how to:

  • Create a UbiOps trial account
  • Retrieve your Hugging Face token with access to Gemma
  • Create your Gemma deployment
  • Create a Gemma deployment version
  • Make an API call to Gemma 2B

To successfully complete this guide, make sure you have:

  • Python 3.10 or higher installed
  • A UbiOps trial account

You’ll also need the `gemma-2b-deployment.zip` deployment package, which contains the `deployment.py` and `requirements.txt` files used later in this guide.

What is Gemma 2B?

Gemma is the latest model series released by Google in February 2024. It comes in two sizes: a 2B version, intended to run on mobile devices and laptops, and a 7B version, intended to run on desktop computers and small servers. Both sizes are available as a base pre-trained model and as an instruction-tuned model, the latter being fine-tuned to be useful in conversational settings, i.e. as a chatbot. In this tutorial, we will be focusing on the instruction-tuned 2B version. Gemma 2B is so small compared to other state-of-the-art models that it is often called a small language model (SLM).

What is an SLM? An SLM is a model designed to run on low-cost hardware. As mentioned, Google designed Gemma 2B to run on smartphones and laptops. SLMs are very efficient: it takes orders of magnitude less processing power for Gemma 2B to generate a response than for GPT-4, which reportedly has around 1.7 trillion parameters, roughly 1,000 times more than Gemma 2B.

How performant is Gemma 2B? According to the figures released by Google, it performs very well for its size. It performs very similarly to LLaMa 2 7B, a model around three times its size, and not too far below Mistral 7B and LLaMa 2 13B. Read our article about comparing LLMs if you want to learn more about LLM performance benchmarks.

[Figure: average scores of models on a variety of benchmarks. Source: Gemma technical report]
[Figure: average scores of models on a variety of benchmarks. Source: Hugging Face Open LLM Leaderboard]

Overall, Gemma 2B is an impressive and lightweight model. Furthermore, it is readily available on Hugging Face, provided you accept Google’s license agreement.

How to deploy Gemma 2B on UbiOps

The first step is to create a free UbiOps account. Simply sign up with an email address and within a few clicks you will be good to go.

In UbiOps you work within an organization, which can contain one or more projects. Within these projects, you can create multiple deployments, which are basically your containerized code. You can also chain together deployments to create a pipeline.

Create a project

Head over to the UbiOps WebApp and click on “Create new project”. You can give your project a unique name, or let UbiOps generate one for you.

Retrieve your Hugging Face token

To be able to download Gemma 2B from Hugging Face, you will need a Hugging Face API token that shows you have accepted Google’s license agreement.

Firstly, accept Google’s license agreement by pressing the “Acknowledge license” button on the Gemma 2B Hugging Face page.

Follow the instructions and accept the license agreement if you agree to the terms.

Secondly, go to Settings → Access Tokens → New Token and generate a new token with the “read” permission. Copy the token to your clipboard.

Lastly, you will need to add this token to the `deployment.py` file which is inside the `gemma-2b-deployment.zip` file you downloaded at the start of the guide. Unzip it and open `deployment.py`.

Paste your token inside the quotes defining the `token` variable, save, and re-zip the folder.
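For reference, the relevant part of `deployment.py` looks roughly like the sketch below. This is a simplified illustration rather than the exact contents of the zip; the model identifier `google/gemma-2b-it` and the `prompt`/`response` field names are assumptions based on this guide:

```python
# Simplified sketch of deployment.py -- the actual file in the zip may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-2b-it"  # instruction-tuned Gemma 2B (assumed model id)


class Deployment:
    def __init__(self, base_directory, context):
        # Paste your Hugging Face token between the quotes below.
        token = "<YOUR_HUGGING_FACE_TOKEN>"

        # Download the tokenizer and model from Hugging Face; the token
        # proves you have accepted Google's license agreement.
        self.tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, token=token)
        self.model = AutoModelForCausalLM.from_pretrained(MODEL_ID, token=token)

    def request(self, data):
        # 'prompt' is the string input field defined for the deployment.
        inputs = self.tokenizer(data["prompt"], return_tensors="pt")
        outputs = self.model.generate(**inputs, max_new_tokens=256)
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

        # 'response' is the string output field defined for the deployment.
        return {"response": response}
```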

Create your Gemma 2B deployment

Head back over to UbiOps, navigate to the “Deployments” tab on the left, and click on “Create”. In the following menu, you can define the name of the deployment as well as its input(s) and output(s). The input and output fields of the deployment define what data the deployment expects when making a request (i.e. when running the model). For this guide you can use a single string input field (e.g. `prompt`) and a single string output field (e.g. `response`), matching the names used in `deployment.py`.

You can skip the “Deployment Templates” and finish creating your deployment by clicking “Next: Create a version”.

Create a deployment version

Add the `gemma-2b-deployment.zip` as your deployment package and upload it to UbiOps. It contains the code that retrieves the model from Hugging Face, as well as the dependencies and configurations required to run the model, including the `requirements.txt` file, which implicitly creates your environment.
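As an illustration, a `requirements.txt` for a deployment like this would typically list the libraries needed to download and run the model; the exact file in the zip may pin specific versions:

```
# Illustrative requirements.txt -- the file in the zip may differ.
torch
transformers
accelerate
```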

Then, select the hardware instance the model will run on. For this deployment, you’ll need an instance with at least 4 vCPUs, which is available for free. When you are happy with your settings, click on “Create” and UbiOps will get straight to work building your deployment version. This builds Gemma 2B on our side and lets you make requests to the model via a REST endpoint.

We also offer extensive monitoring capabilities for each deployment version, including detailed logs, performance metrics, and more. The initial build can take a little time, but you can follow its progress by reading the logs. Read our page about monitoring to learn more!

How to run a Gemma 2B model on UbiOps

Navigate to your Gemma 2B deployment version and click on the “Create Request” button to create your first request.
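Alternatively, you can call your deployment programmatically with the UbiOps Python client (`pip install ubiops`). Below is a minimal sketch; the project name `my-project`, the deployment name `gemma-2b`, and the `prompt`/`response` field names are placeholders you should replace with your own values, and you’ll need a UbiOps API token with request permissions:

```python
import ubiops

# Connect to the UbiOps API with an API token created in the WebApp.
configuration = ubiops.Configuration(
    host="https://api.ubiops.com/v2.1",
    api_key={"Authorization": "Token <YOUR_UBIOPS_API_TOKEN>"},
)
api = ubiops.CoreApi(ubiops.ApiClient(configuration))

# Send a prompt to the deployment and print the generated text.
request = api.deployment_requests_create(
    project_name="my-project",    # placeholder project name
    deployment_name="gemma-2b",   # placeholder deployment name
    data={"prompt": "Why is the sky blue?"},
)
print(request.result["response"])
```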

Conclusion

And that’s it: your very own Gemma 2B API, hosted and served on UbiOps, all in under 15 minutes and without needing a software engineer.

Naturally, there are further optimizations that can be made to the code to get your deployment running as fast as possible every time. We left these out of scope for this guide, but we invite you to iterate and improve your own deployment!

Having completed this guide, you may now be wondering how to fine-tune your LLM, implement RAG, or build a chatbot front end. For more guides and tutorials, head over to the UbiOps blog. Or, for guidance on UbiOps features, check out our documentation.

If you’d like us to write about something specific, just shoot us a message or start a conversation in our Slack community. The UbiOps team would love to help you bring your project to life!

Thanks for reading!
