Building a Q&A Chatbot: Gemma (LLM) with Hugging Face

Prashant Malge
10 min read · Mar 15, 2024


Introduction

In today’s world, we encounter different large language models (LLMs), some of which are paid while others are open source. For instance, OpenAI charges per generated token, whereas platforms like Hugging Face provide open-source models free of charge. If you’re new to LLMs, starting with Hugging Face is advisable, as it offers a wide range of models that may fulfill your requirements.

In this blog, we’ll delve into Google’s recent launch of an open-source LLM named Gemma. We’ll explore Gemma and then proceed to create a question-answering (QA) chat model using VS Code. Additionally, we’ll deploy a Streamlit application to make it easily accessible to users. We’ll integrate LangChain with Hugging Face to access the Gemma model. Throughout the blog, we’ll provide detailed, step-by-step instructions for creating the required access tokens.

Key Takeaways

  • Gemma is the latest open model from Google, similar to other large language models (LLMs). We explore the variations of the Gemma model, ranging from 2B to 7B parameters.
  • We compare Gemma with other large language models (LLMs) such as LLaMA 2.
  • We create a Q&A chat model and use different variations of Gemma for summarization.
  • We discover the applications of Gemma.

What is Gemma?

Gemma is a lightweight model with four variations based on its size parameters. It is developed by Google and is derived from the Gemini model, a previous project by Google developers. Gemma employs the same research and technology as Gemini but with reduced parameters. It serves as a text-to-text, decoder-only large language model, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma is particularly well suited for text-to-text generation, like the chatbot we will build later in this blog. Additionally, Gemma handles summarization and reasoning tasks with ease. Accessing the model is straightforward through Hugging Face, and it may also be available on Google Cloud and AWS Bedrock platforms.

Key points about Gemma:

  1. Lightweight Model: Gemma is lightweight compared to the earlier Gemini Pro, which makes fine-tuning in a local environment easier.
  2. Derived from Gemini: Gemma is derived from the Gemini model, using the same research and technology but with reduced parameters.
  3. Text-to-Text Generation: Gemma is primarily used for text-to-text generation tasks, including question answering, summarization, and reasoning. We walk through all of this functionality in the practical section.
  4. Variations: Gemma comes in four different models based on its size parameters (2B to 7B), allowing users to choose the variant that best suits their needs.
  5. Open Source: Gemma is an open model, so developers and researchers can access and use it freely for their projects, as we do here to build the chatbot.
  6. Pre-trained Variants: Gemma offers pre-trained variants, allowing users to apply transfer-learning concepts and adapt pre-existing knowledge to their specific applications.
  7. Accessible via Hugging Face: Gemma models can be easily accessed and utilized through the Hugging Face platform, simplifying integration for developers, as shown in the sketch after this list.
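
To make point 7 concrete, here is a minimal sketch of loading a Gemma variant locally with the Transformers library. It assumes you have accepted the Gemma terms on Hugging Face, are logged in, and have enough memory; the model ID and prompt are just examples.

# Minimal sketch: running Gemma locally with transformers
# (assumes the Gemma license is accepted on Hugging Face and you are logged in)
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2b-it"  # smallest instruction-tuned variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is a decoder-only model?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))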

What is Hugging Face?

Are you new to natural language processing (NLP)? Then Hugging Face is the perfect place to start. The platform offers a plethora of pre-trained models that you can easily integrate and fine-tune. While many of the large language models (LLMs) available today require payment, Hugging Face provides open models for free. The NLP research community is highly active, and Hugging Face plays a pivotal role by offering datasets and computer vision models alongside NLP models. It’s more than just a company; Hugging Face is a vibrant community-driven initiative that democratizes access to state-of-the-art NLP technologies.

Key points about Hugging Face:

  1. Large Language Models (LLMs): Hugging Face offers a wide range of pre-trained LLMs, including GPT, BERT, and T5, among others. These models are trained on large amounts of text data and can perform different NLP tasks such as text generation, sentiment analysis, and question answering.
  2. Open-Source Platform: Hugging Face is an open-source platform, making it accessible to developers and researchers. The platform enables collaboration and contributions from the community to improve existing models and develop new ones. Many datasets and models are free to use.
  3. Model Hub: Hugging Face provides a Model Hub where users can discover, share, and download pre-trained models. The Model Hub contains thousands of models trained on different datasets and optimized for different NLP tasks. Recently, transformer-based computer vision models have been added as well.
  4. Fine-Tuning and Deployment: Users can fine-tune pre-trained models on their own datasets using Hugging Face’s tools and libraries. Once fine-tuned, models can be deployed in production environments using Hugging Face’s deployment solutions.
  5. Transformers Library: Hugging Face’s Transformers library is a popular open-source library for natural language processing (NLP). It provides easy-to-use interfaces for working with pre-trained models and includes utilities for tokenization, model training, and evaluation. A short example follows this list.
  6. Community and Support: Hugging Face has an active community of developers, researchers, and NLP enthusiasts. The platform offers forums, chat rooms, and documentation to support users in their NLP projects.
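
As a quick illustration of point 5, here is a minimal sketch using the Transformers pipeline API; the task and input text are illustrative, and a default model is downloaded on first use.

# Minimal sketch: a Transformers pipeline for sentiment analysis
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default model on first use
print(classifier("Hugging Face makes NLP approachable!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]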

Comparing Gemma With Other LLM

The model Gemma 7B is quite robust and its performance is comparable to the best models in the 7B weight category, including Mistral 7B. On the other hand, Gemma 2B is a model that has an interesting size, but it doesn’t rank as high as the best-performing models of similar size, such as Phi 2, on the leaderboard. We are eagerly awaiting feedback from the community about its real-world usage!

Please keep in mind that the Open LLM Leaderboard is primarily designed to evaluate the quality of pre-trained models rather than chat models. We recommend other benchmarks, such as MT-Bench, EQ-Bench, and the LMSYS Arena, for testing chat models.

Practical Implementation

In the practical implementation, we create a responsive chatbot using Hugging Face. The Gemma model is available on Hugging Face and is easy to import, but because the model is gated, you must first accept the acknowledgement (Google’s terms) on the model page.

Step 1: Creating the virtual environment

conda create -p ./venv python=3.10 -y

Step 2: Activating the venv

conda activate ./venv

Step 3: Installing the requirements

pip install -r requirements.txt

Inside requirements.txt we have the following packages:

transformers
python-dotenv
notebook
streamlit
langchain

Step 4: Installing PyTorch

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Before this installation, check your graphics card and CUDA version, and pick the command that matches your setup from the official PyTorch page -> Link
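
Once installed, a quick sanity check like the one below confirms whether PyTorch can see your GPU; CPU-only installs will simply print False.

# Sanity check for the PyTorch install
import torch

print(torch.__version__)
print(torch.cuda.is_available())  # True only if the CUDA build found your GPU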

Step 5: Creating the Hugging Face Token

  • Go to your browser, search for Hugging Face, and click the first link. If you don’t have an account, create one and then log in.
  • After logging in, you see the Hugging Face UI. Click your profile at the top right.
  • A Settings option is available in the menu; click it.
  • In Settings, you see Access Tokens on the left side; click that option.
  • On the Access Tokens page, click New to create a new access token. Then copy the token and keep it somewhere safe so we can use it later.
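
As an optional alternative to copying the token by hand, you can log in programmatically with the huggingface_hub library, which caches the token locally; the token string below is a placeholder.

# Optional: log in programmatically instead of pasting the token into .env
from huggingface_hub import login

login(token="hf_xxx")  # placeholder: replace with your actual token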

Step 6: Opening VS Code

  • Navigate to the project folder, open a terminal inside it, and run the following command.
code .

Step 7: Creating .env

  • Open the terminal and run the following command, or create a new file from the VS Code Explorer.
touch .env

After creating .env, paste your Hugging Face token into it:

HUGGINGFACEHUB_API_TOKEN=<your token>
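
To confirm the token is picked up correctly, you can run a quick check like the one below; it only verifies that the variable loads, not that the token is valid.

# Sanity check: confirm the token loads from .env
import os
from dotenv import load_dotenv

load_dotenv()
print(os.getenv("HUGGINGFACEHUB_API_TOKEN") is not None)  # should print True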

Step 8: Create the jupyter notebook

  • Open the terminal inside VS Code and run the following command to create the Jupyter notebook.
touch qa_notebook.ipynb

After running the command, open the notebook and set the Python environment (kernel) for it.

Step 9: Import Libraries: Import the required libraries in your notebook.

import os
from dotenv import load_dotenv
from langchain import HuggingFaceHub
# torch and transformers are only needed if you run a model locally;
# the cells below call the hosted model through HuggingFaceHub.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

Step 10: Utils for Summarizing the Text: define the required function and constants here.

# Load environment variables from .env file
load_dotenv()

# Get the Hugging Face API token from the environment
huggingfacehub_api_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

summarizer = HuggingFaceHub(
    repo_id="google/gemma-7b-it",
    model_kwargs={"temperature": 1, "max_length": 180, "language": "en"},
)

def summarize(llm, text) -> str:
    return llm(f"Summarize this: {text}!")

Step 11: Printing the summarization result

  • Here we first print the summarized text; then we create the chatbot. The txt variable can hold any passage you want to summarize.
# Example text to summarize (any passage works)
txt = "Gemma is a family of lightweight open models from Google, built from the research and technology behind the Gemini models."
print(summarize(summarizer, txt))

Step 12: Importing the Function and Text for the QA Bot

# Load environment variables from .env file
load_dotenv()

# Get the Hugging Face API token from the environment
huggingfacehub_api_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

# Initialize the chatbot with a conversational model
chatbot = HuggingFaceHub(
    repo_id="google/gemma-7b-it",
    model_kwargs={"temperature": 0.7, "max_length": 100, "top_p": 0.9},
)

def chat(llm, text) -> str:
    return llm(text)

user_input = "What is Google?"

Step 13: Printing the result of the Q-A Model
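
The code cell for this step is not shown in the original; a minimal version, mirroring the pattern used for the reasoning task below, would look like this:

print("User:", user_input)
response = chat(chatbot, user_input)
print("Chatbot:", response)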

Step 14: Importing the Function and text For the Reasoning Task

  • Here is the full code for the reasoning task:
# Load environment variables from .env file
load_dotenv()

# Get the Hugging Face API token from the environment
huggingfacehub_api_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

# Initialize the chatbot with the Gemma model
chatbot = HuggingFaceHub(
    repo_id="google/gemma-7b-it",
    model_kwargs={"temperature": 0.7, "max_length": 100, "top_p": 0.9},
)

def chat(llm, text) -> str:
    return llm(text)

# Example of a reasoning task
user_input = (
    "If it's raining outside, then the ground will be wet. "
    "It's raining outside. What can we conclude?"
)
print("User:", user_input)
response = chat(chatbot, user_input)
print("Chatbot:", response)

Building A Streamlit Application For The Q-A Model

  • In this step, we create the Streamlit application.
  • Here we integrate all the Gemma variants: gemma-7b, gemma-7b-it, gemma-2b, and gemma-2b-it.
  • This way, the user can switch models easily.
import streamlit as st
from langchain import HuggingFaceHub
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Get the Hugging Face API token from the environment
huggingfacehub_api_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

# Dictionary mapping model variants to their IDs
model_variants = {
    "Gemma 7B": "google/gemma-7b",
    "Gemma 7B Instruction Tuned": "google/gemma-7b-it",
    "Gemma 2B": "google/gemma-2b",
    "Gemma 2B Instruction Tuned": "google/gemma-2b-it",
}

# Streamlit UI
st.title("Gemma Chatbot")

# Dropdown menu to select the Gemma model variant
selected_variant = st.selectbox("Select Gemma Model Variant:", list(model_variants.keys()))

# Initialize the chatbot with the selected Gemma model variant
chatbot = HuggingFaceHub(
    repo_id=model_variants[selected_variant],
    model_kwargs={"temperature": 0.7, "max_length": 65000, "top_p": 0.9},
    huggingfacehub_api_token=huggingfacehub_api_token,
)

# Define a function for chatting
def chat(llm, text):
    return llm(text)

# Text input for user to type messages
user_input = st.text_input("You:", "")

# Button to send message
if st.button("Send"):
    # Get response from chatbot
    response = chat(chatbot, user_input)
    # Display response
    st.text_area("Gemma:", value=response, height=100)
  • To run this Streamlit application, save the code to a file (e.g., gemma_chatbot.py) and run the following command in your terminal:
streamlit run gemma_chatbot.py
  • After running the above command, you will see the application interface.
  • Click the dropdown (the red rectangle box in the screenshot) to see the different models and choose one.
  • Next to the dropdown is the text box where you can write your query.
  • After submitting a query, you will see the output; a sample screenshot is shared.

Conclusion

In this blog, we explored the Gemma model, which is easily accessible through Hugging Face. Gemma is the latest offering from Google, following the earlier launch of Gemini Pro. While Gemma is not fully open source, it is available on Hugging Face with a free access token, and we covered how to obtain that token. It’s anticipated that Gemma will eventually become completely open source. Within the Hugging Face platform, there is a dedicated section for each model where users can input queries; behind the scenes, Hugging Face runs the queries through the Gemma model and returns the output. However, since that code is private, in this blog we created our own question-answering bot model. It’s user-friendly and straightforward to build.

Key Takeaways

  • Gemma is the latest model from Google, provided through Hugging Face.
  • While it is not currently fully open source, you can access it through Hugging Face with a free access token.
  • We learned how to create a Q&A bot using Gemma and Hugging Face.
  • We also built a user-friendly UI for a Streamlit application, allowing users to choose different models.

Frequently Asked Questions

1. What is Gemma, and how does it relate to Google’s previous model, Gemini-pro?

  • Gemma is Google’s latest model, an open-weights project available through Hugging Face. It follows Gemini Pro and offers improved accessibility.

2. How can users access Gemma through Hugging Face?

  • While Gemma is not fully open source, users can access it for free by obtaining a Hugging Face access token, facilitating model usage and experimentation.

3. What functionality does Hugging Face provide for Gemma and similar models?

  • Hugging Face offers a platform where users can deploy models like Gemma and interact with them using natural language queries, although the underlying code is currently private.

4. What additional feature was developed alongside the question-answering bot?

  • A user-friendly UI for a Streamlit application was built to allow users to interact with the bot and select different models for experimentation and usage.
