Unravelling Gemini: Multimodal LLMs on Vertex AI

Broadening the Scope of User Interaction and Communication: Exploring the Universe of Multimodal Communication with Gemini

Mamata Panigrahi
Google Cloud - Community
9 min read · Feb 8, 2024


Have you ever imagined a world where applications intuitively understand and adapt to user behavior? Welcome to Gemini — the next frontier in application development, where innovation meets intuition.

In this blog, we will unveil Gemini’s magic and significance. We will dive into the powerful world of Vertex AI and its models, and see how seamlessly Gemini integrates with it. We will also explore Gemini chat and Gemini’s Python capabilities.

Photo by Ramón Salinero on Unsplash

Multimodal Magic

Many industries, healthcare for instance, deal with multimodal data, i.e., a mixture of text, images, audio, and video. Developing or tuning a model that can handle such multimodality is every industry’s dream.

Large Language Models (LLMs) are pre-trained on huge amounts of data; they take text as input and tackle diverse sets of tasks via prompts. What if we enable the model to take different modalities of input?

To learn more about the basics of LLMs, check out my blog: Large Language Models(LLMs) in Google Cloud with VertexAI

Multimodal LLMs (MLLMs) extend the capability of LLMs by learning from multiple kinds of data. You can give the model a photo of a pizza and ask for the recipe!

Google Cloud’s Vertex AI offers access to pre-trained LLMs such as PaLM, and now MLLMs such as Gemini, which can be fine-tuned for specific tasks, reducing development time and effort. You can choose from various sample prompts to start with or create your own:

The Enchantment of Gemini

Imagine a model that seamlessly shifts between writing witty poems, translating languages on the fly, and even generating code snippets from your instructions. That is Gemini, a powerful MLLM available on the versatile Vertex AI platform. It is trained on different data types, which enables it to understand and reason across them; that is what sets it apart from other models.

Gemini Ultra is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding). Gemini is pre-trained on multiple modalities, including text, images, video, and code.

It is a family of generative AI models built to be multimodal, meaning it can accept many kinds of input and generate many kinds of output. Based on the complexity of the task, you can choose among three sizes of Gemini model: Ultra, Pro, and Nano, all accessible via the Gemini API in Google AI Studio or Vertex AI.

Consider the example below, where we create a ‘Gemini Sample Prompt’ and upload an image:

We upload the image below along with the input prompt: “What is this object and what is it used for?” As you can see, we get the desired response:

Now we tweak the prompt and ask the model to extract the data from the picture and return it in JSON format, since JSON is a useful format for app development:

Voila! Well, that’s Gemini.

The Vertex AI SDK for Python is the best option if you want to automate your workflow programmatically.
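For instance, the image-to-JSON idea above can be reproduced programmatically in a few lines. The snippet below is only a sketch: the Cloud Storage path and the JSON field names are placeholders, and the SDK setup (installation and initialization) is covered in the next section.

import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part

# Placeholder bucket path; point this at your own image.
multimodal_model = GenerativeModel("gemini-pro-vision")
response = multimodal_model.generate_content(
    [
        Part.from_uri("gs://your-bucket/your-image.jpg", mime_type="image/jpeg"),
        'Extract the details of this object and return them as JSON with the fields "name", "category", and "typical_use".',
    ]
)
print(response.text)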

Gemini is one of the world’s leading foundation models for coding, so let’s explore its capabilities.

Vertex AI Gemini API

Vertex AI offers a robust set of tuning capabilities for customizing models. For example, in the travel industry you can build agents that access your data securely and tune a model to match your brand voice. With Vertex AI you can tune Gemini without deep ML expertise.

The Vertex AI Gemini API offers a single interface that allows users to interact with Gemini models in a unified manner.

There are three ways to interact with the Gemini API: Vertex AI Studio, cURL commands in Cloud Shell, and the Vertex AI SDK for Python. We will focus on the Vertex AI SDK for Python.

To use the Vertex AI SDK for Python, we first need to install the google-cloud-aiplatform package, which provides the vertexai namespace imported in the examples below.
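A minimal setup sketch; the project ID and region below are placeholders that you would replace with your own values:

# Install or upgrade the SDK; the google-cloud-aiplatform package ships the vertexai module.
# pip install --upgrade google-cloud-aiplatform

import vertexai

# Placeholder project ID and region.
vertexai.init(project="your-project-id", location="us-central1")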

It supports two models — Gemini Pro and Gemini Pro Vision:

  • Gemini Pro: for language tasks, including text generation, multi-turn text and code chat, and code generation. It takes text as input and can output text and code.

Below is an example that generates text from text prompts:

import vertexai
from vertexai.preview.generative_models import GenerativeModel

model = GenerativeModel("gemini-pro")

prompt = """Create a numbered list of 10 items. Each item should be a trending DevOps tools.

Each trend should be less than 5 words.""" # try your own prompt

responses = model.generate_content(prompt, stream=True)

for response in responses:
    print(response.text, end="")

Sample output:
1. GitLab
2. Docker
3. Kubernetes
4. Jenkins
5. Terraform
6. Prometheus
7. Grafana
8. Ansible
9. Selenium
10. SonarQube

We can alter the model’s generation parameters, such as temperature, top_p, top_k, and max_output_tokens, to control the response.
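Here is a minimal sketch of passing a GenerationConfig to generate_content; the parameter values are illustrative only and should be tuned for your own use case:

import vertexai
from vertexai.preview.generative_models import GenerationConfig, GenerativeModel

model = GenerativeModel("gemini-pro")

generation_config = GenerationConfig(
    temperature=0.2,        # lower values make the output more deterministic
    top_p=0.8,              # nucleus sampling cutoff
    top_k=40,               # sample only from the 40 most likely tokens
    max_output_tokens=256,  # cap on the length of the response
)

response = model.generate_content(
    "List 5 trending DevOps tools.",
    generation_config=generation_config,
)
print(response.text)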

The Gemini Pro model is well suited to text tasks that demand seamless back-and-forth interaction. With its natural multi-turn conversation capabilities, communication feels effortless. Consider the example below:

import vertexai
from vertexai.preview.generative_models import GenerativeModel

model = GenerativeModel("gemini-pro")
chat = model.start_chat()

prompt = """My name is Ned. You are my personal assistant. My favourite books are Immortals of Melluha and Ram Scion of Ikshvaku.

Suggest more books that I might like."""

responses = chat.send_message(prompt, stream=True)

for response in responses:
    print(response.text, end="")

Below is a sample response:
1. **The Shiva Trilogy** by Amish Tripathi
- Continuation of the Immortals of Meluha series, comprising of The Secret of the Nagas and The Oath of the Vayuputras
- Explores the story of Shiva, the greatest warrior of the Meluha kingdom, and his fight against the evil forces that threaten to destroy it.


2. **The Ramayana Series** by Amish Tripathi
- A retelling of the epic Hindu mythology Ramayana
- Offers a fresh perspective on the characters and events of the original story


3. **Sita: Warrior of Mithila** by Amish Tripathi
- A prequel to The Ramayana Series
- Explores the story of Sita, the warrior princess of Mithila, and her journey to becoming the wife of Lord Rama

Try adding more prompts to see how the model responds based on the previous prompt:

import vertexai
from vertexai.preview.generative_models import (
    GenerativeModel,
    ChatSession,
)

model = GenerativeModel("gemini-pro")
chat = model.start_chat()


def get_chat_response(chat: ChatSession, prompt: str) -> str:
    response = chat.send_message(prompt)
    return response.text

prompt = """My name is Ned. You are my personal assistant. My favourite books are Immortals of Melluha and Ram Scion of Ikshvaku.

Suggest more books that I might like."""

print(get_chat_response(chat, prompt))

prompt = """Are my favorite books based on mythological fiction?"""
print(get_chat_response(chat, prompt))

Below is the response:

Yes, your favorite books, **Immortals of Meluha** and **Ram Scion of Ikshvaku**, are both based on mythology.

**Immortals of Meluha** is the first book in the Shiva Trilogy by Amish Tripathi, which is a fictional retelling of the Hindu epic the Shiva Purana. The story follows the adventures of Shiva, a Tibetan immigrant from Mount Kailash, as he travels to the land of Meluha and becomes its savior. The novel incorporates many elements of Hindu mythology, including gods, demons, and mythical creatures.

**Ram Scion of Ikshvaku** is the first book in the Ram Chandra Series by Amish Tripathi, which is a fictional retelling of the Hindu epic the Ramayana. The story follows the life of Ram, the seventh avatar of Vishnu, from his birth to his coronation as king of Ayodhya. The novel stays true to the original mythology while also providing a fresh and unique perspective on the story.

Both of these books are examples of mythological fiction, which is a genre of literature that retells or reinterprets myths and legends from different cultures. Mythological fiction often explores themes such as good versus evil, the nature of heroism, and the human condition.

Since you enjoy mythological fiction, you may also be interested in other books in the genre, such as:

* **The Palace of Illusions** by Chitra Banerjee Divakaruni (a retelling of the Mahabharata from the perspective of Draupadi)
* **The Song of Achilles** by Madeline Miller (a retelling of the Greek myth of Achilles and Patroclus)
* **Circe** by Madeline Miller (a retelling of the Greek myth of Circe, the witch who turned Odysseus's men into pigs)
* **American Gods** by Neil Gaiman (a modern fantasy novel that explores the lives of various gods and mythological creatures who have immigrated to America)
* **The Sandman** by Neil Gaiman (a graphic novel series that follows the adventures of Dream, the personification of dreams)

These are just a few examples of the many great mythological fiction books that are available. I encourage you to explore the genre and find new stories and characters that you love.

Yes, I have a personal assistant with me :)

It also supports code chat, which generates code from natural-language prompts:

import vertexai
from vertexai.preview.generative_models import GenerativeModel

model = GenerativeModel("gemini-pro")
code_chat = model.start_chat()

response = code_chat.send_message(
    "I want to monitor my disk space and send mail alerts when the threshold is reached. Help me with a shell script."
)
print(response.text)

Below is the code you get:

#!/bin/bash

# Set the threshold for disk space (in percentage)
threshold=90

# Get current disk usage percentage
disk_usage=$(df -h --output=pcent / | tail -n 1 | tr -d '[:space:]%' )

# Check if disk usage exceeds the threshold
if [ "$disk_usage" -ge "$threshold" ]; then
# Replace 'your_email@example.com' with the actual email address to receive alerts
echo "Disk space usage is above the threshold. Current usage: $disk_usage%" | mail -s "Disk Space Alert" your_email@example.com
fi

Now let’s explore another model:

  • Gemini Pro Vision: for multimodal prompts that include text, images, and video. In the code below, we give an image as input and get a text response:
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part


multimodal_model = GenerativeModel("gemini-pro-vision")
# Query the model
response = multimodal_model.generate_content(
    [
        # Add an example image
        Part.from_uri(
            "gs://generativeai-downloads/images/scones.jpg", mime_type="image/jpeg"
        ),
        # Add an example query
        "what is shown in this image?",
    ]
)
print(response)

The supported MIME types for images include image/png and image/jpeg. Below is the response; do inspect the categories under safety_ratings:

candidates {
  content {
    role: "model"
    parts {
      text: " The image shows a table with a white surface. On the table are two cups of coffee, a bowl of blueberries, a silver spoon with the words \"Let\'s Jam\" engraved in the handle, and five scones with blueberries on top. There are also some pink flowers on the table. The table is covered in a white paper with purple and blue stains."
    }
  }
  finish_reason: STOP
  safety_ratings {
    category: HARM_CATEGORY_HARASSMENT
    probability: NEGLIGIBLE
  }
  safety_ratings {
    category: HARM_CATEGORY_HATE_SPEECH
    probability: NEGLIGIBLE
  }
  safety_ratings {
    category: HARM_CATEGORY_SEXUALLY_EXPLICIT
    probability: NEGLIGIBLE
  }
  safety_ratings {
    category: HARM_CATEGORY_DANGEROUS_CONTENT
    probability: NEGLIGIBLE
  }
}
usage_metadata {
  prompt_token_count: 265
  candidates_token_count: 73
  total_token_count: 338
}
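The example above pulls the image from Cloud Storage; the SDK can also accept raw image bytes. Below is a rough sketch assuming a local file named my_image.png (a hypothetical path) and using Part.from_data:

from vertexai.preview.generative_models import GenerativeModel, Part

multimodal_model = GenerativeModel("gemini-pro-vision")

# Read the raw bytes of a local image and label their MIME type.
with open("my_image.png", "rb") as f:
    image_part = Part.from_data(data=f.read(), mime_type="image/png")

response = multimodal_model.generate_content([image_part, "Describe this image."])
print(response.text)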

Safety ratings allow developers to thoroughly test their models before deploying them, ensuring safe and responsible use. We can also set safety thresholds to filter responses from the Vertex AI Gemini API.
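As a rough sketch (the category and threshold names below reflect the preview SDK at the time of writing), safety_settings can be passed to generate_content to block responses above a chosen probability:

from vertexai.preview import generative_models
from vertexai.preview.generative_models import GenerativeModel

model = GenerativeModel("gemini-pro")

# Illustrative thresholds: block these categories even at low probability.
safety_settings = {
    generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT: generative_models.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: generative_models.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
}

response = model.generate_content(
    "Write a short safety briefing for a chemistry lab.",
    safety_settings=safety_settings,
)
print(response.text)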

Conclusion

As we’ve explored the enchanting realm of Gemini, its multimodal magic, and the empowering platform of Vertex AI, one truth becomes undeniable: the possibilities are truly cosmic. From weaving words into poems to crafting code from mere prompts, Gemini transcends the limitations of language, igniting a new era of creative expression and problem-solving.

Remember, this journey is just beginning. As Gemini continues to evolve, its potential to shape our world expands exponentially.

Are you ready to dive deeper into the Gemini Universe? Keep your eyes peeled because there's so much more to discover and I can't wait to show you!! :)

Happy Learning!
