Exploring Gemini-Pro and Gemini-Pro-Vision in VertexAI

Johanes Glenn
Google Cloud - Community
4 min read · Dec 18, 2023

Background

On 13 Dec 2023, Google Cloud released Gemini-Pro and Gemini-Pro-Vision, multimodal language models, in preview on Vertex AI. With multimodal models I can now process information beyond text, including images and videos; for further details please see the overview page. The objective of this story is to test several new features of Gemini-Pro, which include:

(1) Multimodal prompt requests
(2) Chat prompt requests
(3) Function calling
(4) Get token count

Concept

The test is simple. I want to process an image so it tells me about a location (a city), then fetch additional information such as the current weather through function calling, and finally compose the result I want. The following is how I exercised the models; it is intended only as a quick test.

The use case in my mind: as a user, I may have a place that piqued my interest but I do not know where it is. So I want to upload an image and let my assistant give me information such as the location, the weather, and things to do at that particular location (I focus on the city for now).

Following is the rough code.

First of all, I initialize Vertex AI in the specific region I want, and then create a file upload helper through Streamlit.

import vertexai
from vertexai.preview import generative_models
from vertexai.preview.generative_models import (
    Content,
    FunctionDeclaration,
    GenerativeModel,
    Part,
    Tool,
    Image,
)
import requests
import http.client
import typing
import urllib.request
import streamlit as st
import os

# initialize Vertex AI in the region I want to use
vertexai.init(location="asia-southeast1")

# Streamlit file upload helper
uploaded_file = st.file_uploader("Choose a file")

Next I create a get_current_weather function that makes a simple call to OpenWeatherMap to get the current weather with the metric unit (to get Celsius instead of Kelvin). At the same time, with gemini-pro (the reason why I separate gemini-pro from gemini-pro-vision) I can declare function calls through FunctionDeclaration and add them to a specific Tool.

def get_current_weather(location, unit="metric"):
    api_key = os.environ["API_KEY"]
    url = f"https://api.openweathermap.org/data/2.5/weather?q={location}&units={unit}&appid={api_key}"
    response = requests.get(url)

    return response.json()

get_current_weather_func = generative_models.FunctionDeclaration(
    name="get_current_weather",
    description="Get the current weather in a given location",
    parameters={
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
                "type": "string",
                "enum": [
                    "celsius",
                    "fahrenheit",
                ]
            }
        },
        "required": [
            "location"
        ]
    },
)

weather_tool = generative_models.Tool(
    function_declarations=[get_current_weather_func]
)
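
One small mismatch to note: the unit enum in the declaration uses celsius/fahrenheit, while the OpenWeatherMap call expects metric/imperial (I simply hardcode metric later). If the model ever returns a unit argument, a tiny hypothetical helper (my own addition, not part of the original flow) could bridge the two:

# Hypothetical helper: map the declared "celsius"/"fahrenheit" values
# to the unit names OpenWeatherMap expects ("metric"/"imperial")
UNIT_MAP = {"celsius": "metric", "fahrenheit": "imperial"}

def to_openweathermap_unit(unit):
    # fall back to metric when the unit is missing or unexpected
    return UNIT_MAP.get(str(unit).lower(), "metric")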

As Streamlit reloads everything on each interaction, I need a condition: only if a file has been uploaded do we continue, starting with processing the file, getting the location shown in the image, and storing it in the location1 variable. For this first call I use the multimodal capability for image processing through gemini-pro-vision.

if uploaded_file is not None:

    st.image(uploaded_file)
    bytes_data = uploaded_file.getvalue()
    uploaded_file_read = Image.from_bytes(bytes_data)

    # progress bar used to track the steps below
    my_bar = st.progress(0, text="starting")
    my_bar.progress(10, text="processing image")

    # getting location from image using gemini-pro-vision
    model_vision = GenerativeModel("gemini-pro-vision")
    response_vision = model_vision.generate_content([
        uploaded_file_read, "give me only the name of the city where the picture is taken"
    ])

    # print for debug
    print("gemini-pro-vision\n", response_vision)
    st.toast(response_vision._raw_response.usage_metadata)

    # print for user
    st.write("Location:", response_vision.text)

    my_bar.progress(30, text="getting location from image")
    location1 = response_vision.text
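
This also roughly covers item (4) from the list above: the toasted usage_metadata shows the token usage of each response. To get a token count before sending a prompt, the preview SDK also exposes a count_tokens helper on the model. A minimal sketch, assuming count_tokens returns total_tokens and total_billable_characters as below:

# Sketch: count tokens for a text prompt before sending it (gemini-pro)
token_model = GenerativeModel("gemini-pro")
token_info = token_model.count_tokens(
    "give me only the name of the city where the picture is taken"
)
print("total tokens:", token_info.total_tokens)
print("billable characters:", token_info.total_billable_characters)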

Then I add another prompt asking for the information I want, such as the temperature, a description, how to get there from my location (hardcoded to Jakarta for now), and things to do at the location. To run this I first send the prompt and expect the model to choose one of the declared functions in the tool, using the location1 variable to fill its properties.

    # getting result from prompt + function calling
    prompt = f"Can please tell me what is weather real temperature, description and feels like in {location1} today, summary about the location, give me recommendation things to do and how to get there from Jakarta"

    model = GenerativeModel("gemini-pro",
                            tools=[weather_tool])

    chat = model.start_chat()

    # first init prompt gemini-pro
    model_response = chat.send_message(prompt)

    # print for debug
    print("gemini-pro: first call\n", model_response)
    my_bar.progress(50, text="prompt to gemini")

Expected result from the image (I used a random London Bridge image):

gemini-pro: first call
candidates {
  content {
    role: "model"
    parts {
      function_call {
        name: "get_current_weather"
        args {
          fields {
            key: "location"
            value {
              string_value: "London"
            }
          }
        }
      }
    }
  }
}

From the function_call I can get the location (this is redundant with the first prompt, but I want to test function_call and use it to call a specific function), which I can pass to OpenWeatherMap to get the current weather info. Finally, I add the API response to the chat and return the last response to the user.

    api_response = get_current_weather(model_response.candidates[0].content.parts[0].function_call.args['location'], "metric")

    my_bar.progress(70, text="get weather")

    # adding function_call
    model_response = chat.send_message(
        Part.from_function_response(
            name="get_current_weather",
            response={
                "content": api_response,
            }
        ),
    )

    my_bar.progress(85, text="add weather information")

    # for debug
    print("gemini-pro: adding function call", model_response)
    st.toast(model_response._raw_response.usage_metadata)
    st.write(model_response.text)

    my_bar.progress(100, text="Complete")
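
To try this end to end, I set the OpenWeatherMap key in the API_KEY environment variable and launch the script with streamlit run app.py (assuming the code above is saved as app.py).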

Following is the sample result.

