Using Gemini API Functions: A Python Guide

Analystmachinelearning

The Gemini 1.5 models (such as `gemini-1.5-flash-002`, used in this post) support function calling, much like their OpenAI and Anthropic counterparts. But while those platforms offer relatively straightforward implementations, getting functions to work with the Gemini API via the `google.generativeai` package proved surprisingly tricky. The existing documentation is fragmented and lacks a clear, consolidated Python example. Even Gemini itself struggled to generate a working solution! I wrestled with piecing the information together, particularly the crucial step of feeding function responses back into the conversation history.

Finally, I cracked the code! This post shares my working Python solution for calling functions with the Gemini API. Whether you’re just starting out or a seasoned pro, I hope this guide provides valuable insights. If you have suggestions for improvement, please share — I’m eager to refine this approach. The code works, but it feels less than elegant.

Let’s dive in!

Import the necessary libraries. Note the use of `protos` and `struct_pb2`, which are key to interacting with Gemini's function calling features.

import google.generativeai as genai
from google.generativeai import protos
from google.protobuf import struct_pb2
from dotenv import load_dotenv
import os
from datetime import datetime

The .env file is a simple text file that stores environment variables. It's used here to keep your Gemini API key separate from your code, which is crucial for security. Your .env file should look like this (replace the placeholder with your actual key):

GOOGLE_GEMINI_API_KEY=your_actual_gemini_api_key

The load_dotenv() function reads this file and makes the variables available to your Python script via os.getenv(). This way, you don’t hardcode sensitive information directly into your code. Make sure to add .env to your .gitignore file to prevent it from being accidentally committed to version control.
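For example, a minimal .gitignore entry looks like this:

# .gitignore: keep secrets out of version control
.env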

# Load the API key from a .env file. This is good practice for security.
load_dotenv()
genai.configure(api_key=os.getenv('GOOGLE_GEMINI_API_KEY'))

These two Python functions are designed to be called by the Gemini model as tools. They demonstrate how to define functions with parameters and descriptions that Gemini can understand; the SDK inspects each function's signature and docstring to build the schema the model sees, which is why the docstrings below matter.

def get_current_date():
    """
    description: Get current date (today) in the format YYYY-MM-DD
    """
    print("Function get_current_date is called.")
    return {"current_date": datetime.now().strftime("%Y-%m-%d")}

def get_price_of_the_stock(stock_name: str, date: str):
    """
    description: Get the price of the given stock on the given date
    parameters:
      type: object
      properties:
        stock_name:
          description: The name of the stock
          type: string
        date:
          description: The given date in the format YYYY-MM-DD
          type: string
      required:
        - stock_name
        - date
    """
    print("Function get_price_of_the_stock is called.")
    return {"stock price": 123}  # Dummy value for demonstration

Let’s start writing the main code.

# Create a GenerativeModel instance
model = 'gemini-1.5-flash-002'
functions = [get_current_date, get_price_of_the_stock]
google_model = genai.GenerativeModel(model, tools=functions)

generation_config = genai.types.GenerationConfig(
    candidate_count=1,  # Only one candidate for now.
    # stop_sequences=['x'],
    temperature=0
)

This code initializes a Gemini model instance and sets up the configuration for generating responses.

- `model = 'gemini-1.5-flash-002'`: Specifies the Gemini model to use. Make sure this is a model that supports function calling.

- `functions = [get_current_date, get_price_of_the_stock]`: Provides a list of the functions that the model can call. These are the functions you defined earlier.

- `google_model = genai.GenerativeModel(model, tools=functions)`: Creates the Gemini model instance, making the specified functions available to it. The crucial part here is the tools=functions argument.

- `generation_config = genai.types.GenerationConfig(…)`: Configures how the model generates responses:

- `candidate_count=1`: Requests only one response from the model.

- `# stop_sequences=['x']`: Commented out, but this is where you could specify stop sequences. If the model generates any of these sequences, it will stop generating further text.

- `temperature=0`: Sets the temperature to 0. This makes the model's output effectively deterministic (the same input will almost always produce the same output) and encourages it to pick the most likely next token, making it less creative and more focused. Higher temperature values introduce more randomness.
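Incidentally, the SDK also accepts the generation config at model-construction time, so you don't have to pass it to every generate_content call. A minimal sketch of that alternative:

# Alternative: attach the generation config to the model instance once.
google_model = genai.GenerativeModel(
    model,
    tools=functions,
    generation_config=generation_config
)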

prompt = "What is the price of Apple stock today?"
protos_message = protos.Content(role="user", parts=[protos.Part(text=prompt)])
history = [protos_message]

This code sets up the initial prompt and message history for the conversation with the Gemini model.

- `prompt = "What is the price of Apple stock today?"`: Defines the initial user prompt. This is the question we're asking the model.

- `protos_message = protos.Content(role="user", parts=[protos.Part(text=prompt)])`: Creates a `protos.Content` message object, which is the format Gemini expects. `role="user"` indicates that this message is from the user. The `parts` list contains the actual text of the message. Even though there's only one part in this case, it needs to be wrapped in a list. This structure allows for more complex messages with multiple parts (e.g., including images or other data), as sketched just after this list.

- `history = [protos_message]`: Initializes the conversation history with the user’s prompt. This `history` list will be used to maintain the context of the conversation as it progresses, including the model’s responses and any function calls. It’s crucial for handling multi-turn interactions with Gemini.
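For illustration, here is a sketch of what a multi-part message could look like. It is not needed for this example, and `image_bytes` is a hypothetical variable holding raw PNG data:

# Sketch of a multi-part user message (image_bytes is hypothetical).
multi_part_message = protos.Content(
    role="user",
    parts=[
        protos.Part(text="Which company's logo is shown in this image?"),
        protos.Part(inline_data=protos.Blob(mime_type="image/png", data=image_bytes)),
    ],
)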

while True:

    # Send the current conversation history to the model and get a response
    response = google_model.generate_content(history,
                                             generation_config=generation_config)

    # Track token usage (optional, but good for monitoring costs).
    # prompt_token_count measures the input; candidates_token_count measures the output.
    usage = {"model": model,
             "prompt_tokens": response.usage_metadata.prompt_token_count,
             "completion_tokens": response.usage_metadata.candidates_token_count}

    function_calls = []      # List to store any function calls requested by the model
    function_responses = []  # List to store the responses from those function calls
    done = False             # Flag to indicate when the conversation is finished

    # Iterate through each part of the model's response.
    # A part can be either text or a function call.
    for part in response.candidates[0].content.parts:

        # Check if the current part is a function call
        if function_call := part.function_call:
            # Extract the function name and arguments from the function call
            function_args = dict(function_call.args)  # Convert args to a dictionary
            function_name = function_call.name

            # Keep track of the function calls made
            function_calls.append(function_call)

            # Find the corresponding Python function based on the name
            function = next((each_function for each_function in functions
                             if each_function.__name__ == function_name), None)

            # Raise an error if the requested function is not found
            if function is None:
                raise ValueError(f"Function {function_name} not found in tools")

            # Call the Python function with the extracted arguments
            function_response = function(**function_args)  # ** unpacks the dict into keyword arguments

            # Store the function's response along with its name
            function_responses.append({"function_name": function_name,
                                       "response": function_response})

        # If the part is not a function call, it's a regular text response from the model
        else:
            print(response.text)  # Print the model's text response
            done = True           # Set the flag to indicate the conversation is finished
            break                 # Exit the inner loop (no more parts to process)

    # Exit the outer loop if the conversation is done
    if done:
        break

    # Add the function calls (if any) to the conversation history
    function_calls_parts = []
    if len(function_calls) > 0:
        for each_function_call in function_calls:
            # Convert the function arguments to a struct_pb2.Struct (required by Gemini)
            struct_arguments = struct_pb2.Struct()
            struct_arguments.update(dict(each_function_call.args))

            # Create a protos.Part object for the function call
            function_call_part = genai.protos.Part(
                function_call=genai.protos.FunctionCall(
                    name=each_function_call.name,
                    args=struct_arguments
                )
            )
            function_calls_parts.append(function_call_part)

        # Add the function call parts to the history with the role "function"
        protos_message = protos.Content(role="function", parts=function_calls_parts)
        history.append(protos_message)

    # Add the function responses (if any) to the conversation history
    if len(function_responses) > 0:
        response_parts = []
        for each_function_response in function_responses:
            # Create a protos.Part object for the function response
            response_part = genai.protos.Part(
                function_response=genai.protos.FunctionResponse(
                    name=each_function_response["function_name"],
                    response=each_function_response["response"]))
            response_parts.append(response_part)

        # Add the function response parts to the history with the role "function"
        protos_message = protos.Content(role="function", parts=response_parts)
        history.append(protos_message)

Output: The price of Apple stock today, September 26th, 2024, is $123.

This code implements the main loop for interacting with the Gemini model, handling function calls and responses, and managing the conversation history. Let’s break it down step by step:

1. **Main Loop (`while True`):** This loop continues until the model indicates it’s done (no more function calls are needed).

2. **Generate Response (`google_model.generate_content(…)`):** Sends the current `history` to the model and receives a response.

3. **Usage Tracking:** Records token usage for monitoring.

4. **Function Call Handling (`for part in …`):** Iterates through the parts of the model’s response.

5. **Check for Function Calls (`if function_call := part.function_call`):** Checks if a part is a function call.

6. **Extract Function Information:** If it’s a function call, extracts the function name and arguments.

7. **Find and Execute Function:** Locates the corresponding Python function and calls it with the provided arguments.

8. **Store Function Call and Response:** Appends the function call and its response to respective lists.

9. **Print Non-Function Call Responses:** If a part is not a function call (it’s regular text), prints it and sets `done = True` to exit the inner loop.

10. **Exit Inner Loop (`break`):** Exits the `for` loop once a text response has been printed.

11. **Exit Outer Loop (`if done: break`):** Exits the `while` loop, ending the conversation.

12. **Add Function Calls to History:** Formats the function calls as `protos.Content` messages with `role="function"` and appends them to the `history`. This tells Gemini which functions were called.

13. **Add Function Responses to History:** Formats the function responses as `protos.Content` messages with `role="function"` and appends them to the `history`. This provides the results of the function calls to Gemini.
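As an aside: if you don't need this level of manual control, recent versions of the `google.generativeai` SDK can run this entire call-and-respond loop for you via automatic function calling in a chat session. A minimal sketch, assuming your installed SDK version supports it:

# Sketch: let the SDK handle the function calling loop automatically.
chat = google_model.start_chat(enable_automatic_function_calling=True)
reply = chat.send_message("What is the price of Apple stock today?")
print(reply.text)

The manual approach above remains useful when you want to log token usage or intercept each function call yourself.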

This guide showed how to use function calling with Gemini in Python. By defining your functions correctly and carefully managing the conversation history, you can connect Gemini to external tools and data sources, making it useful for a much wider range of tasks. The code works, but it could be improved, so suggestions are welcome!
