A Tale of Two Functions: Function calling in Gemini

Romin Irani
Google Cloud - Community
13 min read · May 28, 2024

Building a Generative AI application comes with its own set of interesting challenges. This is especially true when you are trying to use foundation models in combination with your own data. I am sure you would agree that if you could take the natural language understanding and generation power of a foundation model and infuse its responses with your own data, served via your application's APIs, it would make for a very powerful combination.

Function calling, as they call it, is the ability to tell the model which functions you have available and have it determine which of them should be called to satisfy the request you have sent it via a prompt. I have probably made a terrible attempt at describing the feature in my own words, so let me augment that with what some of the vendors say:

  • Connect large language models to external tools. [Reference]
  • Define custom functions and provide these to a generative AI model. [Reference]
  • Define the set of tools you want Claude to have access to, including their names, descriptions, and input schemas. [Reference]

While there are several articles and tutorials on Function calling, my goal in this blog post is the following:

  1. Describe the problem that we are trying to address and fit that within the Function calling feature.
  2. Explore how it works with Google Cloud Vertex AI’s Function calling feature.

This is by no means a definitive tutorial on the subject; my goal is to highlight how it works step by step and convince myself that it works too. All the code samples are in Java (not Python). It's not about a language preference; I'd simply like to bump up the number of samples available in Java. If you choose any other language for which the Vertex AI client library is available, you should be able to port the code accordingly.

If you prefer to jump straight to the code, you can hop over to the GitHub repository for this post.

What are we trying to do?

I have found it a bit difficult to wrap my head around Function calling when articles define the feature instead of leading with an example (which I have been guilty of doing in this article too :-)). So let's first understand what we are trying to do.

Inventory Application API

Consider that I have an inventory API that has been designed, implemented and deployed for my distribution company. The inventory API provides information on the following two high-level entities:

  1. Warehouse Information
  2. Current Inventory quantity in a specific Warehouse

For simplicity, let us consider two magical functions (the implementation is not important here) with the following signatures (a minimal sketch of these follows the list):

  • String getInventoryCount(String productId, String location)
  • String getWarehouseDetails(String location)
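The repository implements these as dummy methods on a MyAPI class. Here is a minimal sketch of what they might look like, just so the rest of the walkthrough has something concrete to refer to. The data is hardcoded; in the real world these would call your actual inventory APIs.

// A minimal sketch of the two "magical" functions, modeled on the
// MyAPI class in the sample repository. They return dummy data; in
// the real world these would call your actual inventory APIs.
public class MyAPI {

    // Returns the stock count of a product at a warehouse location
    public String getInventoryCount(String productId, String location) {
        return "50";
    }

    // Returns the details (e.g. the address) of a warehouse location
    public String getWarehouseDetails(String location) {
        return "Warehouse " + location + " is located at 123 Main Street.";
    }
}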

Given this, we would like to create a chatbot application for the organization that allows anyone to use prompts like the following to get information on either a warehouse or the inventory available at a specific warehouse location.

Sample prompts that one can provide are given below:

  • How much of P101 do we have in warehouse w101?
  • Where is warehouse w1 located?
  • Where are warehouse w1 and w2 located?
  • How much of P1 and P2 do we have in warehouse w10?
  • What is the inventory P1, P2 and P3 in warehouse w101?
  • Where is warehouse w10 located and how many unit of p1 are there?

You would agree that if we simply give these prompts to a foundation model, it is going to respond saying that it is not able to answer the question or, worse, it might even end up hallucinating.

What we want instead is a way for the model to interpret the prompt, determine that our functions/tools/APIs (pick your terminology) need to be invoked, and then format the results back into the prompt response.

Two points arise over here:

  1. How does the LLM know which function/tool to invoke?
  2. Will the LLM invoke it for us automatically or ask us to invoke it?

More questions follow, and for those let us consider a few scenarios via the sample prompts.

Where is warehouse w1 located?

This prompt clearly indicates that the LLM needs to consider calling the getWarehouseDetails function with w1 as the value for the input parameter location.

How much of P1 and P2 do we have in warehouse w10?

This would require two invocations of the getInventoryCount function: one with a productid of P1 and location of w10, and another with a productid of P2 and location of w10.

Where is warehouse w10 located and how many unit of p1 are there?

This is an interesting scenario. It would require two function calls: one to getWarehouseDetails and one to getInventoryCount.

Still more questions arise.

  1. When there are two or three function calls to be made, will the model ask us to invoke them one by one?
  2. Is there a possibility of it asking us to invoke all the functions in parallel? We can certainly make those calls to our functions or APIs ourselves, can't we?
  3. Is there a way to tell the LLM to only consider functions and not try to answer otherwise? There could be a lot more to those prompts than the ones we have provided. Can we provide a hint, just as we do for temperature, tokens, etc., to control whether the LLM should consider only functions, any of them, or none? (A sketch of this follows below.)
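On that last question: recent versions of the Vertex AI SDK expose a function calling mode for exactly this. The following is a minimal sketch, assuming a Java SDK version that exposes ToolConfig and FunctionCallingConfig; check your SDK version before relying on it.

import com.google.cloud.vertexai.api.FunctionCallingConfig;
import com.google.cloud.vertexai.api.ToolConfig;

// Mode.ANY forces the model to always respond with a function call,
// Mode.AUTO lets it decide (the default), and Mode.NONE forbids
// function calls altogether.
ToolConfig toolConfig = ToolConfig.newBuilder()
        .setFunctionCallingConfig(FunctionCallingConfig.newBuilder()
                .setMode(FunctionCallingConfig.Mode.ANY)
                .build())
        .build();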

Interesting possibilities, aren't they? My colleague Guillaume Laforge taught me a beautiful thing about these prompts. If you look at them, they are very specific and map one-to-one with the functions that we have available. For example, how much inventory of P1 maps to getInventoryCount, and where is warehouse w1 located maps to getWarehouseDetails.

What if you could give a prompt that says:

  • What is the total inventory of P1 and P2 in warehouse w1?
  • Do we have more of P1 or P2 in warehouse w1 ?

This is the real power of LLMs: they can do this kind of logical reasoning once they have received the responses from our API.

Now that we are clear on what we are trying to do and the different combinations, let us revisit what Function Calling is.

What is Function calling?

I will take the liberty of borrowing from a couple of documentation sources here. First up is the essence of what Function calling is: I really like the step-by-step description that Anthropic provides, and I reproduce it here. You can replace the word tools with functions, and Claude with your favorite foundation LLM that supports function calling. I have modified the steps a bit to keep them simple.

Using tools with Claude involves the following steps:

1. Provide Claude with tools and a user prompt: (API request)

Define the set of tools you want Claude to have access to, including their names, descriptions, and input schemas.

Provide a user prompt that may require the use of one or more of these tools to answer, such as "What is the inventory of P1 in warehouse W1?".

2. Claude uses a tool: (API response)

Claude assesses the user prompt and decides whether any of the available tools would help with the user’s query or task. If so, it also decides which tool(s) to use and with what inputs.

Claude constructs a properly formatted tool use request.

3. Extract tool input, run code, and return results: (API request)

On the client side, you should extract the tool name and input from Claude’s tool use request.

Run the actual tool code on the client side.

Return the results to Claude by continuing the conversation with a new user message containing a tool_result content block.

4. Claude uses tool result to formulate a response: (API response)

After receiving the tool results, Claude will use that information to formulate its final response to the original user prompt.

Steps (3) and (4) are optional — for some workflows, Claude using the tool is all the information you need, and you might not need to return tool results back to Claude.

If you are a visual person, I reproduce the diagram from the official Google Cloud Vertex AI Function calling documentation.

Let’s try to answer some of the questions that we had earlier.

  1. How does the LLM know which tools or functions to use?
    We provide a list of tools or functions that the LLM can use. The information we provide about the tools needs to be clear and precise: the function definition, the parameters, their data types, and clear descriptions of what the function does and what each parameter is [Step 2]. This is one of the documented best practices and I recommend that you read up on it. A sketch of such a declaration follows this list.
  2. Will the LLM invoke the tools or functions for us?
    No. The LLM does not invoke the tool or the function for us. It tells us the function and parameters to use, and it is up to us to invoke our own functions/methods/APIs and return the results back to the LLM. [Steps 3, 4, 5, 6]
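To make the first point concrete, here is a sketch of how one of our two functions could be declared with the Vertex AI Java SDK. The descriptions are illustrative; the repository contains the full declarations for both functions.

import com.google.cloud.vertexai.api.FunctionDeclaration;
import com.google.cloud.vertexai.api.Schema;
import com.google.cloud.vertexai.api.Tool;
import com.google.cloud.vertexai.api.Type;

// Declare getWarehouseDetails as a function the model may ask us to
// call. The description fields are what the model reasons over, so
// they should be clear and precise.
FunctionDeclaration getWarehouseDetailsDeclaration = FunctionDeclaration.newBuilder()
        .setName("getWarehouseDetails")
        .setDescription("Get the details, such as the address, of a warehouse given its location identifier")
        .setParameters(Schema.newBuilder()
                .setType(Type.OBJECT)
                .putProperties("location", Schema.newBuilder()
                        .setType(Type.STRING)
                        .setDescription("The warehouse location identifier, e.g. w1")
                        .build())
                .addRequired("location")
                .build())
        .build();

// The declarations are wrapped in a Tool that is passed to the model
Tool tool = Tool.newBuilder()
        .addFunctionDeclarations(getWarehouseDetailsDeclaration)
        .build();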

Source code repository

The source code for these experiments is all available here.

This is a standard Maven project with a pom.xml in the root folder, so you should be able to take the dependencies from there, should you want to recreate it in a different way.

There are two Java main programs available:

1. src/main/java/com/geminidemo/AutomateFunctionCalling.java
2. src/main/java/com/geminidemo/ParallelFunctionCalling.java

To run the programs, you will need to do the following in the respective Java files:

  • Replace YOUR_GOOGLE_CLOUD_PROJECT_ID with your Google Cloud project ID.
  • Replace us-central1 with another Google Cloud location, should you want to change that.
  • In the main() method, you will find the different prompts all commented out. Uncomment any one of the prompts before running the program. (A sketch of how these values are wired into the model follows this list.)
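For orientation, here is a hedged sketch of how those values, the tools and the temperature come together; the exact code lives in the two programs, and builder method names may vary slightly across SDK versions.

import java.util.Arrays;

import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerationConfig;
import com.google.cloud.vertexai.generativeai.ChatSession;
import com.google.cloud.vertexai.generativeai.GenerativeModel;

// Wire the project ID, location, model name and tools together
try (VertexAI vertexAI = new VertexAI("YOUR_GOOGLE_CLOUD_PROJECT_ID", "us-central1")) {
    GenerativeModel model = new GenerativeModel.Builder()
            .setModelName("gemini-1.0-pro")
            .setVertexAi(vertexAI)
            .setGenerationConfig(GenerationConfig.newBuilder()
                    .setTemperature(0F) // deterministic output helps function calling
                    .build())
            .setTools(Arrays.asList(tool)) // the Tool built from our declarations
            .build();

    // A chat session keeps the function call/response exchange in context
    ChatSession chat = model.startChat();

    // ... send the prompts via chat here ...
}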

We will come to each of these programs and understand them in the following two sections.

Sequential Function calling

The source code file for this is:

src/main/java/com/geminidemo/AutomateFunctionCalling.java

When I started off with my experiments on Function calling, I was using the Gemini 1.0 Pro model. We will focus on that model in this section.

We will then look at what the Gemini 1.5 Pro model enables (Parallel Function calling) in the next section. This will help us answer some additional questions that we raised in an earlier section, which I reproduce here:

  1. When there are two or three function calls to be made, will the model ask us to invoke them one by one?
  2. Is there a possibility of it asking us to invoke all the functions in parallel? We can certainly make those calls to our functions or APIs ourselves, can't we?

Let’s understand the source code first:

  1. The main method provides a list of the prompts that we have discussed. You can uncomment any one of them before you run the application.
  2. Look at lines 182–192. They contain the two sample methods getWarehouseDetails and getInventoryCount. These just return dummy data for now, but in the real world they could be invoking your actual APIs or implementation code.
  3. Lines 49–53: we are just building a map of function names to the methods that we need to invoke. This will be useful once Gemini tells us which function name to invoke.
  4. Lines 57–90 define the functions as per the OpenAPI spec. We give the function name, the description, and each of the parameters (name, type, description).
  5. Lines 92–96 are critical. Here we are building the Tools, i.e. adding the function declarations defined above. We will pass this to the LLM along with our prompt, so that the LLM can determine if a function needs to be invoked.
  6. Lines 98–110 are the standard model configuration: settings for a few safety categories, plus the temperature, which we have set to 0.
  7. Lines 112–117: we instantiate the model, passing in the tools, configuration, etc., and start a chat conversation session with the model.
  8. Line 121: we send the prompt to the model, and in line 122 we get our first response from the model.
  9. Lines 126–175 go through the response received: if a function call is specified, we invoke that function and pass its response back to the model. If the model asks for another function to be invoked, we do so. This continues until the model does not ask us to invoke any more functions. (A condensed sketch of this loop follows the list.)
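Here is a condensed, hedged sketch of that loop. It compresses the repository's code and assumes the function_handler map (function name to java.lang.reflect.Method) and the MyAPI class described earlier; exception handling is omitted, and extracting the arguments by map order is a simplification the demo relies on.

import java.lang.reflect.Method;
import java.util.Collections;
import java.util.Optional;

import com.google.cloud.vertexai.api.Content;
import com.google.cloud.vertexai.api.FunctionCall;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.api.Part;
import com.google.cloud.vertexai.generativeai.ContentMaker;
import com.google.cloud.vertexai.generativeai.PartMaker;
import com.google.protobuf.Value;

// Helper (defined on the class): find the first function call, if any,
// in a model response
static Optional<FunctionCall> findFunctionCall(GenerateContentResponse response) {
    return response.getCandidatesList().stream()
            .flatMap(candidate -> candidate.getContent().getPartsList().stream())
            .filter(Part::hasFunctionCall)
            .map(Part::getFunctionCall)
            .findFirst();
}

// Keep satisfying function calls until the model returns plain text
GenerateContentResponse response = chat.sendMessage(prompt);
Optional<FunctionCall> call = findFunctionCall(response);
while (call.isPresent()) {
    String functionName = call.get().getName();
    System.out.println("Need to invoke function: " + functionName);

    // Extract the argument values the model asked us to call with
    // (relies on the argument order matching the method signature)
    Object[] args = call.get().getArgs().getFieldsMap().values().stream()
            .map(Value::getStringValue)
            .toArray();

    // Invoke our own method via reflection...
    Method method = function_handler.get(functionName);
    Object result = method.invoke(new MyAPI(), args);

    // ...and hand the result back to the model, which may respond with
    // yet another function call or with the final text
    Content content = ContentMaker.fromMultiModalData(
            PartMaker.fromFunctionResponse(functionName,
                    Collections.singletonMap("content", result.toString())));
    response = chat.sendMessage(content);
    call = findFunctionCall(response);
}
System.out.println("No more function calls found in response");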

Sequential Function calling in action

Sample Output #1

For the prompt "How much of P1 and P2 do we have in warehouse w10?", we get the following output. You can see that we are being asked by the model to invoke the getInventoryCount method twice. Once for P1 and the other for P2:

User provided Prompt: How much of P1 and P2 do we have in warehouse w10?
Initial response:
role: "model"
parts {
  function_call {
    name: "getInventoryCount"
    args {
      fields {
        key: "location"
        value {
          string_value: "w10"
        }
      }
      fields {
        key: "productid"
        value {
          string_value: "P1"
        }
      }
    }
  }
}

Need to invoke function: getInventoryCount
Executing function with parameters: P1 w10
Response: role: "model"
parts {
  text: "We have 50 units of P1 in w10. \n\n"
}
parts {
  function_call {
    name: "getInventoryCount"
    args {
      fields {
        key: "location"
        value {
          string_value: "w10"
        }
      }
      fields {
        key: "productid"
        value {
          string_value: "P2"
        }
      }
    }
  }
}

Need to invoke function: getInventoryCount
Executing function with parameters: P2 w10
Response: role: "model"
parts {
  text: "We also have 50 units of P2 in w10."
}

No more function calls found in response

Sample Output #2

For the prompt "Where is warehouse w10 located and how many unit of p1 are there??", we get the following output. You can see that we are being asked by the model to invoke the getWarehouseDetails method first with the warehouse location as w10 and then we are asked to invoke the getInventoryCount method with the location as w10 and the productId as p1:

User provided Prompt: Where is warehouse w10 located and how many unit of p1 are there?
Initial response:
role: "model"
parts {
  function_call {
    name: "getWarehouseDetails"
    args {
      fields {
        key: "location"
        value {
          string_value: "w10"
        }
      }
    }
  }
}

Need to invoke function: getWarehouseDetails
Executing function with parameters: w10
Response: role: "model"
parts {
  text: "w10 is located at 123 Main Street. \n\n"
}
parts {
  function_call {
    name: "getInventoryCount"
    args {
      fields {
        key: "location"
        value {
          string_value: "w10"
        }
      }
      fields {
        key: "productid"
        value {
          string_value: "p1"
        }
      }
    }
  }
}

Need to invoke function: getInventoryCount
Executing function with parameters: p1 w10
Response: role: "model"
parts {
  text: "There are 50 units of p1 in w10."
}

No more function calls found in response

So as you can see, the Gemini 1.0 Pro model is doing great. However, it asks us to invoke the functions one after the other.

Enter Parallel Function calling.

Parallel Function calling

You would have noticed that the model response gives us a sequence of function calls to make, one after the other. But if you look at it, the model could have determined that the functions can be invoked in parallel, and could have given us the full list of functions to invoke in the first response itself. For example, if we had provided the prompt Where are warehouse w1 and w2 located?, it could have given us the two calls to getWarehouseDetails that we need to make, one with the warehouse location w1 and the other with w2. In that case, we could have made both the API calls ourselves, collected the responses and given them back to the model in one shot to form the final response.

Starting with the Gemini 1.5 Pro and Gemini 1.5 Flash models, the model can propose several parallel function calls. This means we need to modify our code to expect not just one function call but possibly multiple ones, all of which we make before handing the API results from those function calls back to the model. The documentation highlights a parallel function call sample.

I have provided another Java program in src/main/java/com/geminidemo/ParallelFunctionCalling.java and you will notice that we use another model here: gemini-1.5-pro-001.

If you now run the following prompt in the sample, How much of P1 and P2 do we have in warehouse w10?, you will find that the response output is as follows:

role: "model"
parts {
function_call {
name: "getInventoryCount"
args {
fields {
key: "location"
value {
string_value: "w10"
}
}
fields {
key: "productid"
value {
string_value: "P1"
}
}
}
}
}
parts {
function_call {
name: "getInventoryCount"
args {
fields {
key: "location"
value {
string_value: "w10"
}
}
fields {
key: "productid"
value {
string_value: "P2"
}
}
}
}
}

You can see that the initial response itself has two parts that are function calls. We can now parse this response, make the function calls in parallel, and return the responses to the model for the final output.

The code also needs to change to iterate through each of the function_call parts, and not just the first one. We iterate through the function_call parts, invoke each function, collect the responses, and then send the aggregated response from our function calls back to the model for the final response.

// Handle cases with multiple chained function calls
List<FunctionCall> functionCalls = response.getCandidatesList().stream()
        .flatMap(candidate -> candidate.getContent().getPartsList().stream())
        .filter(part -> part.getFunctionCall().getName().length() > 0)
        .map(part -> part.getFunctionCall())
        .collect(Collectors.toList());

StringBuilder sb = new StringBuilder();
for (FunctionCall functionCall : functionCalls) {
    String functionCallName = functionCall.getName();
    System.out.println("Need to invoke function: " + functionCallName);

    // Check for a function call or a natural language response
    if (function_handler.containsKey(functionCallName)) {
        // Invoke the function using reflection
        Object api_object = new MyAPI();
        Method function_method = function_handler.get(functionCallName);

        // Extract the function call parameters
        Map<String, String> functionCallParameters = functionCall.getArgs()
                .getFieldsMap().entrySet()
                .stream()
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        entry -> entry.getValue().getStringValue()));

        // Extract all the parameter values into an array
        Object[] functionParameters = functionCallParameters.values().toArray();

        Object result = function_method.invoke(api_object, functionParameters);
        sb.append(result);
    }
}

// Send the aggregated API responses back to Gemini, which will generate
// a natural language summary or another function call. Note that, as a
// simplification, the aggregated result is sent back under the first
// registered function's name.
Content content = ContentMaker.fromMultiModalData(
        PartMaker.fromFunctionResponse(
                function_handler.entrySet().stream().findFirst().get().getKey(),
                Collections.singletonMap("content", sb.toString())));
response = chat.sendMessage(content);
System.out.println("Response: " + ResponseHandler.getContent(response));

I hope this demonstrates the Function calling feature in LLMs, and specifically in Gemini.

Best Practices

I don't want to rehash the documentation over here, since that page could get updated over time, but there are some excellent best practices listed there vis-à-vis Function calling. They range from giving clear descriptions of your functions and parameters to several more. Check it out.

Function calling benchmarks

Since Function calling is available across the major LLM vendors, and they have been releasing models at an unprecedented pace, surely there has to be a project that tracks this capability across LLMs and benchmarks them? Yes, there is: the Berkeley Function Calling Leaderboard. Check out the project and see how your favorite LLM model compares to others.

The Berkeley Function Calling Leaderboard (also called Berkeley Tool Calling Leaderboard) evaluates the LLM’s ability to call functions (aka tools) accurately. This leaderboard consists of real-world data and will be updated periodically.

References
