A Tutorial: Gemini Inference with Google Cloud Functions

Zabir Al Nazi Nabil
3 min read · Apr 21, 2024


In this article, we will explore how to deploy a Cloud Function in Google Cloud Platform (GCP) to run inference tasks using large language models (LLMs) from Vertex AI. This setup lets you scale and maintain models easily while leveraging GCP's fully managed services. We'll cover everything from creating the function and setting up the trigger to writing the necessary code.

Step 1: Creating the Cloud Function

Navigate to the Google Cloud Console, and access the Cloud Functions section. Click on “Create Function” to begin setting up your new function. We’ll configure this function to respond to HTTP requests, allowing it to serve as a public API endpoint.

List of existing Cloud Functions

Step 2: Configure the Trigger

For the trigger type, select “HTTPS” and set the authentication method to “Allow unauthenticated invocations.” This configuration makes your function accessible over the internet without requiring authentication, which is convenient for a public API endpoint. For production environments, however, you should secure your endpoints appropriately.

Configuring the Cloud Function
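
If you keep unauthenticated invocations, one lightweight safeguard is to check a shared secret inside the function itself (IAM-based invoker permissions are the more robust, GCP-native option). Here is a minimal sketch, assuming a hypothetical API_KEY environment variable and X-API-Key request header:

import os
import functions_framework

@functions_framework.http
def run_inference(request):
    # API_KEY and X-API-Key are illustrative names, not GCP conventions.
    if request.headers.get("X-API-Key") != os.environ.get("API_KEY"):
        return ("Unauthorized", 401)
    # ... handle the request as shown in Step 3 ...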

Step 3: Define the Function and Its Dependencies

  1. Select the Runtime: For this function, choose Python 3.12, as it provides the latest features and support for modern Python applications.
  2. Code and Entry Point: Enter your code in the inline editor provided in the console. Ensure that the function name in your Python file matches the entry point specified in the function’s settings. This is crucial for the Cloud Function to locate and execute your code correctly.
  3. Dependencies: In the requirements.txt file, specify all the necessary libraries your function needs to run.
Coding the Cloud Function
import os
import json

from google.cloud import logging
import functions_framework
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part

PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "us-central1"

# Route logs from this function to Cloud Logging.
client = logging.Client(project=PROJECT_ID)
client.setup_logging()

LOG_NAME = "run_inference-cloudfunction-log"
logger = client.logger(LOG_NAME)

@functions_framework.http
def run_inference(request):
    request_json = request.get_json(silent=True)

    if request_json and "prompt" in request_json:
        prompt = request_json["prompt"]
        logger.log(f"Received request for prompt: {prompt}")

        vertexai.init(project=PROJECT_ID, location=LOCATION)
        model = GenerativeModel("gemini-pro")

        # Stream the generation so chunks arrive as they are produced.
        responses = model.generate_content(
            contents=prompt,
            generation_config={
                "max_output_tokens": 2048,
                "temperature": 0.4,
                "top_p": 1,
                "top_k": 32,
            },
            stream=True,
        )

        response_list = []
        for response in responses:
            try:
                response_list.append(response.text)
            except (IndexError, ValueError):
                # A streamed chunk may carry no text (e.g. when filtered);
                # skip it instead of failing the whole request.
                response_list.append("")
                continue
        prompt_response = " ".join(response_list)
    else:
        prompt_response = "No prompt provided."

    return json.dumps({"response_text": prompt_response})
And in requirements.txt, list the dependencies:

functions-framework==3.5.0
google-cloud-aiplatform>=1.31.0
google-cloud-logging

We can also use other models, such as Text-Bison, Text-Unicorn, or Gemini 1.5 Pro. You can find more details at https://ai.google.dev/gemini-api/docs/models/gemini
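
Switching models is a one-line change in the function above. A minimal sketch, assuming the gemini-1.5-pro identifier is available in your project and region (model names change between releases, so verify against the documentation linked above):

# "gemini-1.5-pro" is an assumed identifier; check the current model list.
model = GenerativeModel("gemini-1.5-pro")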

Step 4: Test and Deploy

Before deploying, use the “Test function” button in the GCP console. This feature allows you to catch errors early by running your function in a controlled environment. Enter a test payload and verify that your function returns the expected result.
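
For example, you can test with a payload that mirrors the curl request in Step 5:

{
  "prompt": "Who is Albert Einstein?"
}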

Once testing is complete, click “Deploy” to make your function live.

Step 5: Invoking the Function

After deployment, you can invoke your Cloud Function using tools like curl or Postman. Here’s an example using curl:

curl -m 70 -X POST https://us-central1-[my_project_id].cloudfunctions.net/model_inference \
-H "Content-Type: application/json" \
-d '{
"prompt": "Who is Albert Einstein?"
}'
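
Because the function returns json.dumps({"response_text": prompt_response}), the body of a successful call is a JSON object of the form {"response_text": "..."}, with the model's answer in the response_text field.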

We can also use the following Python script:

import requests
import json

url = "https://us-central1-[my_project_id].cloudfunctions.net/model_inference"
headers = {
    "Content-Type": "application/json"
}
data = json.dumps({
    "prompt": "Who is Albert Einstein?"
})

response = requests.post(url, headers=headers, data=data, timeout=70)
print(response.text)
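
Since the endpoint returns JSON, you can parse the answer field directly instead of printing the raw body; a small follow-up to the script above:

result = response.json()
print(result["response_text"])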
