Learning to build a Gemini-powered Serverless App
With the “AI Wave” sweeping through industries globally, we can’t ignore its resemblance to previous waves of great change — the arrival of the Internet, as well as Cloud Computing. This current wave of change has only gotten bigger since 2023, and looks like it is here to stay 👍
But as a Frontend/Backend/DevOps developer, what does this mean 🤔? Should we all start pivoting our expertise towards AI? How do we ride this wave?
Today, I would like to answer this question from one important angle — our present skill sets for developing applications can complement those of AI professionals, from Data Scientists to ML Engineers. But it requires us to embrace and understand this “wave” of change to a reasonable extent.
Let’s use Gemini and Vertex AI to help explain how we can achieve this 🙂, although other services such as OpenAI’s ChatGPT should work in similar ways too.
Chapter 1 — What is Gemini? (And what models to focus on in 2024)
Let’s start off with the quintessential question — What is Gemini?
Current literature, such as Gemini’s FAQs, is direct about what Gemini is — Google’s LLM that answers questions based on language patterns from the content it was trained on, as well as data fetched from other Google services.
But beyond that, what might not be clear to newcomers, would be that there have been several rounds of rebranding to merge multiple product lines into one single label — Gemini. We have:
- Bard (Gen AI Chatbot, now branded as Gemini),
- Duet AI for Developers (personal coding assistant, now branded as Gemini Code Assist, and under the Gemini for Google Cloud suite of productivity tools for developers),
- And finally, Duet AI for Workspace (Google Workspace productivity tools, now branded as Gemini for Google Workspace)
For application developers, with the rebranding of Duet AI and the expansion of product offerings under the “Google Cloud” label, there’s much more that we developers can rely on Gemini for!
Productivity tooling aside, what we are interested in as developers are the capabilities we can build with Gemini. Gemini’s inference capabilities can be leveraged today using the Gemini API, or the preferred method of using one of Google’s SDKs.
We’ll talk more about the SDKs in the next chapter, but let’s first touch on an important topic — Gemini’s model offerings, since choosing one is a requirement for using Gemini. “Importing” a model offering is made easy with the API/SDK, as seen below:
// See the full example here: https://github.com/Weiyuan-Lane/chat-with-gemini/blob/main/server/server.js#L118
const generativeModelWithFunctionCalling = vertex_ai.getGenerativeModel({
  model: 'gemini-1.5-flash-latest',
  ...
});
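The snippet above assumes a vertex_ai client already exists. Here’s a minimal sketch of how it might be initialised with the @google-cloud/vertexai Node.js SDK (the project ID and location below are placeholder assumptions, so substitute your own):

// Minimal sketch: initialising the Vertex AI client used above.
// 'your-project-id' and 'us-central1' are placeholders.
const { VertexAI } = require('@google-cloud/vertexai');

const vertex_ai = new VertexAI({
  project: 'your-project-id',
  location: 'us-central1',
});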
With model initialisation made simple, application developers just need to contend with the complexity of choosing the right model to use. To do that, we need to know how to differentiate between the model versions.
Simply put, model versions are similar to what application developers would consider as different semantic version (semver) numbers of a library, with higher versions boasting overall better performance.
In addition to the semver, each version can also have parallel builds. These parallel builds can be differentiated either:
- by features (like Gemini 1.0 Pro, which originally had no image capabilities, but gained them in Gemini 1.0 Pro Vision),
- or by inference-performance-to-cost tradeoffs (Gemini 1.5 Flash is cheaper to utilise than Gemini 1.5 Pro, while the Pro variant has better inference performance)
Here are some tips when investigating the different model offerings.
Originally, it was cheaper to use and experiment with Gemini 1.0 Pro, given its higher free quota limit. However, with the release of Gemini 1.5 Flash during Google I/O, with its better performance and lower price point, Gemini 1.5 Flash should now be the prime model for general purpose use cases and experimentation:
For serious and scalable use cases that require better inference capabilities, Gemini 1.5 Pro should be the main choice now, having overtaken Gemini 1.0 Ultra in May 2024:
On the other hand, avoid using Gemini 1.0 Pro Vision at all costs, as it has been showing deprecation warnings since mid-June. From 12th July onwards, it will additionally fail if used directly, so you should migrate off it ASAP if you are still using it:
How do I decide on the model to use, post-2024?
Following what we have discussed above, if you are reading this in the future and want a benchmark for comparison, it is always useful to compare inference performance on the product page, and pricing on the pricing page, in order to make the most informed decision.
Chapter 2 — How do I get started with Gemini? (Vertex AI Studio and Google AI Studio)
To get started with experimenting on Gemini, I would primarily recommend Google’s AI Studio. It’s a free portal to try out Gemini models and prompts without any onboarding constraints, like having to provide a credit card number.
Using Google AI Studio, you can play around with the settings, such as the model offerings we discussed in the last chapter, and experiment with prompts that you might use in your application later on. One interesting addition here is that you can also run “Get code” for your experimental prompts, to accelerate integration into your application with little hassle:
Once you are past the experimentation phase and looking to obtain an API key for Gemini to build applications with, you’ll need to create a Google Cloud project with a billing account attached (such as via your credit card). At this point, it might make sense to use Vertex AI Studio in the Google Cloud Console, which has features similar to, but more extensive than, Google AI Studio above!
SDK — Python, Node.js, Go? How do I use it?
Moving on, as an application developer looking to build features with the Gemini API, you’ll need to pick a language to code your application in. To this end, the following SDKs will definitely be of interest:
As an application developer myself, I’m extremely happy to see that Go and Node.js are supported languages — even though they are still in preview status, they should reach General Availability (GA) in time.
Usage of the SDK is very simple — as mentioned in the last chapter, you’ll just need to provide the named version of the model… (references to a list of these models below)
… and then implement it (example is using Node.js SDK):
// See my example application for more, at https://github.com/Weiyuan-Lane/chat-with-gemini
const generativeModelWithFunctionCalling = vertex_ai.getGenerativeModel({
  model: 'gemini-1.5-flash-001',
  generationConfig: {
    'maxOutputTokens': 2048, // Cap on the length of the generated response
    'temperature': 1,        // Higher values produce more varied responses
    'topP': 0.95,            // Nucleus sampling: only consider tokens in the top 95% probability mass
  },
  safetySettings: [...],
});

const result = await generativeModelWithFunctionCalling.generateContent({...});
Some tips as a quick start for you — good methods from the SDK to integrate, experiment with, and try out are listed below (a short usage sketch follows the list):
- generateContent() — for a single prompt, receiving the entire response in one go (easiest to implement, like calling an API endpoint, but waiting for the whole response might take a while and impact UX)
- generateContentStream() — for a single prompt, but streaming the output. Good for integrating with server push mechanisms like WebSockets, to deliver the response in chunks to the user.
- startChat() and then sendMessage() — for multi-turn prompts, like a back-and-forth chat. sendMessage() also functions similarly to generateContent() in receiving the entire response in one go.
- startChat() and then sendMessageStream() — for multi-turn prompts, but streaming the output like generateContentStream().
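To make these methods concrete, here is a minimal sketch of the streaming and multi-turn variants using the @google-cloud/vertexai Node.js SDK (the project ID, model name, and prompts are placeholder assumptions):

// Minimal sketch of streaming and multi-turn usage. Values below are placeholders.
const { VertexAI } = require('@google-cloud/vertexai');

const vertexAI = new VertexAI({ project: 'your-project-id', location: 'us-central1' });
const model = vertexAI.getGenerativeModel({ model: 'gemini-1.5-flash-001' });

async function demo() {
  // generateContentStream(): receive the response in chunks as they are generated
  const streamingResult = await model.generateContentStream({
    contents: [{ role: 'user', parts: [{ text: 'Tell me a short story.' }] }],
  });
  for await (const chunk of streamingResult.stream) {
    console.log(chunk.candidates[0].content.parts[0].text);
  }

  // startChat() + sendMessage(): multi-turn conversation, full response per turn
  const chat = model.startChat({});
  const firstTurn = await chat.sendMessage('Hi! My name is Weiyuan.');
  console.log(firstTurn.response.candidates[0].content.parts[0].text);

  // Chat history is retained across turns, so the model can answer this
  const secondTurn = await chat.sendMessage('What is my name?');
  console.log(secondTurn.response.candidates[0].content.parts[0].text);
}

demo();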
You can check out my sample Node.js codebase for an example of how to use the Node.js SDK, along with integrating it with an Angular frontend.
Aside from the more popular languages like Node.js and Go, I would also recommend the Python SDK (the only SDK that is GA as of today, 11th July), or using Streamlit directly as one way to quickly prototype and build applications without having to worry about the UI (good for internal applications).
To get started with Streamlit, check out this no-cost Cloud Skills Boost lab (costs might change from time to time, so make sure you try it out while you can!):
Deploy easily with images and containers!
Lastly, as application developers, let’s talk about where to host our Gemini-powered applications!
The great thing about using the SDKs above is that you can containerize your application within Docker images. This allows you to host your Gemini-powered application with any cloud provider, avoiding vendor lock-in to Google Cloud, even though Gemini itself is a Google Cloud technology.
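If you haven’t containerized a Node.js app before, a Dockerfile along these lines might serve as a starting point (a minimal sketch; the Node version, port, and entrypoint path are assumptions, not taken from my sample repo):

# Minimal sketch of a Dockerfile for a Node.js Gemini app. Versions and paths are assumptions.
FROM node:20-slim

WORKDIR /app

# Install dependencies first to take advantage of Docker layer caching
COPY package*.json ./
RUN npm ci --omit=dev

# Copy the application source
COPY . .

# Cloud Run injects the PORT environment variable; the app should listen on it
EXPOSE 8080
CMD ["node", "server/server.js"]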
Though, if you do intend to stay on Google Cloud to deploy your Gemini-powered application, I would recommend Cloud Run, a fully managed serverless solution for deploying containerized applications. It is relatively easy to configure for scalability, and also cost effective with its scale-to-zero feature (as opposed to hosting on VMs, where you might need at least one instance up at all times):
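For reference, a source-based deploy to Cloud Run can be as short as the following command (the service name and region are placeholders):

# Build from source and deploy to Cloud Run. Service name and region are placeholders.
gcloud run deploy chat-with-gemini \
  --source . \
  --region us-central1 \
  --allow-unauthenticated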
There are other advantages to using other Google Cloud technologies as well. In the architectural design below, you can see that a production-grade application needs to integrate tools like CI/CD (represented by Cloud Build) and handle deployment concerns like secret management at deploy and run time (with tools like Secret Manager):
Not only are these tools available in Google Cloud’s ecosystem, but they also integrate more smoothly due to the existence of Application Default Credentials (ADC):
With ADC, there’s no need to manage your API keys, or worry about leaking credentials within your codebase. You can assign the permissions directly to your workloads (like Cloud Run) using the IAM features in Google Cloud.
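For example, granting your Cloud Run workload’s service account access to Vertex AI might look like this (the project ID and service account email are placeholders):

# Grant the Cloud Run service account permission to call Vertex AI (Gemini).
# 'your-project-id' and the service account email are placeholders.
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:my-app-sa@your-project-id.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"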
Chapter 3 — New stuff for Gemini in Google Cloud — Function Calling!
Let’s talk about a new feature that will prove useful for application developers like you and me. This feature here is called “Function Calling”!
Today, most use cases for Gen AI models revolve around creating prompts from some user input, passing the prompt to the model, and finally passing the model’s text response back to the user. This methodology works for building chatbots, but there is so much more that we can build beyond chatbots.
In comes “Function Calling” — this feature is a godsend in helping application developers go much further. You can use it to get structured output from Gemini (e.g. JSON), allowing you to compose output for your user in the intended format (be it HTML, PDF, or even the same JSON for some frontend application).
As application developers, isn’t that how we usually build our applications? From the frontend, we call some backend API with structured input, and then receive a structured response to render the frontend UI. This will help your Gemini-powered app transcend chatbots!
With all that is said, here’s how you can use Function Calling:
In the above, we see a good example of using Function Calling with Gemini: a messaging app where two users communicate in different languages. If we are not using Function Calling and instead use plain prompts, we might end up with the following outcome from our backend server prompt (the exact wording varies from call to call, since model output is not deterministic):
Input prompt: Please translate this string “I want a Pikachu” from ‘en’ to ‘zh’
Output response: Your translation is “我想要一只皮卡丘”
By using Function Calling, we not only avoid unneeded content from the model (“Your translation is…”), we also avoid having to parse the output ourselves as seen in the diagram above. This allows us to return only the intended content in the structured JSON response back to the frontend application and the user!
Using Function Calling
To use Function Calling, the first thing you need to do is declare a function; in my case, I will call it translate:
// FunctionDeclarationSchemaType is exported by the @google-cloud/vertexai SDK
const { FunctionDeclarationSchemaType } = require('@google-cloud/vertexai');

const translateFunctionDeclaration = {
  name: 'translate',
  description: 'Get the translation for a sentence from one origin language to a target language',
  parameters: {
    type: FunctionDeclarationSchemaType.OBJECT,
    properties: {
      translation: {
        type: FunctionDeclarationSchemaType.STRING,
        description: 'The translated text'
      },
      code: {
        type: FunctionDeclarationSchemaType.STRING,
        description: 'The language code of the translation'
      },
      confidence: {
        type: FunctionDeclarationSchemaType.NUMBER,
        description: 'The confidence level of the translation between 0 to 1'
      }
    },
    required: ['translation', 'confidence', 'code'],
  },
};
Then, you will need to reference the function in your prompt:
const result = await generativeModelWithFunctionCalling.generateContent({
  contents: [
    {
      role: 'user',
      parts: [{
        text: `Translate this message "${message}", from "${srcLang}" to "${targetLang}", and return the translated text.`,
      }],
    },
  ],
  tools: [
    {
      function_declarations: [translateFunctionDeclaration],
    }
  ],
  // Constrain the model so that it responds with a call to our declared function
  tool_config: {
    function_calling_config: {
      mode: 'ANY', // 'ANY' requires the model to call one of the allowed functions
      allowed_function_names: [translateFunctionDeclaration.name],
    },
  }
});
Finally, getting the output if the function call matches your declaration:
// Check that the model responded with a call to our declared function
if (result?.response?.candidates?.[0]?.content?.parts?.[0]?.functionCall &&
    result.response.candidates[0].content.parts[0].functionCall.name === translateFunctionDeclaration.name
) {
  console.log(JSON.stringify(result.response.candidates[0].content));
  const call = result.response.candidates[0].content.parts[0].functionCall;
  const jsonResponse = call.args; // Structured arguments matching our declared schema

  return {
    content: jsonResponse,
    metadata: result.response.usageMetadata
  };
}
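One small, hypothetical addition of my own (not from the sample repo): if the guard above does not match, the model may have replied with plain text instead, which you may want to surface rather than silently drop:

// Hypothetical fallback: surface plain text if no matching function call was returned
const textPart = result?.response?.candidates?.[0]?.content?.parts?.[0]?.text;
if (textPart) {
  return {
    content: { translation: textPart },
    metadata: result.response.usageMetadata
  };
}
throw new Error('Gemini returned neither a matching function call nor text');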
Voilà, we’ve managed to get Gemini’s jsonResponse and deliver it back to our user! You can check out more of the code above in my sample Node.js codebase.
Alternative usage for Function Calling
While I am excited about the structured output that Function Calling brings to application developers like you and me, it can also be used to augment Gemini with data and capabilities it does not possess (like predicting the weather).
The above diagram describes it best, but let’s break it down into four parts to explain how it helps make chat use cases better!
In the first part, you will need to declare a list of functions (just like we did for translate before). Let’s say that, in the context of a weather application, you will need the following functions:
If a user is interested in finding out whether it will rain, Gemini will select the relevant precipitation function and extract the required portions in a structured format (like how we did for translation earlier):
Using the structured content, you can then forward these values as input to other internal or external 3rd party API services:
From the API response, we pass the values back to Gemini, which then outputs a human-readable sentence about the weather back to the user:
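Here is a minimal sketch of that four-part round trip with the Node.js SDK (getPrecipitation, its arguments, and fetchPrecipitationFromWeatherApi are hypothetical stand-ins; the model is assumed to be configured with the weather function declarations):

// Hypothetical example: proxying a weather capability through Function Calling.
// 'getPrecipitation' and 'fetchPrecipitationFromWeatherApi' are illustrative stand-ins.
const chat = generativeModelWithFunctionCalling.startChat({});

// Part 2: Gemini selects the declared function and extracts structured arguments
const first = await chat.sendMessage('Will it rain in Singapore tomorrow?');
const functionCall = first.response.candidates[0].content.parts[0].functionCall;

if (functionCall && functionCall.name === 'getPrecipitation') {
  // Part 3: forward the structured arguments to an internal or external weather API
  const weatherData = await fetchPrecipitationFromWeatherApi(functionCall.args);

  // Part 4: return the API response to Gemini, which replies in natural language
  const second = await chat.sendMessage([{
    functionResponse: {
      name: 'getPrecipitation',
      response: weatherData,
    },
  }]);
  console.log(second.response.candidates[0].content.parts[0].text);
}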
And that’s how you can use Function Calling, as a proxy to other capabilities, to make your chat better!
To conclude…
…we have gone through a journey of understanding and differentiating Gemini model offerings, then explored both the AI studios and the SDK solutions to start building with Gemini, and finally ended with some practical examples using a new feature of Gemini — Function Calling — which creates the deterministic, structured output that is essential for application development!
With the above, you should be ready to kick start your “build with AI” journey. All the best!