Architecting GenAI applications with Google Cloud

saxenashikha
Google Cloud - Community
6 min read · Sep 19, 2024


GenAI (Generative Artificial Intelligence) is a fascinating and rapidly evolving field of AI that focuses on creating new content or data. Instead of just analyzing or classifying existing information, GenAI can generate fresh outputs that often mimic human creativity: text, images, music, code, and more. It learns patterns from existing data and uses that knowledge to generate similar but original content.

A GenAI application is a software application that uses generative AI models to create new content, automate tasks, or provide interactive experiences. These applications leverage the power of GenAI to generate text, images, code, music, or other types of data, often in response to user prompts or inputs. A GenAI application requires a generative model as the engine that produces the required output.

Therefore, in addition to the application hosting infrastructure, you also have to think about GenAI model evaluation, hosting, modality, prompting, and tuning decisions.

In this blog we will see how to make these decisions while developing a generative AI application: how to select a model, customize the model’s output to meet your needs, evaluate your customizations, and deploy your model. This post assumes that you already have a use case in mind, and that the use case is suitable for generative AI.

Typical GenAI application architecture
  1. Application hosting: Compute to host your application. Your application can use Google Cloud’s client libraries and SDKs to talk to different Cloud products.
  2. Model hosting: Scalable and secure hosting for a generative model.
  3. Model: Generative model for text, chat, images, code, embeddings, and multimodal.
  4. Grounding solution: Anchor model output to verifiable, updated sources of information.
  5. Database: Store your application’s data. You might reuse your existing database as your grounding solution, by augmenting prompts using SQL query, or by storing your data as vector embeddings using an extension like pgvector.
  6. Storage: Store files such as images, videos, or static web frontends. You might also use Storage for the raw grounding data (e.g., PDFs) that you later convert into embeddings and store in a vector database.

As you can see, the architecture contains both “Application hosting infrastructure” and “Model hosting infrastructure”. While designing the application, both need to be architected well.

1. Application hosting infrastructure on Google Cloud

To select the appropriate application hosting infrastructure, choose a product that serves your application workload. The application makes calls out to the generative model, as shown in the picture.

Application hosting infrastructure

For more details on the application hosting options, visit the Cloud Run, GKE, and Compute Engine links.
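Whichever compute product hosts the application, the application tier ends up assembling a request for the model and sending it to the model endpoint. As a minimal sketch, the helper below builds a JSON-serializable request body following the public Gemini API’s contents/parts structure; the prompt and temperature values are illustrative, and the actual network call (made via a Google Cloud client library with proper credentials) is omitted.

```python
# Minimal sketch of what the application tier prepares before calling
# a hosted generative model. The payload shape follows the Gemini API's
# "contents"/"parts" structure; values here are illustrative.

def build_generation_request(prompt: str, temperature: float = 0.2) -> dict:
    """Assemble a JSON-serializable request body for a text generation call."""
    return {
        "contents": [
            {"role": "user", "parts": [{"text": prompt}]},
        ],
        "generationConfig": {"temperature": temperature},
    }

body = build_generation_request("Summarize our returns policy in one line.")
# The app would then POST `body` to the model endpoint using a client library.
```

In a real service this function would live behind your web framework’s request handler, keeping prompt assembly separate from transport concerns.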

2. Model hosting infrastructure on Google Cloud

Google Cloud provides multiple ways to host a generative model, from the flagship Vertex AI platform to customizable and portable hosting on Google Kubernetes Engine. The decision tree below helps you choose.

Model Hosting decision tree

Refer to the following links to learn more about model hosting: Vertex AI, Gemini Developer API, GKE, Compute Engine.

Now that we know where we want to host the model, it’s important to know the models themselves.

3. Model Selection

Vertex AI offers a growing collection of foundation models that developers can use to build and deploy AI-based applications. These models are fine-tuned for specific use cases and offered at various price points.

Model Garden in the Google Cloud console is a library of machine learning models that helps developers discover, test, customize, and deploy Google’s proprietary and select open-source models and assets. It contains Google’s proprietary models (like Gemini, Codey, Imagen, text embeddings, MedLM, etc.), open-source models, and partner models. Vertex AI foundation models are pre-trained on massive datasets and fine-tuned for specific tasks, making them highly accurate and efficient. They can be used in various ways, from building new machine learning models to enhancing existing ones.

Choosing the Right Model for Your Use Case

When selecting a foundation model for your use case, it is essential to consider the following factors:

  • The task you want to perform: Different models are designed for different tasks. For example, if you want to classify images, you would choose a vision model.
  • The size of your dataset: Some models require more data than others to train effectively. If you have a small dataset, you may need to choose a model that is less data-intensive.
  • Your budget: Vertex AI foundation models are offered at various price points. It is essential to choose a model that fits your budget.
Decision tree to select the model
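As an illustration only, the three factors above can be encoded as a small lookup. The model names below are examples from Model Garden, and the mapping is a hypothetical sketch to show the shape of the decision, not an official recommendation.

```python
# Hypothetical helper encoding the selection factors (task, budget) as a
# lookup. Model names are examples; the mapping is illustrative only.

def suggest_model(task: str, budget: str = "standard") -> str:
    by_task = {
        "text": "gemini-1.5-pro",
        "chat": "gemini-1.5-pro",
        "vision": "imagen-3.0",
        "code": "codey",
        "embeddings": "text-embedding-004",
    }
    model = by_task.get(task, "gemini-1.5-pro")
    # On a tight budget, prefer a smaller, cheaper variant where one exists.
    if budget == "low" and model == "gemini-1.5-pro":
        model = "gemini-1.5-flash"
    return model
```

In practice you would also weigh dataset size and evaluation results before committing to a model, but a table like this makes the trade-offs explicit in code.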

4. Grounding Solution

To ensure informed and accurate model responses, you may want to ground your generative AI application with up-to-date, external data. Retrieving that data and adding it to the prompt is called retrieval-augmented generation (RAG).

You can implement grounding with your own data in a vector database, which is an optimal format for operations like similarity search. Google Cloud offers multiple vector database solutions for different use cases.

The simplified RAG example below walks through how an app can provide grounded answers by using the similarity search feature of a database that supports vector indexing.

RAG from the Databases

Note: You can also ground with traditional (non-vector) databases by querying an existing database like Cloud SQL or Firestore and using the results in your model prompt.

Grounding decision tree

More details here: Vertex AI Agent Builder, Vector Search, AlloyDB for PostgreSQL, Cloud SQL, BigQuery.
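The RAG flow above can be sketched end to end in a few lines. This toy version uses tiny hand-written 2-dimensional embeddings and an in-memory list in place of a real embedding model and vector database (such as pgvector or Vertex AI Vector Search); the document texts are made up for illustration.

```python
import math

# Toy sketch of the RAG flow: similarity search over stored vectors,
# then grounding the prompt with the best match. A real app would use an
# embedding model and a vector database instead of hand-written vectors.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, store, k=1):
    """Return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(store, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return ranked[:k]

def grounded_prompt(question, passages):
    context = "\n".join(p["text"] for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

store = [
    {"text": "Returns are accepted within 30 days.", "embedding": [0.9, 0.1]},
    {"text": "Shipping takes 3-5 business days.", "embedding": [0.1, 0.9]},
]
hits = retrieve([0.8, 0.2], store)  # stand-in for an embedded user query
prompt = grounded_prompt("What is the returns window?", hits)
```

The grounded prompt is then sent to the generative model, which answers from the retrieved context instead of relying only on its training data.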

5. Database and Storage selection

Foundation large language models (LLMs) are powerful but have limitations, including resource-intensive training, outdated information, and lack of access to corporate data. Retrieval Augmented Generation (RAG) can help enterprises overcome these limitations by grounding LLMs with relevant and accurate information from external sources. This approach allows companies to build generative AI apps that comply with regulations and deliver high-quality results.

A key part of the Retrieval Augmented Generation (RAG) approach is using vector embeddings. Google Cloud offers a few options to store them.
1. Vertex AI Vector Search: A specialized tool for storing and retrieving vectors quickly and efficiently.
2. pgvector extension: Easily add vector queries in the database to support gen AI applications.
3. Cloud SQL and AlloyDB: Both support pgvector, and AlloyDB AI offers even faster performance.
To learn more about pgvector in Cloud SQL and AlloyDB, go here
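As a sketch of what the pgvector workflow looks like, the SQL below is held in plain strings (a real app would execute them through a driver such as psycopg). The table name, column names, and embedding dimension are illustrative, and the extension must be installed in the database first.

```python
# Illustrative pgvector workflow as SQL strings. Names and the embedding
# dimension are examples only; a driver like psycopg would execute these.

ENABLE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"

CREATE_TABLE = """
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(768)  -- dimension must match your embedding model
);
"""

# "<->" is pgvector's L2 distance operator; "<=>" gives cosine distance.
NEAREST_NEIGHBORS = """
SELECT content
FROM documents
ORDER BY embedding <-> %(query_embedding)s
LIMIT 5;
"""
```

The same statements work on Cloud SQL for PostgreSQL and AlloyDB once the extension is enabled.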

GCS: In some use cases you can also upload data to Cloud Storage buckets, where data pipelines can use it to train or tune the model.

There is a lot to all of the above decisions, and the learning never stops, but this blog should help you get started. I have tried to simplify the high-level decisions and hope it is useful.

Do provide your feedback!

References below

https://cloud.google.com/blog/products/databases/discover-new-gen-ai-google-cloud-database-capabilities?e=0



Passionate about technology and fascinated by the intersection of technology and creativity. I’m a technologist by trade with a creative streak.