Deploy a Generative AI application with Terraform in less than 30 minutes

Published in Google Cloud - Community, 11 min read, Apr 30, 2024

This article builds on one I wrote a month ago, SQL queries + pgvector: Retrieval Augmented Generation in PostgreSQL. Here, data and embeddings are hosted in a Cloud SQL instance using pgvector, and the Generative AI application runs on Google Kubernetes Engine (GKE). Similarity results are produced by a SQL query inside the chatbot app.

Three apps will be built:

init-db, which runs once as a Kubernetes job and connects to the database
load-embeddings, which also runs once: it sends the data to Vertex AI to generate embeddings and then loads them into the database, where pgvector handles embedding similarity
chatbot-api, a long-running API that receives natural language queries (for example via curl) and returns kids’ toys that match the query

This solution is deployed automatically via Terraform and can be adapted to other domains by changing the CSV file in one of the project folders and the Python file that contains the SQL queries that create the table and the vector index.

We will use the following architecture:

Terraform is an open-source infrastructure as code (IaC) software tool created by HashiCorp. It allows users to define and provision data center infrastructure using a high-level configuration language known as HashiCorp Configuration Language (HCL), or optionally JSON. Terraform enables you to describe your infrastructure resources (such as virtual machines, networks, storage, etc.) in a declarative manner, specifying what you want the end state to be rather than the step-by-step process to get there.

Here are some key features and concepts of Terraform:

Declarative Syntax: With Terraform, you define the desired state of your infrastructure in configuration files. Terraform then figures out the necessary steps to reach that state.

Providers: Terraform uses providers to interact with various cloud providers, infrastructure platforms, and other services. Providers are plugins that extend Terraform’s capabilities to interact with specific APIs (e.g., AWS, Azure, Google Cloud, etc.). Terraform documentation for Google Cloud can be found here and here, and lots of samples can be found here and here.

State Management: Terraform keeps track of the state of your infrastructure in a state file, which in this tutorial is saved in a Cloud Storage bucket. This file maps real-world resources to your configuration, so Terraform can work out, much like a diff, what changes are needed to reach the desired state.

Plan & Apply: Terraform follows a three-step workflow. First, terraform init prepares the working directory and downloads the providers. Then terraform plan shows you what changes Terraform will make to your infrastructure, and finally terraform apply executes those changes. This allows you to review and approve changes before applying them.

Modularity and Reusability: Terraform configurations can be organized into modules, allowing you to encapsulate and reuse infrastructure components across projects. This promotes modularity, consistency, scalability and speed in managing infrastructure, as it is deployed in parallel.

Infrastructure as Code (IaC): Terraform embraces the concept of Infrastructure as Code, enabling you to version control your infrastructure configurations, collaborate more effectively, and automate infrastructure provisioning.

Overall, Terraform simplifies the process of managing infrastructure by providing a consistent and efficient way to define, provision, and manage resources across different cloud providers and platforms. It’s widely used in DevOps practices and cloud-native development workflows.
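To make the plan step more concrete, here is a minimal Python sketch of the idea behind it: compare the desired configuration with the recorded state and list what has to be created (+), changed (~) or destroyed (-). This is only an illustration of the concept, not how Terraform is implemented, and the resource names and attributes are made up for the example.

```python
from typing import Dict, List, Tuple

Resource = Dict[str, str]


def plan(desired: Dict[str, Resource], state: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """Return (symbol, resource_name) actions that move `state` toward `desired`."""
    actions = []
    for name in sorted(desired.keys() | state.keys()):
        if name not in state:
            actions.append(("+", name))  # in the configuration, not created yet
        elif name not in desired:
            actions.append(("-", name))  # exists, but was removed from the configuration
        elif desired[name] != state[name]:
            actions.append(("~", name))  # exists, but its attributes changed
    return actions


# Hypothetical resources, for illustration only.
desired = {
    "network": {"name": "demo-vpc"},
    "database": {"tier": "large"},
    "cluster": {"mode": "autopilot"},
}
state = {
    "network": {"name": "demo-vpc"},
    "database": {"tier": "small"},
    "old_bucket": {"location": "US"},
}

for symbol, name in plan(desired, state):
    print(symbol, name)
```

The unchanged network produces no action, which is why running terraform plan on an already deployed configuration reports nothing to do.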

Terraform files usually have a module and its resources, like in network.tf:

Variables are defined in another file, variables.tf.

A nice place to learn Terraform basics in a simple manner is Google Cloud Skills Boost (link).

Before we start, create a brand new project in Google Cloud. This will help you isolate the automatic deployment from other services you may have and also control costs associated with this deployment.

Second, if you are learning or just following this tutorial, be aware of the costs of the Cloud SQL instance and the SSD disks associated with it.

We will clone the GoogleCloudPlatform repository in GitHub:

git clone https://github.com/GoogleCloudPlatform/cloudsql-gke-demo-for-genai

You will see we have some folders:

one for each of the three app images, each containing a Dockerfile, a Python script, the library requirements, a cloudbuild.yaml file that defines the Docker build step and the image name, and a k8s folder with the .yaml files for the jobs and for the deployment of the chatbot application.
the terraform folders, containing .tf files for infrastructure deployment and variable definitions.
a bash script that automatically replaces the __PROJECT__ placeholder in each of the apps’ .yaml files.

We have to make sure all the CLI tools are available:

gcloud: installation here
kubectl: the command-line tool for managing the GKE (Kubernetes) deployment. It will be installed via gcloud later.
terraform: on Debian or Ubuntu, it can be installed via snap:

sudo apt update
sudo apt install snapd
sudo snap install terraform --classic

Now set the project for the gcloud command:

gcloud auth login
gcloud config set project your_project

Change into the terraform-bootstrap directory, which defines global settings such as the Google Cloud project, region, bucket policies and Terraform version. Create a new file there; you may use nano, gedit or vim:

nano terraform.tfvars

Add the following to this new file:

google_cloud_project        = "your-project"
google_cloud_default_region = "us-central1"

Then, CTRL+O, Enter, CTRL+X to save and exit.

Now we will init terraform:

terraform init

Then plan the changes: the output lists every resource that will be created, marked with a plus sign ( + ).

terraform plan

Finally apply the changes: the resources will be created.

terraform apply

The output of terraform apply includes the name of the Cloud Storage bucket that was created. Save this bucket name for a later step. In cloud deployments with Terraform, a storage bucket is often used to store Terraform state files, which are essential for managing the infrastructure’s state, tracking resource dependencies, and managing updates.

Now change into the terraform directory, which contains all the infrastructure we want to deploy, and create a new file, backend.conf:

nano backend.conf

Add the name of the bucket you got in the previous step:

bucket = "90fedee19eb3ade1-bucket-tfstate"
prefix = "terraform/state"

Again, CTRL+O, Enter, CTRL+X to save and exit.

Now we will initialize terraform with the backend configuration we just created:

terraform init -backend-config=backend.conf

Now create a terraform.tfvars file via nano/gedit/vim to configure the deployment. It must contain the following:

google_cloud_db_project     = "your-project"
google_cloud_k8s_project    = "your-project"
google_cloud_default_region = "us-central1"
create_bastion              = false

A bastion is a gateway between the public internet and private networks. Its primary purpose is to provide secure access to resources within a private network, typically in a virtual private cloud (VPC) environment. It often sits at the perimeter of the network and serves as a single entry point for administrators to access servers and other resources within the private network. If you want to test the bastion and SSH into the infrastructure, set create_bastion to true. Otherwise, leave it as false.

Now run:

terraform plan
terraform apply

These commands will automatically create the following infrastructure:

A VPC network with a subnet that has a primary range and two secondary ranges. The primary range is the main block of IP addresses allocated to the subnet and is used by resources directly attached to the network, such as the cluster nodes. Secondary ranges are additional blocks reserved for specific purposes; in a GKE setup like this one, the two secondary ranges are typically assigned to Pods and Services.
A Cloud SQL database instance with IAM authentication enabled.
An IAM user for running apps and authenticating to the database.
A Database for the app.
A GKE Autopilot cluster that will run two jobs (connect to the database and generate embeddings) and also host the chatbot app.
The Workload Identity associated with the IAM user. Workload Identity is a feature that lets workloads running on GKE authenticate securely to Google Cloud services without needing to manage service account keys manually. By using Workload Identity, you can adhere to the principle of least privilege and enhance security by reducing the exposure of service account keys or other sensitive credentials.
An Artifact Registry for pushing the three app images via Docker.

Note that infrastructure is deployed in parallel. Terraform does not wait for one resource to finish deploying before starting another, unless the second resource depends on the first. Speed of deployment is one advantage. Another is standardization: configuration files define the infrastructure, so every time you deploy it you get the same infrastructure state.
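One way to picture the parallel deployment: Terraform builds a dependency graph and can create, at the same time, every resource whose dependencies already exist. The Python sketch below groups resources into "waves" that could be deployed together. It is a simplification (Terraform walks the graph continuously rather than in strict waves), it needs Python 3.9+ for graphlib, and the resource names and dependencies are assumptions loosely modeled on the list above, not taken from the repository.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def deployment_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group resources into waves; everything in one wave can be created in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # also raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # resources whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical dependency map: resource -> resources it depends on.
depends_on = {
    "vpc": set(),
    "artifact_registry": set(),
    "subnet": {"vpc"},
    "sql_instance": {"vpc"},
    "gke_cluster": {"subnet"},
    "database": {"sql_instance"},
}

for number, wave in enumerate(deployment_waves(depends_on), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In this toy graph the Artifact Registry does not wait for the network, while the database can only be created once its SQL instance exists.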

You will see a “+” sign next to every resource that will be created. You must confirm the deployment by typing yes when terraform apply prompts you.

In this whole process, the GKE cluster takes the longest to create, around 10 minutes.

After it finishes deploying infrastructure, you will be notified:

Note that in this deployment, we are using Private Service Connect inside the VPC. This means that every component of the architecture interacts with the others via a private connection on an internal IP, reducing the attack surface of the solution. External access for users happens through a single endpoint, via a Load Balancer that will be created with Google Kubernetes Engine (GKE).

Now we will build the three images and push them to Artifact Registry in three different steps:

1. Database: cd into the init-db folder and run:

gcloud builds submit --config cloudbuild.yaml --region us-central1

2. Embeddings: Do the same for the load-embeddings folder:

gcloud builds submit --config cloudbuild.yaml --region us-central1

3. Chatbot app: Also for the chatbot-api folder:

gcloud builds submit --config cloudbuild.yaml --region us-central1

Each one of these cloudbuild.yaml files will run a docker command and define the name of an image, creating it inside the Artifact Registry automatically. If you go to Artifact Registry, you will see a folder with the three Docker images generated.

Now, let’s point the Kubernetes .yaml files (k8s folders inside the three apps folders) to the images we created. Google created a bash file that easily replaces __PROJECT__ name in these files automatically, just run:

./scripts/configure-k8s.sh your-project
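Conceptually, this step is a placeholder substitution across the manifests. The Python sketch below shows the same idea; it is not the repository's bash script, and the folder pattern and the image path in the demo are assumptions made for illustration. The demo runs against a temporary directory so it does not touch a real checkout.

```python
import tempfile
from pathlib import Path

PLACEHOLDER = "__PROJECT__"


def configure_manifests(root: Path, project_id: str) -> int:
    """Replace the placeholder in every <app>/k8s/*.yaml file; return how many files changed."""
    changed = 0
    for manifest in sorted(root.glob("*/k8s/*.yaml")):
        text = manifest.read_text()
        if PLACEHOLDER in text:
            manifest.write_text(text.replace(PLACEHOLDER, project_id))
            changed += 1
    return changed


# Demo on a throwaway directory with a made-up manifest line.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    job = root / "init-db" / "k8s" / "job.yaml"
    job.parent.mkdir(parents=True)
    job.write_text("image: us-central1-docker.pkg.dev/__PROJECT__/repo/init-db\n")
    changed = configure_manifests(root, "your-project")
    result = job.read_text()
    print(changed, result.strip())
```

After the substitution, each manifest points at an image in your own project's Artifact Registry rather than at a placeholder.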

We’re almost ready. Install the GKE plugin for gcloud:

gcloud components install gke-gcloud-auth-plugin

We will also need the kubectl command to interact with GKE:

gcloud components install kubectl

Using the cluster name (prod-toy-store-semantic-search), get the credentials to interact with the Kubernetes cluster:

gcloud container clusters get-credentials prod-toy-store-semantic-search --region=us-central1

Now we can deploy the Kubernetes job that connects to the database, creates the database and gets an IAM permission. The job.yaml file uses the init-db container image we just created to connect to the database, as a Kubernetes job.

In the root folder of the repo, run the init-db job:

kubectl apply -f init-db/k8s/job.yaml
kubectl get jobs

Do the same for the load-embeddings folder, to run the job to create the embeddings and the vector index:

kubectl apply -f load-embeddings/k8s/job.yaml
kubectl get jobs

This job uses the CSV that exists inside the load-embeddings folder: it connects to the database, applies chunking via RecursiveCharacterTextSplitter, generates embeddings and then creates and loads a table with the product name and embedding vector, indexed with a vector index.
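To give an intuition for the chunking step, here is a deliberately simple Python sketch that splits text into fixed-size pieces with some overlap. It is not LangChain's RecursiveCharacterTextSplitter, which additionally tries to break on separators such as paragraphs and sentences; the chunk size, overlap and sample text are arbitrary values chosen for the example.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into chunks of at most chunk_size characters, sharing `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# Made-up product description, for illustration only.
description = "A colorful kite that is easy to assemble and flies well in light wind."
for chunk in chunk_text(description, chunk_size=30, overlap=5):
    print(repr(chunk))
```

The overlap keeps a little shared context between neighboring chunks, so a sentence cut at a chunk boundary is still partly represented in both embeddings.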

You will see both of the jobs run once and then show as COMPLETED:

kubectl get jobs

The interesting aspect here is that you can use exactly the same project/infrastructure we are building with a different CSV file, as long as you have the same columns or slightly adapt the main.py file in load-embeddings folder for your use case, because this file will use SQL syntax over the CSV columns.

Once both jobs are COMPLETED, we will deploy the chatbot app:

kubectl apply -f chatbot-api/k8s/deployment.yaml

kubectl get deployments

Then we create a load balancer to make the API accessible from the public internet.

kubectl apply -f chatbot-api/k8s/service.yaml

kubectl get services

Wait for the External IP to be created. Here the app does not use Flask, but FastAPI. The response to a query is produced by an embedding similarity search, a pgvector query inside the main.py file in the chatbot-api app folder, integrated into a conversation via LangChain.

Once you have applied embeddings and chunking to the dataset, you also have the option to use an LLM, like Gemini or Gemma, to use this context in a RAG solution. The infrastructure is ready and all you need to do is add the LLM inside this main.py file.
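For readers who have not used pgvector, the sketch below illustrates what an embedding similarity query does. The SQL string uses pgvector's cosine distance operator (<=>); whether the repository uses this operator, and the table and column names, are assumptions on my part. The Python functions reproduce the same ranking in memory with tiny made-up vectors, so the example runs without a database.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical table and column names; <=> is pgvector's cosine distance operator.
SIMILARITY_SQL = """
SELECT product_name
FROM products
ORDER BY embedding <=> %(query_embedding)s
LIMIT %(k)s
"""


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 means same direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float], rows: Dict[str, Sequence[float]], k: int) -> List[str]:
    """In-memory equivalent of the ORDER BY ... LIMIT query above."""
    return sorted(rows, key=lambda name: cosine_distance(query, rows[name]))[:k]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
rows = {
    "beach ball": [0.9, 0.1, 0.0],
    "board game": [0.1, 0.9, 0.1],
    "sand bucket": [0.8, 0.2, 0.1],
}
query_embedding = [1.0, 0.0, 0.0]  # pretend this encodes "toy for the beach"

print(top_k(query_embedding, rows, k=2))
```

In the real app the query text is first turned into an embedding by the same model used for the products, and the database does the ranking, which is what makes a vector index worthwhile on larger tables.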

As you will see, it takes a few minutes to create the external IP. Once you have it, curl it and check that the request is successful (HTTP 200).

Now we are able to submit our queries to the GKE endpoint via curl. You can also use Python with the requests library. Simply take the URL of the endpoint and send the query as a GET request, with the question in the q parameter.

curl 35.223.23.44/chatbot --get --data-urlencode \
"q=what is a good toy for rainy days?"

curl 35.223.23.44/chatbot --get --data-urlencode \
"q=what is a good toy for the beach?"

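The same request can be sent from Python. The sketch below uses only the standard library (urllib) instead of requests, to stay dependency-free; the /chatbot path and the q parameter come from the curl commands above, while the function names and the timeout are my own choices. The IP is the example address from this article and must be replaced by your own External IP.

```python
import urllib.request
from urllib.parse import quote, urlencode


def build_chatbot_url(host: str, question: str) -> str:
    """Build the GET URL, URL-encoding the question like curl --data-urlencode does."""
    query = urlencode({"q": question}, quote_via=quote)
    return f"http://{host}/chatbot?{query}"


def ask(host: str, question: str, timeout: float = 30.0) -> str:
    """Send the question to the chatbot endpoint and return the raw response body."""
    with urllib.request.urlopen(build_chatbot_url(host, question), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Only builds and prints the URL; call ask(...) once your endpoint is up.
print(build_chatbot_url("35.223.23.44", "what is a good toy for rainy days?"))
```

Calling ask("YOUR_EXTERNAL_IP", "what is a good toy for the beach?") performs the actual request and raises an HTTPError if the service does not answer successfully.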
To remove all deployed infrastructure, first you need to delete the GKE deployment. In the root folder of the project, run:

kubectl delete -f chatbot-api/k8s/deployment.yaml

Then delete the load balancer:

kubectl delete -f chatbot-api/k8s/service.yaml

Then, delete the infrastructure. You might have to run destroy twice if you see errors, for example when deleting the SQL instance, given that a Private Service Connect connection was created. DO NOT delete any component outside of Terraform: doing so leaves the Terraform state out of sync with the real resources, and you will then have to delete the remaining resources individually.

cd terraform
terraform destroy

Acknowledgements

✨ Google ML Developer Programs team supported this work by providing Google Cloud Credits✨

Written by Rubens Zimbres