AlloyDB Omni and local models on GKE
AlloyDB and Vertex AI are great cloud services providing tons of capabilities and options to serve as a main backend for development. But what if you need something different? Maybe something more local, deployed as a compact self-contained setup where all communication between the different parts of the application stays as closed as possible? Or deployed somewhere that normal access to the service endpoints is unavailable? Can we do that and still use all the good stuff from AlloyDB, such as AI integration and improved vector search? Yes we can, and in this blog I will show how to deploy a local AI model and AlloyDB Omni to the same Kubernetes cluster and make them work together.
Deploying AlloyDB Omni
For my deployment I am using Google GKE, and we start by creating a standard cluster. For most of the actions I am using Google Cloud Shell and the standard utilities that come with it, but you can of course use your own preferred environment. Here is the command to create a cluster.
export PROJECT_ID=$(gcloud config get project)
export LOCATION=us-central1
export CLUSTER_NAME=alloydb-ai-gke
gcloud container clusters create ${CLUSTER_NAME} \
--project=${PROJECT_ID} \
--region=${LOCATION} \
--workload-pool=${PROJECT_ID}.svc.id.goog \
--release-channel=rapid \
--machine-type=e2-standard-8 \
--num-nodes=1
As soon as the cluster is deployed we can proceed with preparing it for AlloyDB Omni. You can read about all the requirements and the installation procedure in much more detail in the documentation.
One of the requirements is to install the cert-manager service. Most of the actions on the cluster are done using native Kubernetes utilities like kubectl and helm, and to use those tools we need cluster credentials. In GKE that is done with a gcloud command.
gcloud container clusters get-credentials ${CLUSTER_NAME} --region=${LOCATION}
Then we can install the cert-manager service on our cluster.
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.16.2/cert-manager.yaml
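Before moving on, it is worth making sure cert-manager is actually up. A quick optional check (the pods live in the cert-manager namespace):
# Wait until all cert-manager pods report Ready
kubectl wait --for=condition=Ready pods --all --namespace cert-manager --timeout=300s
kubectl get pods --namespace cert-manager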
Now we need to get the helm package for the latest AlloyDB Omni kubernetes operator.
export GCS_BUCKET=alloydb-omni-operator
export HELM_PATH=$(gcloud storage cat gs://$GCS_BUCKET/latest)
export OPERATOR_VERSION="${HELM_PATH%%/*}"
gcloud storage cp gs://$GCS_BUCKET/$HELM_PATH ./ --recursive
helm install alloydbomni-operator alloydbomni-operator-${OPERATOR_VERSION}.tgz \
--create-namespace \
--namespace alloydb-omni-system \
--atomic \
--timeout 5m
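To verify the operator installation, we can check the Helm release and the operator pods:
helm list --namespace alloydb-omni-system
kubectl get pods --namespace alloydb-omni-system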
When the AlloyDB Omni operator is installed we can proceed with the deployment of our database cluster. We need to deploy it with the googleMLExtension feature enabled to be able to work with AI models. I also prefer to enable an internal load balancer for the database deployment. It creates an internal IP in the project VPC, and I can use a small VM with a psql client installed to work with the databases, load data and so on. You can find more information about the load balancer in the documentation. Here is my manifest to deploy an AlloyDB Omni cluster with the name my-omni.
apiVersion: v1
kind: Secret
metadata:
  name: db-pw-my-omni
type: Opaque
data:
  my-omni: "VmVyeVN0cm9uZ1Bhc3N3b3Jk"
---
apiVersion: alloydbomni.dbadmin.goog/v1
kind: DBCluster
metadata:
  name: my-omni
spec:
  databaseVersion: "15.7.0"
  primarySpec:
    adminUser:
      passwordRef:
        name: db-pw-my-omni
    features:
      googleMLExtension:
        enabled: true
    resources:
      cpu: 1
      memory: 8Gi
      disks:
      - name: DataDisk
        size: 20Gi
        storageClass: standard
    dbLoadBalancerOptions:
      annotations:
        networking.gke.io/load-balancer-type: "internal"
    allowExternalIncomingTraffic: true
Save it as my-omni.yaml and then apply the configuration to the cluster.
kubectl apply -f my-omni.yaml
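The deployment takes a few minutes. We can watch the database pods come up and check the state of the DBCluster resource (the resource name below follows the alloydbomni.dbadmin.goog API group used in the manifest):
kubectl get pods
kubectl get dbclusters.alloydbomni.dbadmin.goog my-omni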
By the way, have you noticed the value I’ve used for my password in the secret? The secret accepts values encoded in base64, and you can encode them using standard Linux utilities. Here is an example encoding the password “VeryStrongPassword” to base64.
echo -n "VeryStrongPassword" | base64
But speaking about Kubernetes secrets and passwords, I would rather use a more secure solution to store them. In GKE I prefer to use Google Cloud Secret Manager. You can read in detail how to implement it in the documentation, and it works really well. It also helps to integrate AlloyDB Omni with AI models which require authorization such as tokens or keys.
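Just as a rough illustration (not the full integration described in the documentation), storing the database password in Secret Manager could look like the sketch below. The secret name and the service account reading it are placeholders I made up for the example.
# Hypothetical secret name, for illustration only
export SECRET_NAME=alloydb-db-password
gcloud secrets create ${SECRET_NAME} --replication-policy="automatic"
echo -n "VeryStrongPassword" | gcloud secrets versions add ${SECRET_NAME} --data-file=-
# Whatever identity reads the secret needs the secretAccessor role on it
gcloud secrets add-iam-policy-binding ${SECRET_NAME} \
--member="serviceAccount:my-app-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"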
When the database cluster and internal load balancer are deployed we should see the external service for our Omni instance.
kubectl get service
In the output we should see a service of “LoadBalancer” type with an external IP. We can use that IP to connect to our instance from a VM in the same VPC.
DB_CLUSTER_NAME=my-omni
export INSTANCE_IP=$(kubectl get service al-${DB_CLUSTER_NAME}-rw-elb -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo $INSTANCE_IP
Knowing your load balancer IP, you can export it as a variable (useful for automation) or put it directly in the command.
export INSTANCE_IP=10.128.15.195
psql "host=${INSTANCE_IP} user=postgres"
# or simply
psql "host=10.128.15.195 user=postgres"
Deploying a Model
Now we need to deploy a local model to the same Kubernetes cluster. So far we have only one default pool (compute nodes for your apps) with e2-standard-8 nodes. It is enough for our AlloyDB Omni but not ideal for inference. To run a model we need a node with a GPU accelerator. For the test I’ve created a pool with an NVIDIA L4 accelerator. Here is the command.
export PROJECT_ID=$(gcloud config get project)
export LOCATION=us-central1
export CLUSTER_NAME=alloydb-ai-gke
gcloud container node-pools create gpupool \
--accelerator type=nvidia-l4,count=1,gpu-driver-version=latest \
--project=${PROJECT_ID} \
--location=${LOCATION} \
--node-locations=${LOCATION}-a \
--cluster=${CLUSTER_NAME} \
--machine-type=g2-standard-8 \
--num-nodes=1
Keep in mind the project quotas when you create the pools. Not all accelerator types are available by default, and that may dictate the way you deploy the model.
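If you are not sure what is offered in your region, you can list the accelerator types for the target zone, and after the pool is created the GPU nodes show up with the corresponding gke-accelerator label:
gcloud compute accelerator-types list --filter="zone:${LOCATION}-a"
kubectl get nodes -l cloud.google.com/gke-accelerator=nvidia-l4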
I was using Hugging Face to deploy the BGE Base v1.5 embedding model. Hugging Face provides full instructions and a deployment package to be used with GKE.
We need the deployment manifest, and we can get it from the Hugging Face GitHub repository.
git clone https://github.com/huggingface/Google-Cloud-Containers
If you plan to reuse the model it makes sense to use a Google Cloud Storage (GCS) bucket to keep it between deployments, but in my case I am only testing, so I am skipping the bucket part. The GCS option is also included in the downloaded package.
For deployment without GCS we need to review and modify the Google-Cloud-Containers/examples/gke/tei-deployment/gpu-config/deployment.yaml file, replacing the cloud.google.com/gke-accelerator value with our nvidia-l4. We also need to define limits for the resources we request, or we can get an error.
vi Google-Cloud-Containers/examples/gke/tei-deployment/gpu-config/deployment.yaml
Here is the corrected manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tei-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tei-server
  template:
    metadata:
      labels:
        app: tei-server
        hf.co/model: Snowflake--snowflake-arctic-embed-m
        hf.co/task: text-embeddings
    spec:
      containers:
        - name: tei-container
          image: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cu122.1-4.ubuntu2204:latest
          resources:
            requests:
              nvidia.com/gpu: 1
            limits:
              nvidia.com/gpu: 1
          env:
            - name: MODEL_ID
              value: Snowflake/snowflake-arctic-embed-m
            - name: NUM_SHARD
              value: "1"
            - name: PORT
              value: "8080"
          volumeMounts:
            - mountPath: /dev/shm
              name: dshm
            - mountPath: /data
              name: data
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi
        - name: data
          emptyDir: {}
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
Then we can create the namespace and the service account and deploy all the rest.
export NAMESPACE=hf-gke-namespace
export SERVICE_ACCOUNT=hf-gke-service-account
kubectl create namespace $NAMESPACE
kubectl create serviceaccount $SERVICE_ACCOUNT --namespace $NAMESPACE
kubectl apply -f Google-Cloud-Containers/examples/gke/tei-deployment/gpu-config
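The inference image is quite big, so the first pull can take several minutes. We can watch the pod until it becomes ready (add --namespace $NAMESPACE if your resources were created in that namespace):
kubectl get pods -l app=tei-server --watch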
If we have a look at the created service we can see that by default it has only a cluster IP, which means it is available only inside the cluster. Nobody outside the cluster has access to the model.
gleb@cloudshell:~/blog (test)$ kubectl get service tei-service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
tei-service ClusterIP 34.118.225.12 <none> 8080/TCP 12m
gleb@cloudshell:~/blog (test)$
The service will be available for requests at the endpoint URL http://34.118.225.12:8080/embed for embedding generation.
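Since the service is reachable only inside the cluster, a quick way to poke it is from a throwaway pod. Here is a minimal check, assuming the TEI /embed API accepts an "inputs" field (the same field our input transform function will build later):
# Run a temporary curl pod inside the cluster and call the embedding endpoint
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
curl -s http://34.118.225.12:8080/embed \
-H "Content-Type: application/json" \
-d '{"inputs": "test sentence", "truncate": true}'
The response should be a JSON array with a single embedding vector inside.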
Register Model in AlloyDB Omni
Everything is ready to register the deployed model in AlloyDB Omni. We start by creating a demo database. In a psql session (remember our jump box VM?) connect as the postgres user and run:
create database demo;
Let’s connect to the new “demo” database.
psql "host=10.128.15.195 user=postgres dbname=demo"
And there we can register our new model using the google_ml procedures. Before registering an embedding model we need to create transform functions which are responsible for converting the input and output to the format the model endpoint and the database expect. Here are the functions I’ve prepared for our model.
-- Input Transform Function corresponding to the custom model endpoint
CREATE OR REPLACE FUNCTION tei_text_input_transform(model_id VARCHAR(100), input_text TEXT)
RETURNS JSON
LANGUAGE plpgsql
AS $$
DECLARE
transformed_input JSON;
model_qualified_name TEXT;
BEGIN
SELECT json_build_object('inputs', input_text, 'truncate', true)::JSON INTO transformed_input;
RETURN transformed_input;
END;
$$;
-- Output Transform Function corresponding to the custom model endpoint
CREATE OR REPLACE FUNCTION tei_text_output_transform(model_id VARCHAR(100), response_json JSON)
RETURNS REAL[]
LANGUAGE plpgsql
AS $$
DECLARE
transformed_output REAL[];
BEGIN
SELECT ARRAY(SELECT json_array_elements_text(response_json->0)) INTO transformed_output;
RETURN transformed_output;
END;
$$;
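A quick way to sanity-check the input transform before registering the model is to call it directly from the jump box and look at the JSON payload it builds:
psql "host=${INSTANCE_IP} user=postgres dbname=demo" \
-c "SELECT tei_text_input_transform('bge-base-1.5', 'What is AlloyDB Omni?');"
The result should be a JSON document with the "inputs" and "truncate" keys the model endpoint expects.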
Then we register the new model with the name bge-base-1.5. I used the HTTP endpoint described earlier with the cluster service IP and our transform functions.
CALL
google_ml.create_model(
model_id => 'bge-base-1.5',
model_request_url => 'http://34.118.225.12:8080/embed',
model_provider => 'custom',
model_type => 'text_embedding',
model_in_transform_fn => 'tei_text_input_transform',
model_out_transform_fn => 'tei_text_output_transform');
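To double-check the registration, we can query the model metadata. In my environment the google_ml extension exposes registered models through the google_ml.model_info_view view (check the documentation if the view name differs in your version):
psql "host=${INSTANCE_IP} user=postgres dbname=demo" \
-c "SELECT * FROM google_ml.model_info_view;"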
Tests
Let’s test it and see how many dimensions the generated vector has. Here is the output:
demo=# select array_dims(google_ml.embedding('bge-base-1.5','What is AlloyDB Omni?'));
array_dims
------------
[1:768]
(1 row)
demo=#
Great! It works and shows that our embedding function returns a real array with 768 dimensions.
I used a small dataset from one of the embedding codelabs I created some time ago to generate the embeddings and run a query.
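The codelab has the exact loading steps; the gist of producing the cymbal_embedding table with our local model is roughly the sketch below (an assumption on my side, the codelab itself may structure it differently):
psql "host=${INSTANCE_IP} user=postgres dbname=demo" <<'EOF'
-- Sketch only: one embedding per product, built from the product description
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE cymbal_embedding AS
SELECT uniq_id,
       google_ml.embedding('bge-base-1.5', product_description)::vector AS embedding
FROM cymbal_products;
EOF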
demo=# \timing
Timing is on.
demo=# SELECT
cp.product_name,
left(cp.product_description,80) as description,
cp.sale_price,
cs.zip_code,
(ce.embedding <=> google_ml.embedding('bge-base-1.5','What kind of fruit trees grow well here?')::vector) as distance
FROM
cymbal_products cp
JOIN cymbal_embedding ce on
ce.uniq_id=cp.uniq_id
JOIN cymbal_inventory ci on
ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
cs.store_id=ci.store_id
AND ci.inventory>0
AND cs.store_id = 1583
ORDER BY
distance ASC
LIMIT 10;
product_name | description | sale_price | zip_code | distance
-----------------------+----------------------------------------------------------------------------------+------------+----------+---------------------
California Sycamore | This is a beautiful sycamore tree that can grow to be over 100 feet tall. It is | 300.00 | 93230 | 0.22753925487632942
Toyon | This is a beautiful toyon tree that can grow to be over 20 feet tall. It is an e | 10.00 | 93230 | 0.23497374266229387
California Peppertree | This is a beautiful peppertree that can grow to be over 30 feet tall. It is an e | 25.00 | 93230 | 0.24215884459965364
California Redwood | This is a beautiful redwood tree that can grow to be over 300 feet tall. It is a | 1000.00 | 93230 | 0.24564130578287147
Cherry Tree | This is a beautiful cherry tree that will produce delicious cherries. It is an d | 75.00 | 93230 | 0.24846117929767153
Fremont Cottonwood | This is a beautiful cottonwood tree that can grow to be over 100 feet tall. It i | 200.00 | 93230 | 0.2533482837690365
Madrone | This is a beautiful madrona tree that can grow to be over 80 feet tall. It is an | 50.00 | 93230 | 0.25755536556243364
Secateurs | These secateurs are perfect for pruning small branches and vines. | 15.00 | 93230 | 0.26093776589260964
Sprinkler | This sprinkler is perfect for watering a large area of your garden. | 30.00 | 93230 | 0.26263969504592044
Plant Pot | This is a stylish plant pot that will add a touch of elegance to your garden. | 20.00 | 93230 | 0.2639707045520192
(10 rows)
Time: 25.900 ms
demo=#
The response time was about 25 ms on average and relatively stable. The recall quality was also quite decent, returning a good selection of trees from the inventory.
You can try deploying AlloyDB Omni along with different AI models right now, in GKE or in your local Kubernetes environment. The great thing about AlloyDB Omni is that it can be deployed anywhere you can run containers.
In the next post I will compare performance and recall with another model and with full text search. Stay tuned.