SQL Server AlwaysOn Availability Groups on Google Kubernetes Engine(GKE)

In my last post, I have explained how to deploy SQL Server on GKE. In this post, we will deploy SQL Server in a highly available mode on GKE using SQL Server’s native HADR method — AlwaysOn Availability Groups. SQL Server’s latest and greatest — SQL Server 2019 supports AlwaysOn Availability groups on Kubernetes. So let’s get started.

What will we build ?

Image Source: Microsoft Documentation
  • A GKE cluster with 4 worker nodes.
  • SQL Server HA Operator pod, which is a key component that provides ability to deploy, manage and maintain the SQL Server HA deployment
  • 3 SQL Server stateful sets (one primary and two read replicas) along with an AG Agent
  • Secrets to manage sa credentials, sql server master key
  • Persistent volume claims for SQL Server pods
  • ConfigMaps for operator and availability group

Deploy

The official Microsoft SQL Server sample manifests are helpful to come up with the GKE versions, you can find them here. The manifests used in this post are available at the following repo:

Create a GKE cluster, we will choose an additional zone to make sure that the nodes are spread across. Make sure you choose the latest cluster version( 1.11.6-gke.2) , I ran into issues related to sql server pods crashing with default GKE cluster version.

gcloud beta container --project "searce-sandbox" clusters \
create "mssql-ag-gke-rk" --zone "us-west1-a" --username "admin" \
--cluster-version "1.11.6-gke.2" --machine-type "custom-2-4096" \
--image-type "COS" --disk-type "pd-standard" --disk-size "100" \
--num-nodes "2" --additional-zones "us-west1-b"

Get credentials to interact with the cluster

gcloud container clusters get-credentials mssql-ag-gke-rk --zone=us-west1-a

We will now create SQL Server HA operator manifest and deploy, but before that you must grant yourself the ability to create roles in kubernetes because the operator deployment involves using role based access control

kubectl create clusterrolebinding cluster-admin-binding \ --clusterrole cluster-admin --user your.email@domain.com

Create a kubernetes namespace

kubectl create namespace mssql-ag

Also create secrets

kubectl create secret generic sql-secrets --from-literal=sapassword="xxxxx" --from-literal=masterkeypassword="xxxxx"  --namespace mssql-ag

Apply the SQL Server HA Operator manifest, operator.yaml. This creates an operator pod

kubectl apply -f https://raw.githubusercontent.com/yogirk/mssql-ha-gke/master/operator.yaml --namespace mssql-ag

Create persistent volume claims for SQL Server pods.

kubectl apply -f https://raw.githubusercontent.com/yogirk/mssql-ha-gke/master/sqlvolumes.yaml --namespace mssql-ag

We can now deploy SQL Server using sqlserver.yaml. This creates mssql1, mssql2, mssql3 services.

kubectl apply -f https://raw.githubusercontent.com/yogirk/mssql-ha-gke/master/sqlserver.yaml --namespace mssql-ag

We will now apply the manifest for loadbalancing services to allow access to SQL Server primary and to any of the read replicas.

kubectl apply -f https://raw.githubusercontent.com/yogirk/mssql-ha-gke/master/mssql-ag-services.yaml --namespace mssql-ag

There you go! If everything went according to the plan, we should now have a fully functional AlwaysOn Availability Group deployed on GKE. Let’s explore.

kubectl get services --namespace mssql-ag
Endpoints for primary instanceand secondary replicas

Add Database to AG

Let’s now connect, create a user database and add it to the availability group. You can use SSMS or mssql-cli (btw, try this. Finally a cool cli tool for SQL Server)

Let’s create a database and add it to the availability group.

USE master
CREATE DATABASE DBAdmin;

In order to add to the AG, you need a full backup — which initiates LSN chain.

USE master
BACKUP DATABASE DBAdmin TO DISK =N'/var/opt/mssql/data/demodb.bak'

Add to the availability group

ALTER AVAILABILITY GROUP [mssql-ag] ADD DATABASE [DBAdmin]

Let’s fire up SSMS and look at the setup in its full glory now :)

Failover

For some reason, mssql2 pod became a primary in my setup and I would like to change that — let’s do a failover. Apply the failover.yaml manifest (by editing the desired pod name to failover to)

kubectl apply -f https://raw.githubusercontent.com/yogirk/mssql-ha-gke/master/failover.yaml --namespace mssql-ag

You can verify whether this job succeeded here:

Sure enough, the new primary is mssql1-0

If you try failover using TSQL commands, it won’t work because it’s an external failover type.

That’s it. Hope you found this useful! Happy containerizing :)