GCP — Deploying highly available databases with GKE
GKE for Databases
Containerization has transformed how software applications are packaged. On top of that, Google Kubernetes Engine (GKE) has made managing those containers far easier. In this post we will look at how stateful applications, such as databases, can take advantage of the same technology.
Identify the database image
The above diagram depicts a highly available deployment for any* relational database (such as MySQL, PostgreSQL, or SQL Server on Linux) on GKE. Ready-to-use container images for all of these are available on Docker Hub.
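As a quick sanity check, you can pull the images locally before wiring them into GKE. These are illustrative provisioning commands, and the version tags below are assumptions; pin whatever version your application has been tested against.

```shell
# Pull official database images from Docker Hub.
# Tags are examples only; pin a tested version in production.
docker pull postgres:15
docker pull mysql:8.0
```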
Creating GKE Cluster
The GKE cluster hosting the database should have a bi-zonal node pool within a single region. This ensures that the HA setup, which is based on Regional Persistent Disks, works.
The cluster nodes should have spare capacity so that, upon a zone failure, all the Pods can move to the surviving zone safely.
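A minimal sketch of such a cluster, created with gcloud. The cluster name, region, zones, node count, and machine type below are illustrative assumptions; pick the two zones that your Regional PD will replicate across.

```shell
# Sketch: create a regional GKE cluster whose default node pool spans
# exactly two zones (bi-zonal), matching the Regional PD setup.
# All values here are example assumptions.
gcloud container clusters create db-cluster \
  --region us-central1 \
  --node-locations us-central1-a,us-central1-b \
  --num-nodes 2 \
  --machine-type e2-standard-4
```

With --num-nodes 2 per zone, each zone on its own has enough capacity to host the database Pod if the other zone fails.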
Provision storage for database
GKE offers Persistent Volume Claims (PVCs) to store data. A PVC can be backed by either a standard Persistent Disk (PD) or a PD-SSD, and it can be either zonal or regional (bi-zonal). In this setup you should use a Regional PD or PD-SSD for HA purposes: a regional disk automatically maintains two copies of the data (disk 01 and disk 02 in the diagram).
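This can be expressed as a StorageClass backed by Regional PD-SSD plus a PVC that uses it. This is a sketch: the names, zones, and size are assumptions, and the zones must match the cluster's node locations.

```yaml
# Sketch: StorageClass for a Regional PD-SSD (two replicas across zones),
# and a PVC bound to it. Names, zones, and size are example assumptions.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-pd-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-a
    - us-central1-b
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: regional-pd-ssd
  resources:
    requests:
      storage: 200Gi
```

WaitForFirstConsumer delays disk provisioning until the Pod is scheduled, so the disk's replica zones line up with where the Pod can actually run.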
Create a GKE Service
A GKE Service is a facade that allows client applications to communicate with the database. While there can be various low-level implementations, commonly an Internal Load Balancer (ILB) is created behind the scenes for the Service. Upon failover, the Service automatically reconfigures itself to send traffic to the new Pod.
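A minimal Service manifest for this pattern might look as follows. The annotation requests an internal (rather than external) load balancer on GKE; the Service name, selector label, and port assume a PostgreSQL deployment and are illustrative.

```yaml
# Sketch: an internal LoadBalancer Service fronting the database Pod.
# Name, selector, and port assume PostgreSQL; adjust for your database.
apiVersion: v1
kind: Service
metadata:
  name: db
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
```

Because the Service routes by label selector, clients keep connecting to the same ILB address while Kubernetes repoints the backend at whichever Pod is currently healthy.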
Deploying highly available PostgreSQL with GKE
Read the article above (written by me) for detailed instructions on this setup using PostgreSQL. The same approach can be adapted to any* common relational database.