Design A Highly Available PostgreSQL Cluster With Patroni In GCP — Part 1

Bhuvanesh
Bhuvanesh
Sep 16 · 5 min read

Patroni is the new age HA solution for PostgreSQL with cloud-native features and advanced options for failover and failback. PostgreSQL is one of the world top open-source databases but the worst part is, it doesn’t have inbuilt automatic failover. Still, I remember my old days, fighting with repmgr for HA and its very hard for me to add the failed master back to the cluster. But later pg_rewind made that process more simple. The world is moving very fast to adopt the cloud. So some vintage HA solutions like DRBD, corosync + pacemaker, repmgr, and a few other technologies are out of date.

Part 2: Implementation

Problems in traditional replication:

  1. The default replication mechanism will not support the Failover.
  2. Disk-based replication, sometimes ends with data corruption.
  3. Using external tools for failover may need additional effort to keep them up and running and monitoring.
  4. Automatically adding the failed node back to the cluster, its a nightmare (but still scripting geeks can do this).
  5. Handle the Split Brain situation.

The Bot approach:

Credit: Zalando

I learned this concept from a conference by Zalando. PostgreSQL will take care of its own process, but we need something to monitor the PostgreSQL service and its replication status in distributed systems like etcd, zookeeper or Consul. But PostgreSQL can’t to DCS directly right? So if PostgreSQL went down, then the bot will start electing a new master. Also if the old master came up, then the bot will add them back to the cluster. Here BOT refers to the Patroni. Patroni is the successor of compose governor.

The Patroni:

Credit: Zalando

ETCD:

ETCD is a distributed control system. We’ll use ETCD to keep the PostgreSQL cluster’s health, node state, and other information about the cluster. The other important thing is, etcd also will be in a High availability mode. So either you can use it on GKE clusters or create a cluster on Compute engine and put a load balancer on top of it. But unfortunately, the GCP’s HTTP load balancer will support port 80 as a Front end port. But no worries, it won’t affect anything.

Credit: Zalando

Do we need HA Proxy:

To use Patroni, we need at least 3 nodes. In case of failover, we need some Virtual(or floating) IP address to make the application continue to access the database. To reduce this dependency, the HA proxy will always talk to the master node. Here is how HA proxy knows who is the master. In HA proxy, we need to give health check port. 8008 is the default Patroni rest API port. HA proxy will send a GET Request to all the nodes in the cluster in 8008 port. Only the Master node will give 200 OK status. Other nodes will return 503 Server Unavailable.

But as I mentioned above, its a Cloud Era. All the cloud providers have their own load balancers which work better than HA proxy(in terms of scalability, customization, availability and etc). Patroni provides feasibility via Rest API to determine the role of a node. On each node, we can trigger a Get Request to /master and /replica URLs. If the node is master, then the /master will return 200 OK. On the replica’s it’ll return 503. Similarly /replica will return 503 on replicas and 200 OK on slave nodes.

GCP Internal Load Balancer:

This feature will help us to deploy the GCP TCP load balancer to talk to master and slaves. We can create 2 load balancers.

  1. Writer LB — Add all the PostgreSQL under this LB. In the health check use /master as a path.
  2. Reader LB — Add all the PostgreSQL under this LB. In the health check use /replica as a path.

Lets say if you have 3 node cluster, then the Writer LB shows 1/3 nodes are healthy meantime Reader LB shows 2/3 nodes are healthy.

Standby Cluster:

Credit: Zalando

Patroni is good for HA, but if you have some DR servers on a different Region or Reporting Replica with minimal hardware configuration, Automatic failover should not promote these nodes as a Master. Or if you have any delayed replica, that node also won’t become a master. In this case, we can define which nodes should be eligible for master.

Final Architecture:

In GCP, I have designed the below architecture for Patroni.

  1. 3 nodes will be always available to take the master role.
  2. etcd the DCS for Patroni, So I deployed it on 3 compute engine and put a load balancer on top of it. The Patroni will talk to the LB IP address.
  3. We’ll configure this 3 etcd as a cluster.
  4. Created two load balancers. One for Write and the other one for read purpose. We have 2 reader nodes. So the query will be routed to these two nodes.
  5. For DR, I have added a node on other region and marked this node will become a master.
  6. One more Node for Delayed replication to bring the database server for accidental deletion. This node also will not become a master.

Part 2: Implementing this Solution

Conclusion:

This is just an overview of the architecture and some fundamentals about the Patroni. We have written a step by step guide for implementing this solution on GCP on your part blog. Sometimes it's better to have a self-managed cluster over the CloudSQL one. Hope you found this helpful, please give some claps for our solution.

Searce Engineering

We identify better ways of doing things!

Bhuvanesh

Written by

Bhuvanesh

Cloud | BigData | Database Architect | blogger thedataguy.in

Searce Engineering

We identify better ways of doing things!

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade