Design A Highly Available PostgreSQL Cluster With Patroni In GCP — Part 1
Patroni is a modern HA solution for PostgreSQL, with cloud-native features and advanced options for failover and failback. PostgreSQL is one of the world's top open-source databases, but the worst part is that it doesn't have built-in automatic failover. I still remember my old days fighting with
repmgr for HA, and how hard it was for me to add a failed master back to the cluster. Later,
pg_rewind made that process much simpler. The world is moving fast to adopt the cloud, so some vintage HA solutions like DRBD, corosync + pacemaker, repmgr, and a few other technologies are becoming outdated.
Problems with traditional replication:
- The default replication mechanism does not handle failover on its own.
- Disk-based replication sometimes ends in data corruption.
- External failover tools need additional effort to keep them up, running, and monitored.
- Automatically adding a failed node back to the cluster is a nightmare (though scripting geeks can still do it).
- Handling split-brain situations is hard.
The Bot approach:
I learned this concept from a conference talk by Zalando. PostgreSQL takes care of its own processes, but we need something to monitor the PostgreSQL service and its replication status, and to keep that state in a distributed configuration store (DCS) like etcd, ZooKeeper, or Consul. PostgreSQL can't talk to a DCS directly, right? So a bot runs alongside each node: if the master goes down, the bots elect a new master, and if the old master comes back up, the bot adds it back to the cluster. Here the bot is Patroni. Patroni is the successor of Compose's Governor.
- It's developed by Zalando.
- Completely Python based and 100% open source.
- Maintains the cluster state in a DCS (etcd, ZooKeeper, or Consul).
- REST API to get the state of a node, and more.
- While adding a node to the replication, we can use a custom method to achieve this (barman, WAL-E, or your own scripts).
- You can prevent some nodes (for example, reporting-only nodes) from becoming a master.
- You can decide which node a newly added replica syncs its data from.
- Hooks: trigger actions after bootstrap, or when PostgreSQL starts, stops, or fails over.
- Manual switchover during a maintenance window, and integration with HAProxy.
- And a lot more features.
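To make the features above concrete, here is a minimal sketch of a per-node Patroni configuration file. Everything in it is illustrative: the cluster name, IP addresses, and credentials are placeholders, not values from this article.

```yaml
# Hedged sketch of a minimal Patroni node config (e.g. /etc/patroni.yml).
# All names, addresses, and passwords below are placeholders.
scope: pg-cluster            # cluster name shared by all nodes
name: node1                  # unique per node

restapi:
  listen: 0.0.0.0:8008       # Patroni's REST API (used for health checks)
  connect_address: 10.0.0.11:8008

etcd:
  host: 10.0.0.5:2379        # etcd (or its load balancer) address

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.11:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    replication:
      username: replicator
      password: secret       # placeholder
```

Each node in the cluster runs Patroni with a config like this, differing only in `name` and the addresses.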
etcd is a distributed, consistent key-value store. We'll use etcd to keep the PostgreSQL cluster's health, node state, and other information about the cluster. The other important thing is that etcd itself must also be highly available. So you can either run it on a GKE cluster, or create a cluster on Compute Engine and put a load balancer on top of it. Unfortunately, GCP's HTTP load balancer only supports port 80 as a frontend port, but no worries, it won't affect anything.
Do we need HAProxy?
To use Patroni, we need at least three nodes. In case of failover, we would normally need a virtual (or floating) IP address so the application can keep reaching the database. HAProxy removes this dependency by always talking to the current master node. Here is how HAProxy knows who the master is: we give it a health-check port.
8008 is the default Patroni REST API port. HAProxy sends a
GET request to all the nodes in the cluster on port 8008. Only the master node returns
200 OK; the other nodes return
503 Service Unavailable.
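As a sketch, an HAProxy frontend for writes could probe Patroni's REST API like this. The listen port, node names, and 10.0.0.x addresses are placeholders for illustration.

```
# Sketch: route writes to whichever node answers 200 on Patroni's
# /master endpoint (port 8008). Addresses below are placeholders.
listen postgres_write
    bind *:5000
    option httpchk GET /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2
    server node1 10.0.0.11:5432 check port 8008
    server node2 10.0.0.12:5432 check port 8008
    server node3 10.0.0.13:5432 check port 8008
```

Only the current master passes the check, so HAProxy sends traffic to exactly one backend at a time; after a failover, the check flips to the newly promoted node automatically.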
But as I mentioned above, this is the cloud era. All the cloud providers have their own load balancers, which work better than HAProxy in terms of scalability, customization, availability, and so on. Patroni's REST API makes it possible to determine the role of a node. On each node, we can send a
GET request to the
/master or /replica endpoint. The
/master endpoint returns 200 OK on the master and 503 on the replicas. Similarly,
/replica returns 200 OK on replica nodes and 503 on the master.
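The role-based health-check semantics can be summarized in a few lines of Python. This is not Patroni's implementation, just a sketch of the contract the load balancer relies on; the node names are made up.

```python
# Sketch (not Patroni's actual code) of the role-aware health check:
# /master answers 200 only on the primary, /replica only on replicas.

def health_status(endpoint: str, role: str) -> int:
    """HTTP status a Patroni-style REST API would return for a probe."""
    if endpoint == "/master":
        return 200 if role == "master" else 503
    if endpoint == "/replica":
        return 200 if role == "replica" else 503
    return 404  # hypothetical fallback for unknown endpoints

# A load balancer probing three nodes sees exactly one healthy writer:
roles = {"node1": "master", "node2": "replica", "node3": "replica"}
healthy_writers = [n for n, r in roles.items() if health_status("/master", r) == 200]
healthy_readers = [n for n, r in roles.items() if health_status("/replica", r) == 200]
print(healthy_writers)  # ['node1']
print(healthy_readers)  # ['node2', 'node3']
```

This is exactly why the writer and reader load balancers described below report different healthy-node counts over the same set of backends.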
GCP Internal Load Balancer:
This feature lets us deploy GCP TCP load balancers that talk to the master and the replicas. We create two load balancers:
- Writer LB — add all the PostgreSQL nodes under this LB, and use /master as the health-check path.
- Reader LB — add all the PostgreSQL nodes under this LB, and use /replica as the health-check path.
Let's say you have a 3-node cluster: the Writer LB will show
1/3 nodes healthy, while the Reader LB shows
2/3 nodes healthy.
Patroni is good for HA, but if you have DR servers in a different region, or a reporting replica with minimal hardware, automatic failover should not promote those nodes to master. The same applies to any delayed replica. In these cases, we can define which nodes are eligible to become master.
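In Patroni this is done with per-node tags in the node's configuration file. A hedged sketch, using Patroni's real tag names:

```yaml
# Sketch: tags section of the Patroni config on a DR or delayed replica.
tags:
  nofailover: true      # this node will never be promoted to master
  noloadbalance: true   # /replica also returns 503, hiding it from the reader LB
```

With `nofailover: true`, the node keeps replicating but is skipped during leader election, which is exactly what we want for the DR and delayed-replica nodes in the architecture below.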
In GCP, I have designed the architecture below for Patroni.
- 3 nodes are always available to take the master role.
- etcd is the DCS for Patroni, so I deployed it on three Compute Engine instances and put a load balancer on top of them. Patroni talks to the LB's IP address.
- These three etcd nodes are configured as a cluster.
- Two load balancers: one for writes and the other for reads. We have two reader nodes, so read queries are routed to those two nodes.
- For DR, I added a node in another region and marked it so that it will never become a master.
- One more node runs delayed replication, to recover the database from accidental deletion. This node will also never become a master.
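For the three-node etcd cluster in this design, a static bootstrap configuration could look like the sketch below (one node's config file shown). The node names and 10.0.1.x addresses are placeholders; the other two nodes get analogous files.

```yaml
# Sketch of an etcd config file (e.g. /etc/etcd/etcd.conf.yml) for node etcd1.
# Names and addresses are placeholders.
name: etcd1
listen-peer-urls: http://10.0.1.11:2380
listen-client-urls: http://10.0.1.11:2379,http://127.0.0.1:2379
initial-advertise-peer-urls: http://10.0.1.11:2380
advertise-client-urls: http://10.0.1.11:2379
initial-cluster: etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380
initial-cluster-state: new
initial-cluster-token: patroni-etcd   # arbitrary shared token
```

Once all three members are up, Patroni only needs the load balancer's address in its `etcd` section, as shown earlier.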
This is just an overview of the architecture and some fundamentals of Patroni. We have written a step-by-step guide for implementing this solution on GCP in the next part of this blog. Sometimes it's better to have a self-managed cluster than Cloud SQL. Hope you found this helpful; please give some claps for our solution.