Design A Highly Available PostgreSQL Cluster With Patroni In GCP — Part 2

Bhuvanesh
Sep 16 · 10 min read
Credit: Zalando

This is the implementation part of Design A Highly Available PostgreSQL Cluster With Patroni In GCP, along with etcd, TCP, and HTTP load balancers. Let’s begin the setup. Here is my inventory of PostgreSQL, etcd, and load balancer nodes (the load balancer IPs you can create while creating the LBs).

PART 1: the concept of the HA PostgreSQL with patroni

Configure ETCD cluster:

We’ll install etcd on both nodes. I strongly recommend keeping 3 nodes in the cluster for production. I used Ubuntu 19.04; you may follow other blogs to install it on CentOS or another OS.

# On all etcd nodes
apt-get update
apt-get install etcd

On etcd Node 1, add the lines at the end:

vi /etc/default/etcd
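
Node 1 uses the same settings with its own name and address; here is a sketch, assuming bhuvi-etcd1 listens on 10.128.0.114 (the address listed in ETCD_INITIAL_CLUSTER):

ETCD_INITIAL_CLUSTER="bhuvi-etcd1=http://10.128.0.114:2380,bhuvi-etcd2=http://10.128.0.115:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-01"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.128.0.114:2380"
ETCD_DATA_DIR="/var/etcd"
ETCD_LISTEN_PEER_URLS="http://10.128.0.114:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.128.0.114:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.128.0.114:2379"
ETCD_NAME="bhuvi-etcd1"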

On etcd Node 2, add the lines at the end:

ETCD_INITIAL_CLUSTER="bhuvi-etcd1=http://10.128.0.114:2380,bhuvi-etcd2=http://10.128.0.115:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-01"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.128.0.115:2380"
ETCD_DATA_DIR="/var/etcd"
ETCD_LISTEN_PEER_URLS="http://10.128.0.115:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.128.0.115:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.128.0.115:2379"
ETCD_NAME="bhuvi-etcd2"

Start etcd on both nodes as soon as possible, because if you start node 1 first, it will wait only a few seconds for the other node to connect. If the other node is still unavailable at that point, etcd will not start. So open a terminal on each node and run the start command on both at nearly the same time.

service etcd start

Check the cluster health:

# run this command on any one of the nodes
etcdctl cluster-health

Create an instance group:

  • Go to VM → Instance group → Unmanaged instance group.
  • Choose your region and network.
  • Named port: Port Name — etcd and Port Number 2379
  • Add your etcd instances.
  • If you have etcd instances in multiple zones, then create multiple instance groups.

Create HTTP Load balancer for ETCD:

  • Go to Network Services → Load Balancer → HTTP(S) Load balancer.
  • Choose “Only between my VMs”.
  • Name, Region, Network — select your relevant name, region, and network.

Backend Services → create new

  • Name: patroni-etcd-backend
  • Protocol: HTTP
  • Named Port: etcd
  • Timeout: 30 seconds
  • Backend type: Instance group
  • Under instance group, select your VM group. If you have more than one VM group, add all of them.
  • Port Numbers: 2379
  • Maximum RPS: your choice; I chose 100.

Health Check → Create

  • Name: etcd-health-check
  • Protocol: http
  • Port: 2379
  • Request Path: /health
  • Leave the rest of the parameters as they are (my config uses 2 seconds for all of them).

Click the Create button.
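
Before continuing, you can verify the path this health check probes directly against one of the etcd nodes (node IP from my inventory); it should report a healthy cluster:

# check the /health endpoint the load balancer will probe
curl http://10.128.0.114:2379/health
# expected output is something like {"health":"true"}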

Frontend Configuration:

  • Name: patroni-etcd-frontend
  • Protocol: HTTP
  • Subnetwork — choose where you want to host this LB.
  • Internal IP: choose your reserved IP or Create a new one.
  • Port: 80 (or you can use 8080, it doesn’t matter)

Then create the LB.

This is my LB

Now let’s move on to Patroni and PostgreSQL. For a change, I’m converting my existing standalone PostgreSQL node into a Patroni cluster.

Install PostgreSQL on all nodes:

sudo apt-get install curl ca-certificates gnupg
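
The remaining install steps follow the standard PGDG repository instructions; a sketch, assuming PostgreSQL 11 (use whichever major version you run):

# add the PGDG apt repository and install PostgreSQL (version 11 is an assumption)
curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt-get update
sudo apt-get install -y postgresql-11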

Change the data directory:

mkdir /pgdata/data
chown -R postgres:postgres /pgdata/data
chmod 0700 /pgdata/data

Now I have one standalone PostgreSQL server, and we’ll move it into the Patroni cluster. You can do this without downtime, but only if the parameters below are already enabled; otherwise we need a restart once the node has been moved into Patroni.

  1. wal_level
  2. hot_standby
  3. wal_keep_segments
  4. max_wal_senders
  5. max_replication_slots
  6. checkpoint_timeout

Install Patroni:

apt-get install -y python python-pip python-psycopg2 
pip install --upgrade setuptools
pip install python-etcd
pip install psycopg2-binary
pip install patroni

Create a service for Patroni:

vi /etc/systemd/system/patroni.service
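
A minimal unit file sketch, assuming Patroni was installed with pip into /usr/local/bin and reads /etc/patroni.yml:

[Unit]
Description=Patroni PostgreSQL HA manager
After=network.target

[Service]
Type=simple
User=postgres
Group=postgres
# pip installs the patroni entry point under /usr/local/bin (assumption)
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.target

Run systemctl daemon-reload after saving the file.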

You need an environment variable on the postgres user that points to the location of the data directory.

Replace /pgdata/data/ with your data directory location.

su postgres
vi ~/.bash_profile
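
Add the data directory as the PGDATA variable (see the best practices at the end of this post), for example:

export PGDATA=/pgdata/data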

Add Patroni config file on the master node:

vi /etc/patroni.yml
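
A trimmed sketch of the config: the scope (postgres), node name (bhuvi-psql), data directory, and replicator user match what is used later in this post, while the etcd load balancer IP, this node's IP, the passwords, and the PostgreSQL 11 bin_dir are placeholders and assumptions to adjust for your setup:

scope: postgres                       # cluster name used by patronictl below
name: bhuvi-psql                      # unique name for this node

restapi:
  listen: 0.0.0.0:8008
  connect_address: <node-1-ip>:8008   # placeholder: this node's IP

etcd:
  host: <etcd-lb-ip>:80               # placeholder: the internal HTTP LB created above

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_segments: 8
        max_wal_senders: 10
        max_replication_slots: 10
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host all all 0.0.0.0/0 md5

postgresql:
  listen: 0.0.0.0:5432
  connect_address: <node-1-ip>:5432   # placeholder: this node's IP
  data_dir: /pgdata/data
  bin_dir: /usr/lib/postgresql/11/bin # assumption: PostgreSQL 11 from the PGDG packages
  authentication:
    replication:
      username: replicator
      password: <replication-password>
    superuser:
      username: postgres
      password: <superuser-password>
  parameters:
    unix_socket_directories: '/var/run/postgresql'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false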

Start Patroni:

service patroni start

Check the cluster list:

patronictl -c /etc/patroni.yml list

But the configuration in Patroni is out of date, and the replication parameters are not enabled on this PostgreSQL server, so we need to restart this node. If you already enabled those parameters, this step is not required. [my cluster name is postgres]

The pg_hba.conf file should be updated before the restart, because PostgreSQL will now use the conf files in the /pgdata/data/ location.

vi /pgdata/data/pg_hba.conf
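
A sketch of the kind of entries needed, based on the replication error below (the replicator user connecting from 127.0.0.1), with 10.128.0.0/20 as an assumed placeholder for your VPC subnet:

# allow Patroni's replication user locally and from the cluster subnet (CIDR is an assumption)
host    replication    replicator    127.0.0.1/32     md5
host    replication    replicator    10.128.0.0/20    md5
host    all            all           10.128.0.0/20    md5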

patronictl -c /etc/patroni.yml restart postgres

If you see it fail, don’t worry: it will add all the required parameters and try to restart automatically, or you can run the restart one more time.

2019-09-15 16:46:34,875 INFO: Lock owner: None; I am bhuvi-psql
2019-09-15 16:46:34,879 INFO: starting as a secondary
2019-09-15 16:46:34,895 INFO: postmaster pid=13029
2019-09-15 16:46:35,479 ERROR: Can not fetch local timeline and lsn from replication connection
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/patroni/postgresql/__init__.py", line 657, in get_local_timeline_lsn_from_replication_connection
with self.get_replication_connection_cursor(**self.config.local_replication_address) as cur:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/patroni/postgresql/__init__.py", line 651, in get_replication_connection_cursor
connect_timeout=3, options='-c statement_timeout=2000') as cur:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
with psycopg2.connect(**kwargs) as conn:
File "/usr/local/lib/python2.7/dist-packages/psycopg2/__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
OperationalError: FATAL: no pg_hba.conf entry for replication connection from host "127.0.0.1", user "replicator", SSL off
2019-09-15 16:46:36,018 INFO: promoted self to leader by acquiring session lock

Now the master node is ready and it’s part of the cluster. It’s time to add the slaves. We just installed PostgreSQL and Patroni on them, but didn’t start anything. So when we start Patroni this time, it will automatically understand that it is a slave and take a pg_basebackup from the primary. You can also use pgBackRest, Barman, WAL-E, or custom scripts, but pg_basebackup is the default.

Create the conf file on node 2 the same as on the primary, but change the node-specific values shown below.
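
A sketch of the fields that differ per node; the node name and IP below are placeholders, and everything else stays identical to the primary’s /etc/patroni.yml:

name: bhuvi-psql2                     # must be unique per node (placeholder)

restapi:
  connect_address: <node-2-ip>:8008   # this node's own IP (placeholder)

postgresql:
  connect_address: <node-2-ip>:5432   # this node's own IP (placeholder)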

Now start Patroni on node 2 and then node 3, and check the status.

All the nodes are up and running now. You can bring down the master and watch the failover. You can also bring down one of your etcd nodes and test its high availability.

Split Read and Write:

This is another important part. If you are familiar with AWS Aurora’s endpoints, they provide two different endpoints: one always points to the master and the other points to the slaves. If you have multiple slaves, it automatically load balances between them. With HAProxy, many people do the same thing, but in GCP we don’t want to spend time and money running HAProxy servers. We can use TCP load balancers for this directly.

I got this from Patroni’s github page:

How does haproxy determine who is the primary and connect to the primary?

Haproxy sends health-check probes to all patroni nodes it knows:

Look into config file: https://github.com/zalando/patroni/blob/master/haproxy.cfg#L24 check port 8008

8008 is the patroni rest api port.
If node is running as a master, patroni will respond on request ‘GET /’ with http status code 200, in other case it will return 503.

haproxy sends traffic only to the nodes which responding with 200.

It is also possible to do read load balancing with haproxy, but for that it should do a different health-check: GET /replica

Yes, the TCP load balancers can replace this.
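
You can check this behaviour yourself before building the load balancer; the IP below is a placeholder for one of your PostgreSQL nodes:

# 200 means the node currently holds that role, 503 means it does not
curl -s -o /dev/null -w "%{http_code}\n" http://<postgres-node-ip>:8008/master
curl -s -o /dev/null -w "%{http_code}\n" http://<postgres-node-ip>:8008/replica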

Create an instance group:

  • Go to VM → Instance group → Unmanaged instance group.
  • Choose your region and network.
  • Named port: Port Name — patroni and Port Number 8008
  • Add your postgresql instances.
  • If you have postgresql instances in multiple zones, then create multiple instance groups.

Create TCP Load balancer for Writer:

Go to Network Services → Load Balancer → TCP Load Balancer → Only between my vms.

  • Name: patroni-writer
  • Backend: Select your region and network, and select your instance groups. If you have PostgreSQL instances in multiple zones, you might have multiple VM groups; add all of them.
  • HealthCheck: Protocol → HTTP, Port → 8008, Request Path → /master, health criteria → 2 seconds.
  • FrontEnd: Select the subnet where you want to host this LB. Internal IP → if you reserved an IP, use that, or create a new one. Port → 5432 (if you wish to use a custom port for PostgreSQL, you can use a different one here).

Create TCP Load balancer for Reader:

Follow the above step, but in the health check, Protocol → HTTP, Port → 8008, Request Path → /replica, health criteria → 2 seconds.

Now check the status of the load balancer.

Only one master, so the requests will be routed to this one node.
2 read replicas, so the requests will be balanced between the VMs.

Failover Test with the Load Balancer:

Let’s bring down the master and see how long it takes the load balancer to point to the new master.

I ran a select query in an infinite loop against the writer load balancer’s IP, printing the current timestamp and the value 1.
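
A sketch of that loop (the load balancer IP, database, and user are placeholders):

# hit the writer LB once a second; the gap in the timestamps during failover shows the downtime
while true; do
  psql -h <writer-lb-ip> -U postgres -d postgres -t -c "select now(), 1;"
  sleep 1
done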

2019-09-15 06:30:06.53847+00  |  1
2019-09-15 06:30:06.575362+00 | 1

It took 6 seconds. Woohoo, this is much faster than an RDS or Aurora failover.

Delayed Replica:

In Patroni, you can define a delayed replica with a custom recovery.conf section.

postgresql:
  recovery_conf:
    recovery_min_apply_delay: 1h

How to Prevent a node from a Failover:

If you don’t want some nodes, or the delayed replica, to become the master, just disable failover via tags.

tags:
  nofailover: true

Some best practices:

  1. Always use 3 nodes for etcd.
  2. Use 3 nodes for PostgreSQL as well.
  3. Keep the data directory inside a parent directory; don’t use a directory that sits directly in /. When Patroni bootstraps a node and the data dir already exists, it renames the old directory and creates a fresh one as the postgres user. So if /data is your data dir, the bootstrap has to move it to something like /data_dd_mm_yy, which the postgres user can’t do in the root directory. Always keep the data dir inside another directory (for example /pgdata/data).
  4. Make sure your PostgreSQL user’s bash_profile has the PGDATA environment variable, else Patroni will return OSError: [Errno 2] No such file or directory.
  5. A TCP load balancer will not work for etcd.
  6. max_connections and some other parameters must be the same across the cluster, so don’t change them on just one node.
  7. You can convert your existing master-slave setup into Patroni without downtime. Read the documentation carefully before touching production servers.
  8. If you are new to Patroni, read the GitHub Issues section; you’ll learn a lot about Patroni that isn’t documented.

Some more interesting blogs:

  1. Design A Highly Available MySQL Clusters With Orchestrator And ProxySQL In GCP
  2. Configure External Listener For Always On Availability Groups in GCP
