Design A Highly Available PostgreSQL Cluster With Patroni In GCP — Part 2

Bhuvanesh
Sep 16 · 10 min read
Credit: Zalando

This is the implementation part of Design A Highly Available PostgreSQL Cluster With Patroni In GCP, together with etcd and the TCP and HTTP load balancers. Let's begin the setup. Here is my inventory of PostgreSQL, etcd, and load balancer nodes (the load balancer IPs can be created while creating the LBs).
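The inventory was shown as an image in the original post; here it is reconstructed from the IPs that appear throughout this walkthrough (the reader LB IP does not appear in any of the outputs below):

  • bhuvi-etcd1: 10.128.0.114
  • bhuvi-etcd2: 10.128.0.115
  • etcd HTTP LB: 10.128.0.122 (frontend port 80)
  • bhuvi-psql (existing standalone, becomes the primary): 10.128.0.18
  • bhuvi-psql2: 10.128.0.98
  • bhuvi-psql3: 10.128.0.99
  • patroni-writer TCP LB: 10.128.0.106 (port 5432)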

PART 1: the concept of HA PostgreSQL with Patroni

Configure ETCD cluster:

We'll install etcd on both nodes. I strongly recommend keeping 3 nodes in the cluster for production servers. I used Ubuntu 19.04; you can follow other blogs to install it on CentOS or another OS.

-- On all etcd nodes
apt-get update
apt-get install etcd

On etcd node 1, add the lines at the end:

vi /etc/default/etcd

ETCD_INITIAL_CLUSTER="bhuvi-etcd1=http://10.128.0.114:2380,bhuvi-etcd2=http://10.128.0.115:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-01"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.128.0.114:2380"
ETCD_DATA_DIR="/var/etcd"
ETCD_LISTEN_PEER_URLS="http://10.128.0.114:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.128.0.114:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.128.0.114:2379"
ETCD_NAME="bhuvi-etcd1"

On etcd node 2, add the lines at the end of /etc/default/etcd:

ETCD_INITIAL_CLUSTER="bhuvi-etcd1=http://10.128.0.114:2380,bhuvi-etcd2=http://10.128.0.115:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-01"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.128.0.115:2380"
ETCD_DATA_DIR="/var/etcd"
ETCD_LISTEN_PEER_URLS="http://10.128.0.115:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.128.0.115:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.128.0.115:2379"
ETCD_NAME="bhuvi-etcd2"

Start etcd on both nodes at (almost) the same time. If you start node 1 alone, it will wait a few seconds for the other members; if they are unreachable at that moment, etcd will fail to start. So open a terminal on each node and run the start command on both right away.

service etcd start

Check the cluster health:

-- run this command on any one of the nodes
etcdctl cluster-health
member 74ea3293214c99ab is healthy: got healthy result from http://10.128.0.114:2379
member a3e5382602a9a572 is healthy: got healthy result from http://10.128.0.115:2379
cluster is healthy

Create an instance group:

  • Go to VM → Instance group → Unmanaged instance group.
  • Choose your region and network.
  • Named port: Port Name — etcd and Port Number 2379
  • Add your etcd instances.
  • If you have etcd instances in multiple zones, create one instance group per zone. (A gcloud equivalent of these steps is sketched below.)
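If you prefer the CLI, the equivalent gcloud commands look roughly like this (the group name and zone are my assumptions; substitute your own):

# create an unmanaged instance group (assumed name and zone)
gcloud compute instance-groups unmanaged create bhuvi-etcd-group --zone us-central1-a

# add the etcd VMs to the group
gcloud compute instance-groups unmanaged add-instances bhuvi-etcd-group \
  --zone us-central1-a --instances bhuvi-etcd1,bhuvi-etcd2

# define the named port that the backend service will reference
gcloud compute instance-groups set-named-ports bhuvi-etcd-group \
  --zone us-central1-a --named-ports etcd:2379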

Create HTTP Load balancer for ETCD:

  • Go to Network Services → Load Balancer → HTTP(S) Load balancer.
  • Choose Only between my VMs.
  • Name, Region, Network — please select your relevant region name and network.

Backend Services → create new

  • Name: patroni-etcd-backend
  • Protocol: HTTP
  • Named Port: etcd
  • Timeout: 30 seconds
  • Backend type: Instance group
  • Under instance group, select your VM group. If you have more VM groups, add all of them.
  • Port Numbers: 2379
  • Maximum RPS: your choice; I chose 100.

Health Check → Create

  • Name: etcd-health-check
  • Protocol: HTTP
  • Port: 2379
  • Request Path: /health
  • Leave the rest of the parameters as they are (my config uses 2 seconds for all of them).

Click on Create button.

Frontend Configuration:

  • Name: patroni-etcd-frontend
  • Protocol: HTTP
  • Subnet: choose where you want to host this LB.
  • Internal IP: choose your reserved IP or Create a new one.
  • Port: 80 (or you can use 8080, it doesn’t matter)

Then create the LB.

This is my LB
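Before moving on, verify that the frontend actually reaches etcd; its /health endpoint should report healthy through the LB (10.128.0.122 is my frontend IP):

curl http://10.128.0.122/health
# expect something like: {"health": "true"}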

Now let's move on to Patroni and PostgreSQL. For a change, I'm converting my existing standalone PostgreSQL node to Patroni.

Install PostgreSQL On all nodes:

sudo apt-get install curl ca-certificates gnupg
curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt-get update && sudo apt-get -y install postgresql-9.6 postgresql-common

Change the data directory:

mkdir /pgdata/data
chown -R postgres:postgres /pgdata/data
chmod 0700 /pgdata/data
root@bhuvi-psql# su postgres
/usr/lib/postgresql/9.6/bin/pg_ctl -D /pgdata/data/ initdb
-- Make this change in the postgresql.conf file
vi /etc/postgresql/9.6/main/postgresql.conf
data_directory = '/pgdata/data'
service postgresql start
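A quick sanity check that PostgreSQL picked up the new location:

sudo -u postgres psql -c "SHOW data_directory;"   # should print /pgdata/data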

Now I have one standalone PostgreSQL server. We'll move it into the Patroni cluster. You can do this without downtime, but only if the parameters below are already enabled; otherwise we'll need a restart once we've moved to Patroni. (A quick way to check them is sketched after the list.)

  1. wal_level
  2. hot_standby
  3. wal_keep_segments
  4. max_wal_senders
  5. max_replication_slots
  6. checkpoint_timeout
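A minimal check of the current values, run as the postgres superuser:

sudo -u postgres psql -c "SELECT name, setting FROM pg_settings WHERE name IN ('wal_level','hot_standby','wal_keep_segments','max_wal_senders','max_replication_slots','checkpoint_timeout');"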

Install Patroni:

apt-get install -y python python-pip python-psycopg2 
pip install --upgrade setuptools
pip install python-etcd
pip install psycopg2-binary
pip install patroni

Create a service for Patroni:

vi /etc/systemd/system/patroni.service

[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.target
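After creating the unit file, reload systemd and enable the service so it starts at boot:

systemctl daemon-reload
systemctl enable patroni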

You need an environment variable on the postgres user that tells Patroni where the data directory is.

Replace /pgdata/data/ with your data directory location.

su postgres
vi ~/.bash_profile
[ -f /etc/profile ] && source /etc/profile
PGDATA=/pgdata/data/
export PGDATA
# If you want to customize your settings,
# Use the file below. This is not overridden
# by the RPMS.
[ -f /var/lib/pgsql/.pgsql_profile ] && source /var/lib/pgsql/.pgsql_profile
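To confirm the variable is picked up:

su postgres
source ~/.bash_profile
echo $PGDATA   # should print /pgdata/data/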

Add Patroni config file on the master node:

vi /etc/patroni.yml

scope: postgres
namespace: /db/
name: bhuvi-psql

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.128.0.18:8008

etcd:
  host: 10.128.0.122:80

bootstrap:
  dcs:
    ttl: 100
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_segments: 8
        max_wal_senders: 5
        max_replication_slots: 5
        checkpoint_timeout: 30
  initdb:
  - encoding: UTF8
  - data-checksums
  pg_hba:
  - host replication replicator 127.0.0.1/32 md5
  - host replication replicator 10.128.0.18/32 md5
  - host replication replicator 10.128.0.98/32 md5
  - host replication replicator 10.128.0.99/32 md5
  - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
      - createrole
      - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.128.0.18:5432
  data_dir: /pgdata/data
  config_dir: /pgdata/data
  bin_dir: /usr/lib/postgresql/9.6/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: secretpassword
  parameters:
    unix_socket_directories: '/tmp'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

log:
  dir: /var/log/postgresql
  level: INFO
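Indentation matters in this file. A quick sanity check before starting Patroni (PyYAML is installed as a Patroni dependency, so it should be available on the same box):

python -c "import yaml; yaml.safe_load(open('/etc/patroni.yml'))" && echo "config parses OK"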

Start Patroni:

service patroni start
service patroni status

● patroni.service - Runners to orchestrate a high-availability PostgreSQL
Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
Active: active (running) since Mon 2019-09-16 05:10:19 UTC; 2h 30min ago
Main PID: 9191 (patroni)
Tasks: 14 (limit: 4418)
Memory: 188.1M
CGroup: /system.slice/patroni.service
├─ 9191 /usr/bin/python /usr/local/bin/patroni /etc/patroni.yml

Check the cluster list:

patronictl -c /etc/patroni.yml list

But Patroni's configuration is out of date: the replication parameters are not yet enabled on this PostgreSQL server, so we need to restart this node. If you had already enabled those parameters, this step is not required. (My cluster name is postgres.)

The pg_hba.conf file should be updated before the restart, because PostgreSQL will now use the conf file in the /pgdata/data/ location.

vi /pgdata/data/pg_hba.conf

host replication replicator 127.0.0.1/32 md5
host replication replicator 10.128.0.18/32 md5
host replication replicator 10.128.0.98/32 md5
host replication replicator 10.128.0.99/32 md5
host all all 0.0.0.0/0 md5

patronictl -c /etc/patroni.yml restart postgres

You may see it fail at first, but Patroni will add all the parameters and try the restart again automatically; if not, just run the restart one more time.

2019-09-15 16:46:34,875 INFO: Lock owner: None; I am bhuvi-psql
2019-09-15 16:46:34,879 INFO: starting as a secondary
2019-09-15 16:46:34,895 INFO: postmaster pid=13029
2019-09-15 16:46:35,479 ERROR: Can not fetch local timeline and lsn from replication connection
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/patroni/postgresql/__init__.py", line 657, in get_local_timeline_lsn_from_replication_connection
with self.get_replication_connection_cursor(**self.config.local_replication_address) as cur:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/patroni/postgresql/__init__.py", line 651, in get_replication_connection_cursor
connect_timeout=3, options='-c statement_timeout=2000') as cur:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
with psycopg2.connect(**kwargs) as conn:
File "/usr/local/lib/python2.7/dist-packages/psycopg2/__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
OperationalError: FATAL: no pg_hba.conf entry for replication connection from host "127.0.0.1", user "replicator", SSL off
2019-09-15 16:46:36,018 INFO: promoted self to leader by acquiring session lock

Now the master node is ready and is part of the cluster. It's time to add the slaves. On those nodes we installed PostgreSQL and Patroni but never started anything, so when we start Patroni there, it will automatically recognize that it is a slave and bootstrap itself with pg_basebackup from the primary. You can also use pgBackRest, Barman, WAL-E, or custom scripts, but pg_basebackup is the default.

Create the same conf file as on the primary, but with the changes below.

-- on node 2
name: bhuvi-psql2
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.128.0.98:8008
etcd:
  host: 10.128.0.122:80
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.128.0.98:5432

-- on node 3
name: bhuvi-psql3
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.128.0.99:8008
etcd:
  host: 10.128.0.122:80
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.128.0.99:5432

Now start Patroni on node 2 and then node 3, and check the status.

All the nodes are up and running now. You can bring down the master and watch the failover process, and you can bring down one of your etcd nodes to test its high availability.

Split Read and Write:

This is another important part. If you are familiar with AWS Aurora's endpoints, Aurora provides two different endpoints: one always points to the master and the other points to the slaves. If you have multiple slaves, it automatically load balances between them. With HAProxy, many people do the same thing. But in GCP we don't want to spend the time and money to run HAProxy servers; we can use TCP load balancers for this directly.

I got this from Patroni’s github page:

How does haproxy determine who is the primary and connect to the primary?

Haproxy sends health-check probes to all patroni nodes it knows:

Look into config file: https://github.com/zalando/patroni/blob/master/haproxy.cfg#L24 check port 8008

8008 is the patroni rest api port.
If node is running as a master, patroni will respond on request ‘GET /’ with http status code 200, in other case it will return 503.

haproxy sends traffic only to the nodes which responding with 200.

It is also possible to do read load balancing with haproxy, but for that it should do a different health-check: GET /replica

Yes, the TCP load balancers can replace this.
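You can see this behavior with curl against any node's Patroni REST API (IPs from my inventory):

# on the current primary this prints 200; on a replica, 503
curl -s -o /dev/null -w "%{http_code}\n" http://10.128.0.18:8008/master

# on a replica this prints 200; on the primary, 503
curl -s -o /dev/null -w "%{http_code}\n" http://10.128.0.98:8008/replica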

Create an instance group:

  • Go to VM → Instance group → Unmanaged instance group.
  • Choose your region and network.
  • Named port: Port Name — patroni and Port Number 8008
  • Add your postgresql instances.
  • If you have PostgreSQL instances in multiple zones, create one instance group per zone.

Create TCP Load balancer for Writer:

Go to Network Services → Load Balancer → TCP Load Balancer → Only between my vms.

  • Name: patroni-writer
  • Backend: select your region and network, then select your instance groups. If you have PostgreSQL instances in multiple zones, you might have multiple VM groups; add all of them.
  • HealthCheck: Protocol → HTTP, Port → 8008, Request Path → /master, health criteria → 2 seconds.
  • FrontEnd: select the subnet where you want to host this LB. Internal IP → if you reserved an IP, use it, or create a new one. Port → single, 5432 (if you wish to use a custom port for PostgreSQL, you can use a different port).

Create TCP Load balancer for Reader:

Follow the above step, but in the health check, Protocol → HTTP, Port → 8008, Request Path → /replica, health criteria → 2 seconds.
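For reference, the two health checks can also be created from the CLI; a minimal sketch (the names are my assumptions):

# writer: only the current master answers /master with HTTP 200
gcloud compute health-checks create http patroni-master-hc \
  --port 8008 --request-path /master --check-interval 2s --timeout 2s

# reader: only replicas answer /replica with HTTP 200
gcloud compute health-checks create http patroni-replica-hc \
  --port 8008 --request-path /replica --check-interval 2s --timeout 2s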

Now check the status of the load balancer.

Only one master, so the requests will be routed to that single node.
Two read replicas, so the requests will be balanced between the VMs.

Failover Test with the Load Balancer:

Let's bring down the master and see how long it takes the load balancer to point at the new master.

I ran a select query in an infinite loop against the writer load balancer's IP, printing the current timestamp and a 1 (the loop is sketched below).
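The loop was something like this; 10.128.0.106 is my writer LB IP, and it assumes a .pgpass entry (or PGPASSWORD) so psql doesn't prompt:

while true; do
  psql -h 10.128.0.106 -U postgres -t -A -c "SELECT now(), 1;"
  sleep 1
done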

2019-09-15 06:30:06.53847+00  |  1
2019-09-15 06:30:06.575362+00 | 1
psql: FATAL: the database system is shutting down
psql: FATAL: the database system is shutting down
psql: could not connect to server: Connection refused
Is the server running on host "10.128.0.106" and accepting
TCP/IP connections on port 5432?
psql: could not connect to server: Connection refused
Is the server running on host "10.128.0.106" and accepting
TCP/IP connections on port 5432?
2019-09-15 06:30:12.881559+00 | 1
2019-09-15 06:30:12.91822+00 | 1

It took 6 seconds. Woohoo, this is much faster than RDS and Aurora failover.

Delayed Replica:

In Patroni you can define a delayed replica with a custom recovery.conf section.

postgresql:
  recovery_conf:
    recovery_min_apply_delay: '1h'
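To confirm the delay is being applied, compare the current time with the last replayed transaction on that replica:

sudo -u postgres psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;"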

How to Prevent a Node from Failover:

If you don't want certain nodes (for example, the delayed replica) to ever become the master, just disable failover via the tags.

tags:
  nofailover: true

Some best practices:

  1. Always use 3 nodes for etcd.
  2. You really need 3 nodes for PostgreSQL as well.
  3. Keep the data directory nested inside another directory; don't use a directory located directly under /. When Patroni bootstraps a node and the data dir already exists, it renames the old one (for example, /pgdata/data to /pgdata/data_dd_mm_yy) and creates a fresh one owned by the postgres user. If your data dir is /data, that rename would have to happen in the root directory, which is not possible. So always put the data dir inside another directory.
  4. Make sure your PostgreSQL user's bash_profile sets the PGDATA environment variable, or Patroni will fail with OSError: [Errno 2] No such file or directory.
  5. A TCP load balancer will not work for etcd; use the HTTP load balancer instead.
  6. max_connections and some other parameters must be the same across the cluster, so don't change them on just one node.
  7. You can convert your existing master-slave setup into Patroni without downtime; read the documentation carefully before touching production servers.
  8. If you are new to Patroni, read the Issues section on GitHub; you'll learn a lot about Patroni that isn't documented.

Some more interesting blogs:

  1. Design A Highly Available MySQL Clusters With Orchestrator And ProxySQL In GCP
  2. Configure External Listener For Always On Availability Groups in GCP
