How we managed Postgres HA with Logical Replication using Patroni

Pavankumar Harikar
3 min readJul 30, 2022

--

Streaming replication, a standard feature of PostgreSQL, allows the updated information on the primary server to be transferred to the standby server in real-time, so that the databases of the primary & standby servers can be kept in sync.

Postgres Logical Replication With Patroni

Let’s relook at a couple of concepts that will help us understand the problem.

What is Logical Decoding?

Logical decoding is the process of extracting all persistent changes to a database’s tables into a coherent, easy-to-understand format that can be interpreted without detailed knowledge of the database’s internal state. In other words, Logical decoding uses an output plugin to convert Postgres’s write-ahead log (WAL) into a readable format.

Logical decoding in a way, allows you to stream data changes to external consumers in the form of event streaming or CDC events.

What is Replication slot?

Per definition, replication slot represents a stream of changes that can be replayed to a client in the order they were made on the primary server. It ensures that the primary server will retain the WAL logs that are needed by the standby even when they are disconnected from the primary server.

In other words, if the standby goes down, the primary server keeps track of how much the standby lags and preserves the WAL files it requires until the standby reconnects. The WAL files are then decoded and played back on the standby server.

Now let’s look at the problem statement:

Postgres maintains the replication slot on the Primary server. If there is a switchover/failover to promote one of the standby server, the new primary server won’t have any idea about the replication slot maintained by the previous primary node. Which breaks the logical replication from the downstream systems or if a new slot is created, it becomes unsafe to use.

What is the expected result?

  1. Automatically percolate the replication slot information from the primary server to all active standby servers.
  2. When the LSN numbers advance on the primary server, percolate these changes to all the active standby servers.

Let’s start the setup

For setting up Postgres HA using patroni take look at the following post

Configuration changes in Patroni

For the detailed petroni configuration refer to this git repo

Step 1- Change the wal_level property to “logical”

Logical replication changes — Patroni

Step 2- Add the logical slot property

Patroni — These slots will be preserved during switchover/failover

As soon as the step 2 is completed, patroni in the background copies the replication slot information from the primary server to all the active standby servers.

Step 3- Verification of logical slot and LSN number on all servers

select * from pg_replication_slots;

On Primary Server

Primary server replication slot and lsn records

On Standby servers

Standby server replication slot configuration and lsn records

To check the outstanding sync status on all standby servers

SELECT redo_lsn, slot_name,restart_lsn, round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind FROM pg_control_checkpoint() c, pg_replication_slots s;

Summary

We see that replication slot information and LSN number in primary and standby servers are in sync. This means that any failover should not affect the downstream applications that have subscribed to the change data capture(CDC) events.

--

--