The Postgres replication dilemma

Truth about hot_standby_feedback

Hussein Nasser
2 min read · Sep 13, 2023

You see, when you run Postgres replication, you have a dilemma.

On one hand, someone is running a long query on the standby, pinning a snapshot at t7.

On the other hand, the primary has deleted some rows, and vacuum is about to purge the dead tuples that the t7 snapshot can still see, since no query on the primary needs them. Vacuum purges those and writes new WAL records describing the purge.

The WAL sender kicks in and sends those vacuum WAL records to the standby so it can purge the same tuples, but guess what? The standby still needs them. This blocks replication, because the standby cannot apply a purge of rows that a running query still needs.

This creates replication lag, and as a result WAL replay on the standby halts until those queries are done (or until max_standby_streaming_delay runs out and Postgres cancels them).

The WAL must be applied in order, so it's not like we can skip this entry and come back to it later (would be cool though).
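To make the ordering problem concrete, here is a toy sketch in Python. It is not how Postgres implements replay. `WalRecord`, `replay`, and the "insert"/"cleanup" kinds are names I made up for illustration, and the conflict rule is a simplification of what the standby actually checks.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WalRecord:
    lsn: int
    kind: str                           # "insert" or "cleanup" (toy categories)
    removed_xid: Optional[int] = None   # newest xid whose dead rows a cleanup removes


def replay(records: List[WalRecord],
           oldest_standby_snapshot: Optional[int]) -> Tuple[List[int], Optional[int]]:
    """Apply records strictly in order and stop at the first conflict.

    Returns (applied LSNs, LSN where replay is blocked or None).
    """
    applied: List[int] = []
    for rec in records:
        conflicts = (
            rec.kind == "cleanup"
            and oldest_standby_snapshot is not None
            and rec.removed_xid is not None
            and rec.removed_xid >= oldest_standby_snapshot
        )
        if conflicts:
            # No skipping ahead: everything after this record waits too.
            return applied, rec.lsn
        applied.append(rec.lsn)
    return applied, None


wal = [
    WalRecord(lsn=1, kind="insert"),
    WalRecord(lsn=2, kind="cleanup", removed_xid=7),
    WalRecord(lsn=3, kind="insert"),
]

# A standby query pins a snapshot at 7 (the t7 from above).
blocked_run = replay(wal, oldest_standby_snapshot=7)
# No query running on the standby.
free_run = replay(wal, oldest_standby_snapshot=None)
```

With the snapshot pinned, only LSN 1 is applied and replay is stuck at LSN 2, so the harmless insert at LSN 3 waits behind it. With no query running, all three records go through.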

Kyle Hailey goes into that in his blog too (they found out the hard way).

Now what do you do? Well, you can send some feedback from the standby to the primary (with hot_standby_feedback) saying: hey primary, this is standby X and my oldest snapshot is t7. This prevents vacuum from purging those tuples on the primary, because someone on the standby is still querying them. As a result replication can proceed, since only normal, non-conflicting WAL records are applied.
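Here is a rough Python sketch of what that feedback changes. Again, this is a toy model and not Postgres internals. `vacuum_horizon` and `vacuum` are hypothetical helpers, and the xids are made-up numbers. The only idea it shows is that the primary holds its cleanup horizon back to the oldest snapshot the standby reports.

```python
from typing import List, Optional, Tuple


def vacuum_horizon(primary_oldest_xmin: int,
                   standby_feedback_xmin: Optional[int]) -> int:
    """Oldest xid that some snapshot may still need (toy model)."""
    if standby_feedback_xmin is None:
        # Feedback off: the primary only considers its own snapshots.
        return primary_oldest_xmin
    return min(primary_oldest_xmin, standby_feedback_xmin)


def vacuum(deleting_xids: List[int], horizon: int) -> Tuple[List[int], List[int]]:
    """Split dead tuples (identified by the xid that deleted them)
    into (removed, kept) for a given horizon."""
    removed = [x for x in deleting_xids if x < horizon]
    kept = [x for x in deleting_xids if x >= horizon]
    return removed, kept


dead = [5, 8, 9]       # xids that deleted three tuples on the primary
primary_xmin = 12      # no primary query needs anything older than 12
standby_snapshot = 7   # the long-running standby query (t7)

without_feedback = vacuum(dead, vacuum_horizon(primary_xmin, None))
with_feedback = vacuum(dead, vacuum_horizon(primary_xmin, standby_snapshot))
```

Without feedback, vacuum removes all three dead tuples and the standby later hits a conflicting cleanup record. With feedback, only the tuple deleted by xid 5 goes, and the other two stay on the primary as dead weight until the standby query finishes.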

But guess what? Now you have bloated your primary, because vacuum cannot purge those dead tuples and free up space, which further slows reads and writes on the primary. And that puts you right back in the dilemma.

Which is the lesser of two evils?

Don’t you love Postgres?
