Logical Replication: PubSub in PostgreSQL

Sogo Ogundowole
Published in Hacktive Devs
Aug 30, 2022

Replicating snapshots of databases in real-time

Source: PostgreSQL

Introduction

A year ago, a crypto company suffered database failures that led to the loss of data on crypto transactions for different tokens. The company had to halt all operations on its app while it ran data recovery for its system. Unfortunately, it could only recover data from two days before the failure. This caused a lot of back and forth between the company and its customers (including myself) in resolving the discrepancies, especially for those who had made transactions after the last backup.

Database backup is a key requirement for every database infrastructure in an organization. It is not enough to back up; it is also very important to have an infrastructure that supports (near) real-time backups. PostgreSQL, being an advanced database, enables this via its replication features.

When version 9.0 of PostgreSQL was launched, physical replication was introduced. This feature allowed XLOG records to be transferred to a destination server, where they were applied. Physical replication is cluster-based, so a single database could not be replicated on its own, only the whole cluster.

In PostgreSQL, these history records are known as XLOG record(s) or WAL data.

In PostgreSQL 10.0, another form of replication was released: Logical Replication. It allows object-level replication, i.e., a more selective kind of replication.

According to Severalnines, Logical Replication replicates changes to objects based on their replication identity, which is usually a primary key. It differs from physical replication, in which replication is block-based and byte-by-byte. Logical Replication does not need an exact binary copy on the destination server, and the destination server remains writable, unlike with Physical Replication. This feature originates from the pglogical module.

Architecture

Logical replication in PostgreSQL works with a Publisher-Subscriber pattern. This is a software design pattern in which messages are sent from a “publisher” into a queue, and “subscribers” listen for events or messages in the queue and pick them up (Kafka, RabbitMQ, etc. use this pattern).

Source: Percona

Publication

The publication is defined in the primary database, which is the publisher. A publication is based on changes that happen in the database, i.e., INSERT, UPDATE and DELETE operations at the row level. A primary key or a unique index on the table is used as the replication identity.

Subscription

The subscription is defined on the secondary database, which is the subscriber. It is important to note that on the subscriber database, the schemas and tables to be replicated have to be created using (almost) the same DDL as the publisher’s schemas and tables. Sometimes the subscriber cluster could be a data warehouse cluster.

When the subscription is created, Logical Replication starts by copying a snapshot of the data on the publisher database. It then waits for changes and sends them to the subscriber node as soon as they occur.

The publisher and subscriber are basically live PostgreSQL databases running a version that supports Logical Replication.

Walsender and Apply Processes

According to PostgreSQL’s official documentation, Logical Replication is implemented by “walsender” and “apply” processes. The walsender process starts logical decoding of the WAL and loads the standard logical decoding plugin (pgoutput). The plugin transforms the changes read from WAL to the logical replication protocol and filters the data according to the publication specification.

The walsender acts as the worker responsible for streaming all changes in real-time via the replication protocol, sending the required data on to the “apply” worker.

The “apply” worker, in turn, maps the data it receives to the appropriate tables in the subscriber in correct transactional order. This ensures that changes are applied in the order in which they happened.

It is important to note that a database can act as both subscriber and publisher for different data pipelines. This means a subscriber, database B, fetching data from database A, can also act as a publisher to database C.

Logical Replication PostgreSQL Commands

Create a new Publication
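On the publisher, a publication is created with `CREATE PUBLICATION`, either for specific tables or for all tables. The publication and table names below (`my_pub`, `transactions`) are placeholders:

```sql
-- On the publisher: publish row-level changes from a specific table
CREATE PUBLICATION my_pub FOR TABLE transactions;

-- Or publish every table in the database
-- CREATE PUBLICATION my_pub FOR ALL TABLES;
```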

Create the corresponding Subscription
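On the subscriber, `CREATE SUBSCRIPTION` points at the publisher and names the publication to follow. The connection string values here are illustrative; the user must have replication privileges on the publisher:

```sql
-- On the subscriber: connect to the publisher and start replicating
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=publisher.example.com port=5432 dbname=appdb user=repuser password=secret'
    PUBLICATION my_pub;
```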

Add tables to the Publication
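An existing publication can be extended with `ALTER PUBLICATION … ADD TABLE` (table name below is a placeholder):

```sql
-- On the publisher: start publishing changes from another table
ALTER PUBLICATION my_pub ADD TABLE wallets;
```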

Refresh Subscriptions
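After tables are added to (or dropped from) the publication, the subscriber must be told to pick up the new membership:

```sql
-- On the subscriber: re-read the publication's table list
ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION;
```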

Drop table for Publication
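Removing a table from a publication stops its changes from being replicated:

```sql
-- On the publisher: stop publishing changes from this table
ALTER PUBLICATION my_pub DROP TABLE wallets;
```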

Check Existing Publications and Tables
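Publications and their member tables can be inspected through the system catalogs on the publisher:

```sql
-- List publications defined on this server
SELECT * FROM pg_publication;

-- List which tables each publication includes
SELECT * FROM pg_publication_tables;
```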

Check Existing Subscriptions and Tables
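Similarly, subscriptions and their replication progress can be inspected on the subscriber:

```sql
-- List subscriptions defined on this server
SELECT * FROM pg_subscription;

-- Per-subscription worker status (received/applied WAL positions)
SELECT * FROM pg_stat_subscription;
```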

Limitations of Logical Replication

  • Tables must have a primary key or unique index
  • Bi-directional Replication is not supported: The publisher can not act as a subscriber to its subscriber and vice versa
  • Does not replicate schema/DDL: Schemas and tables of interest need to be created on the subscriber as they exist on the publisher. Reusing the DDL from the publisher’s schemas/tables can ease this. Hence primary keys, unique indexes, sequences and truncates all need to be recreated or handled on the subscriber

Since replication happens at the row level, the replication pipeline will break if there is a schema change on the publisher table(s) that was not manually applied to the subscriber table(s). These changes include:

  • Deletion of columns,
  • Addition of columns
  • Change in the name of columns
  • Deletion of a table
  • Change of name for a table

Hence, whenever any of these changes are made on the publisher, the same updates must also be made on the subscriber and the subscription refreshed so that the logical replication pipeline does not break.

There may, however, be automation architectures that can mitigate these limitations. For instance, a scheduled job could check for schema discrepancies between your master and slave clusters, and either notify your analytics team to take action or automatically update the slave cluster to bring it back in sync.
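As a rough sketch of such a check, the query below dumps the column layout of a schema from the standard `information_schema.columns` view; a scheduled job could run it against both clusters and diff the results (the schema name `public` is an assumption):

```sql
-- Run on both publisher and subscriber, then compare the outputs.
-- Any difference in columns or types signals a pipeline-breaking drift.
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
```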

Conclusion

Logical Replication’s near real-time sync makes it stand out, as the pipeline continuously and seamlessly transmits data between the publisher and subscriber. It is best suited to systems that are stable and do not require frequent schema changes, such as the addition, deletion or renaming of tables and columns.

If you find this article helpful, kindly give some claps and follow. Thanks!
