Logical replication in PostgreSQL
Postgres comes with two replication strategies.
- Logical replication: Only SQL commands are processed to create a replication.
- Physical replication: Bytes of data are sent from the master server for replication.
When it comes to physical replication the replica is forced to receive all the tables, rows, and tables spaces. But what if we want to replicate a single table? Logical replication comes into the picture here. With logical replication, we can consider certain subsets of a table and Postgres will replay the DML command on the replica, it does not execute the DDL commands like Indexes we can separately create indexes on the logical replication.
How does Logical replication work deep inside?
- Logical replication uses a publish and subscribe model. To receive the changes from the publication a subscriber can sign up by creating a subscription.
- Postgres initially gets the SQL statements by decoding the recodes from the WAL (Write Ahead logging) file.
- The SQL statements are streamed over the network to the standby server.
- The SQL statements are applied to the standby server in the correct transactional order.
Hands on implementation
Initially, we need to Set up the test environment.
- Instances of Two new postgres database clusters are to be created.
Create a new database cluster using the initdb command.
initdb -D /tmp/publication_db
A database cluster is a collection of databases managed by a single server. Here tmp/publication_db is the directory where the files will be stored.
Create another cluster subscription_db in a different terminal.
initdb -D /tmp/subscription_db
2. Configuration changes in the publisher settings. wal_level=logical
In the publication terminal access PostgreSQL.conf file using the nano tool
nano /tmp/publication_db/postgresql.conf
You can search in the nano tool using Ctrl+W
Change the wal_level=logical which is commented by default with replica value.
Also, change the port to some other in both subscriber and publisher. Here I have been using 5433 for publisher and 5434 for the publisher.
3. Start the instance.
pg_ctl -D /tmp/publication_db start
pg_ctl -D /tmp/subscription_db start
4. Create databases and tables.
Connect to the instances using psql client and create databases and tables.
Publisher:
psql --port=5433 postgres;
CREATE DATABASE pub;
Connect to DB \c pub
Create table CREATE TABLE table1(id int primary key,name varchar);
Insert values INSERT INTO table1 values(generate_series(1,10),’data’||generate_series(1,10));
Subscriber :
psql --port=5434 postgres;
CREATE DATABASE sub;
As logical replication does not support DDL commands we have to go through the same process that we did in publisher. But to make it easy we can use pg_dump
Open a diffrent terminal which is not connected to any insatance and create dump and pipe it to the subscriber instance; pg_dump -t table1 -s pub -p 5433| psql -s sub -p 5434;
After connecting to the sub datbase we can check that the table1 is reflected.
Perform logical replication
- Create a publication on the table1 in pub db;
create publication mypub for table table1;
2. Create a subscription on the created publication mypub.
create subscription mysub connection ‘dbname=pub host=localhost user=piyush port=5433’ publication mypub;
3. Test the replication.
SELECT * FROM table1;
in sub database. You can see the data is reflected here from the pub table1.
You can see the changes been replicated in the sub table1 as changes are been made on pub table1;
Where to use logical replication?
The typical use cases for logical replication are:
- Sending incremental changes in a single database or a subset of a database to subscribers as they occur.
- When data has to be retrieved for some other purpose like data analytics where the queries are completely different so you need a different set of indexes out there.
- Firing triggers for individual changes as they arrive on the subscriber.
- Consolidating multiple databases into a single one (for example for analytical purposes).
- Replicating between different major versions of PostgreSQL.
- Replicating between PostgreSQL instances on different platforms (for example Linux to Windows)
- Giving access to replicated data to different groups of users.
- Sharing a subset of the database between multiple databases.