Replicating Data from On-prem/Other Cloud Hosted Postgres database to CloudSQL postgres using Pglogical and perform role-reversal (Switchover/failover)

Arkapravo Banerjee
Google Cloud - Community
8 min readJul 15, 2024

This guide would help you to set up a Hybrid postgres replication using Pglogical which might be useful for many organizations who want to have a replication setup between their on-prem/Other Cloud hosted postgres databases and CloudSQL postgres database.

Use-cases :-

  1. Organizations who want to create a hybrid DR-setup between on-prem/other Cloud and GCP CloudSQL. And also during DR drills/Actual disaster you can perform a switchover/failover.
  2. If you want to replicate a part of your database (Some specific tables) and not the entire database to CloudSQL for some testing activities.

GCP Database Migration service is a great tool to replicate changes from your on-prem and other Cloud databases to CloudSQL postgres but Generally DMS is considered as a migration tool and would not allow you to stream changes from CloudSQL back to your on-prem environments as well as you can perform a easy switchover/failover operation during disaster scenarios.

Overview

Source — Self-managed postgres database (Host :- 10.0.0.2)

Target — CloudSQL postgres (Host :- 10.152.144.3)

Replication set-up

  1. Create a replication user.
  2. Network setup between Primary and replica.
  3. Configure parameters in Primary and Replica and enable pglogical.
  4. Create pglogical nodes in source and target databases.
  5. Create a replication set with the tables/schema that you need to replicate.
  6. Create a Schema only Dump from source and restore it in target.
  7. Create the pglogical subscription in the target database.
  8. Monitor the replication.

Switchover/Failover

  1. Check the replication lag.
  2. Stop/Delete the existing subscription in CloudSQL.
  3. Add a replication set if required.
  4. Create a subscription in self-managed postgres which is your new target.

Lets Start!

Create a Replication user

## Replication user in on-prem
CREATE USER replication_user WITH REPLICATION LOGIN PASSWORD '<password>';
postgres =\du replication_user
List of roles
Role name | Attributes | Member of
- - - - - - - - - + - - - - - - - - - - - - + - - - - - -
replication_user | Replication | {}

## Replication user creation in CloudSQL
CREATE USER replication_user WITH REPLICATION
IN ROLE cloudsqlsuperuser LOGIN PASSWORD '<password>';
postgres=> \du replication_user
List of roles
Role name | Attributes | Member of
- - - - - - - - - + - - - - - - -+ - - - - - - - - - - -
replication_user | Replication | {cloudsqlsuperuser}

2. Network Set-up between Primary and Replica

Connectivity from source to target as well as vice-versa should be allowed which would be needed during switchover/failover.

Source Connectivity : — If the source is on-prem/self-managed, insert the below line in the pg_hba.conf file and perform a restart of the postgres cluster. This would allow incoming connections from CloudSQL to on-prem source.

host all replication_user REPLICA_IP_ADDRESS/32 md5

Note :- If source is hosted in another cloud or managed service ,allow the CloudSQL IP in the source side firewall/security group settings.

Replica connectivity :- As the Replica is CloudSQL instance so make sure the source IP (10.0.0.2) is allowed in the GCP VPC firewall and if needed add it to CloudSQL authorized networks as well.

3. Configure parameters in Primary and Replica and enable pglogical

Firstly if your primary is a self-managed postgres instance you need to download the pglogical extension using the yum or apt-get command. You can also visit the official pglogical github page for more download information.

Debian :- 
sudo apt-get install postgresql-14-pglogical

RHEL :-
sudo yum install postgresql14-pglogical

This step is not required in the target cloudsql instance as pglogical is by default installed. We just need to enable it using the appropriate steps for CloudSQL instance.

Primary (Self-managed) :-

wal_level='logical'
shared_preload_libraries= <Existing value>,pglogical (So, basically append pglogical to the existing value of shared_preload_libraries.)
Max_worker_processes = <one per provider database and at least one per subscriber node. At least 10 is the standard for this parameter>
Max_replication_slots = <one per node on provider nodes, you can provide a default value of 10>
Max_wal_senders = <one per node on provider nodes, you can provide a default value of 10>
listen_address= '*'

These parameters needed to be modified in the “postgresql.conf” file for the on-prem/self managed postgres database. After changing the parameters perform a restart of the postgres cluster.

If your non-AlloyDB provider is Amazon RDS or Aurora, then the pglogical extension must be enabled and the required parameters adjusted through cluster parameter group adjustments. Within an existing or new cluster parameter group, set the mentioned parameters.

Target (CloudSQL) :-

In the target CloudSQL instance Max_worker_processes,Max_replication_slots and Max_wal_senders has a default value of 8,10 and 10 respectively and it grows linearly with the amount of memory in the CloudSQL instance. So, no change is required in these parameters for CloudSQL instances. If you still need to modify these parameters change them using CloudSQL flags.

Pglogical as mentioned is already installed in CloudSQL. We would just need to enable it using the below command.

cloudsql.logical_decoding=on
cloudsql.enable_pglogical=on

Once all the mentioned parameters are set you can use the below query in both source and target to check if the parameters are set correctly.

SELECT name, setting FROM pg_catalog.pg_settings WHERE name IN ('listen_addresses','wal_level','shared_preload_libraries','max_worker_processes','max_replication_slots','max_wal_senders', 'cloudsql.logical_decoding' , 'cloudsql.enable_pglogical') ORDER BY name;
name | setting
- - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
cloudsql.enable_pglogical | on
cloudsql.logical_decoding | on
listen_addresses | *
max_replication_slots | 10
max_wal_senders | 10
max_worker_processes | 10
shared_preload_libraries | google_insights,iamgroup_management,pg_stat_statements,pglogical,password_validation
wal_level | logical
(8 rows)

4. Create Pglogical extension and nodes

A pglogical node represents a physical PostgreSQL instance, and stores connection details for that instance. Both the primary and the replica instance must register themselves as nodes.

Note: A Pglogical node as well as the pglogical extension is a database specific resource so if you are replicating multiple databases from source to target you must create pglogical extension and pglogical nodes individually for each database.

create extension pglogical;
GRANT usage ON SCHEMA pglogical TO replication_user;

Run the below command for both the source and target

source-instance$ SELECT pglogical.create_node(
node_name := 'primary',
dsn := 'host=<primary-ip> port=5432 dbname=<database_name> user=replication_user password=secret'
);
dest-instance$ SELECT pglogical.create_node(
node_name := 'replica',
dsn := 'host=<replica-ip> port=5432 dbname=<database_name> user=replication_user password=secret'
);

5. Create a replication set

You need to create or use the default replication set to add your tables you need to replicate. Also you can add all the tables in a schema if you want to replicate all the tables.

- Add tables separately
SELECT pglogical.replication_set_add_table(set_name := 'default', relation := 'test_table_1', synchronize_data := TRUE);
- Add all the tables in the public schema
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
- Check which tables have been added to all replication sets:
SELECT * FROM pglogical.replication_set_table;

In this test I have created a sample table and inserted a few rows in the table in the source.

CREATE TABLE test_table_1 (col1 INT PRIMARY KEY);
INSERT INTO test_table_1 VALUES (1),(2),(3);

6. Take a schema only dump of the tables/database you want to replicate

pg_dump -h SERVER_IP_ADDRESS -U postgres - create - schema-only my_test_db > my_test_db.schema-only.sql

Note :- You can use the “-t” flag to take dump of specific tables

Import the dump file in your target CloudSQL database to create the schema structure for replication

psql -h <CloudSQL_IP> -U postgres < my_test_db.schema-only.sql

7. Create the pglogical subscription in the target database

SELECT pglogical.create_subscription(subscription_name := 'test_sub',provider_dsn := 'host=<Primary_IP> port=5432 dbname=<Database_name> user=replication_user password=<Secret>);
SELECT * FROM pglogical.show_subscription_status('test_sub');

If the status appears as “replicating”, then the setup is successful. Query the tables to ensure that data has been replicated. Insert and modify records on the primary instance and verify that they then appear on the replica instance.

On the primary, query the pg_replication_slots table to see the replication slot created by the subscription.

8. Monitor the replication

Target :- (Check if the initial records inserted in step 5 is replicated) :-

select * from test_table_1;
col1
- - -
1
2
3

Source :- (insert a few rows in source and check the status in target)

INSERT INTO test_table_1 VALUES (4),(5),(6);
INSERT 0 3
select * from test_table_1;
col1
- - -
1
2
3
4
5
6
(6 rows)

Check status in target

select * from test_table_1;
col1
- - -
1
2
3
4
5
6
(6 rows)

So, we can confirm that rows are getting replicated from source to target.

Switchover/Failover

Switchover/failover is performed to switch the roles between the provider and subscriber databases. When you perform a switchover or failover, the roles of the two databases, provider and subscriber, are reversed. The provider becomes the subscriber and the subscriber becomes the provider.

This switchover capability is important for operating system upgrades, PostgreSQL upgrades and failover is required when the primary becomes unavailable.

1. Check the replication lag

If the primary is still available you can run the below command to check the replications status.

SELECT application_name,
state,
sync_state,
client_addr,
client_hostname,
pg_wal_lsn_diff(pg_current_wal_lsn(),sent_lsn) AS sent_lag,
pg_wal_lsn_diff(sent_lsn,flush_lsn) AS receiving_lag,
pg_wal_lsn_diff(flush_lsn,replay_lsn) AS replay_lag,
pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) AS total_lag,
now()-reply_time AS reply_delay
FROM pg_stat_replication
ORDER BY client_hostname;
application_name | state | sync_state | client_addr | client_hostname | sent_lag | receiving_lag | replay_lag | total_lag | reply_delay
- - - - - - - - - + - - - - - -+ - - - - - - + - - - - - - - + - - - - - - - - -+ - - - - - + - - - - - - - -+ - - - - - - + - - - - - -+ - - - - - - - -
test_sub | streaming | async | 10.152.144.6 | | 0 | 0 | 0 | 0 | 00:04:54.84365

All these metrics should be “0” to make sure there would not be any data loss during the switchover.

2. Stop/Delete the subscriber in the target CloudSQL database

Once you Stop/Delete the subscriber any writes made to the existing primary would not be replicated. So, kindly stop all your writes during planned switchover events and then delete the subscription.

SELECT pglogical.drop_subscription('test_sub');
alter_subscription_drop
- - - - - - - - - - - - - -
t
(1 row)

3. Create a new replication set in the CloudSQL

Create a new replication set or add your tables to the default replication set in your new primary which would be the CloudSQL instance

- Add all the tables in the public schema
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

4. Create a new subscription the self-managed postgres database

As the Self-managed/on-prem database would be your new target so we need to create a new subscription in the self-managed database. During a failover event or last primary being unavailable follow this step once the primary comes back up online.

Note :- Please set synchronize_data := false while creating the new subscription as this would make sure that the initial data transfer would not happen and only the CDC would continue.

SELECT pglogical.create_subscription (subscription_name := 'replica_sub',replication_sets := array['default'], synchronize_data := false, provider_dsn := 'host=10.152.144.3 port=5432 dbname=replication user=replication_user password=abcd1234');
create_subscription
- - - - - - - - - - -
2808267049
(1 row)
SELECT * FROM pglogical.show_subscription_status('replica_sub');
subscription_name | status | provider_node | provider_dsn | slot_name | replication_sets | forward_origins
- - - - - - - - - -+ - - - - - - -+ - - - - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + - - - - - - - - - - - - - - - - - - -+ - - - - - - - - - + - - - - - - - - -
replica_sub | replicating | replica | host=10.152.144.3 port=5432 dbname=replication user=replication_user password=abcd1234 | pgl_replication_replica_replica_sub | {default} | {all}
(1 row)

5. Monitor the replication

Check whether the new subscription is working on the new provider database

SELECT application_name,
state,
sync_state,
client_addr,
client_hostname,
pg_wal_lsn_diff(pg_current_wal_lsn(),sent_lsn) AS sent_lag,
pg_wal_lsn_diff(sent_lsn,flush_lsn) AS receiving_lag,
pg_wal_lsn_diff(flush_lsn,replay_lsn) AS replay_lag,
pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) AS total_lag,
now()-reply_time AS reply_delay
FROM pg_stat_replication
ORDER BY client_hostname;
application_name | state | sync_state | client_addr | client_hostname | sent_lag | receiving_lag | replay_lag | total_lag | reply_delay
- - - - - - - - - + - - - - - -+ - - - - - - + - - - - - - -+ - - - - - - - - -+ - - - - - + - - - - - - - -+ - - - - - - + - - - - - -+ - - - - - - - - -
replica_sub | streaming | async | 10.0.0.2 | | 0 | 0 | 0 | 0 | 00:00:02.590681

Some important points to consider :-

  1. Pglogical replication cannot replicate DDL changes. If you make any schema changes in your source database, kindly use “pglogical.replicate_ddl_command” to run the DDL which would replicate the DDL to the target.
  2. Tables with primary key are only replicated in pglogical. Tables without primary key only replicate INSERTS and DELETES. In certain scenarios you can use the “REPLICA IDENTITY” feature if a table does not have a primary key and still you wish to replicate it completely.
  3. Using the above process you can perform switchover/switchback as many times as you require but please make sure to set synchronize_data := false everytime you create a new subscription and make sure a replication set is present with all the required tables in the primary.
  4. Make sure that newly created tables are automatically or manually added to the replication set in the primary to make sure that those tables are also replicated.

--

--