PostgreSQL Streaming Replication

Umut Tekin
Analytics Vidhya
Published in
5 min readDec 23, 2021

--

In this topic I will give a quick overview on PostgreSQL streaming replication.

I used PotgreSQL version 14 for primary and standby systems.

For source system I used PostgreSQL latest docker image. For target system I used Debian latest docker image.

In order to prepare the source system;

postgres=# show data_directory;
data_directory
- - - - - - - - - - - - -
/var/lib/postgresql/data
(1 row)

We need to add following lines to at the end of “postgresql.conf” file:

wal_level = replica
wal_log_hints = on
archive_mode = on # (change requires restart)
archive_command = 'test ! -f /var/lib/postgresql/pg_log_archive/main/%f && cp %p /var/lib/postgresql/pg_log_archive/main/%f'
max_wal_senders = 10
hot_standby = on

Then,

mkdir -p /var/lib/postgresql/pg_log_archive/main/
createuser -e - replication repl -W # as postgres

Also, we need to add following lines to allow replication user “repl” to connect source database:

host replication repl 172.17.0.5/32 md5

Then restart the PostgreSQL cluster:

SELECT pg_create_physical_replication_slot('air');
pg_create_physical_replication_slot
- - - - - - - - - - - - - - - - - - -
(air,)
(1 row)
select * from pg_replication_slots;

We are using “pg_replication_slots” because of management of wal files. Before PostgreSQL version 9.4 we were managing wal files by “wal_keep_segments” parameter. When there is a problem on the communication between primary and standby node(s) if our system exceed limit “wal_keep_segments” PostgreSQL deletes files unless we create a replication slot. That is why we use replication slot on our configuration. Of course, there are some drawbacks like we need sufficient(much/ depends on the system) data storage in order not to cease production system in case any communication failure.

https://www.postgresql.org/docs/9.4/catalog-pg-replication-slots.html

https://hevodata.com/learn/postgresql-replication-slots/

On the target side configuration;

First of all, we need to install PostgreSQL binaries:

# Create the file repository configuration:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
# Import the repository signing key:
wget - quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
# Update the package lists:
sudo apt-get update
# Install the latest version of PostgreSQL.
# If you want a specific version, use 'postgresql-12' or similar instead of 'postgresql':
sudo apt-get -y install postgresql

Then, I check if the replication user can connect to source database:

psql -h 172.17.0.4 -U repl postgres
Password for user repl:
psql (14.1 (Debian 14.1–1.pgdg110+1), server 14.0 (Debian 14.0–1.pgdg110+1))
Type "help" for help.
postgres=>
\q

We create data directory:

mkdir -p /var/lib/postgresql/data/
chmod 700 /var/lib/postgresql/data

We start a basebackup to initial load:

pg_basebackup -h 172.17.0.4 -D ./ -U repl -w -P -R -S air -X stream -W

Now, we can configure recovery parameters but there is no “recovery.conf” file after PostgreSQL version 12. It is merged “postgresql.conf” file. In order to put cluster to standby mode we need to create an empty file called “standby.signal”.

Only the name of “trigger_file” parameter has been changed, now it is called as “promote_trigger_file” and “standby_mode” has been removed. The rest of them stay same as how they were.

Before PostgreSQL version 12(recovery.conf):

restore_command = 'cp /var/lib/postgresql/pg_log_archive/replica1/%f %p'
recovery_target_timeline = 'latest'
primary_conninfo = 'user=repl host=172.17.0.2 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/pg_log_archive/replica1 %r'
trigger_file=/tmp/trigger

After PostgreSQL version 12(postgresql.conf):

restore_command = 'cp /var/lib/postgresql/pg_log_archive/replica1/%f %p'
recovery_target_timeline = 'latest'
primary_conninfo = 'user=repl host=172.17.0.4 port=5432 password=123123 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/pg_log_archive/replica1 %r'
promote_trigger_file='/tmp/promote'
primary_slot_name='air'

Create archive directory for standby node and the file that will put database in standby mode.

mkdir -p /var/lib/postgresql/pg_log_archive/replica1
touch /var/lib/postgresql/data/standby.signal

Finally, we can start stanby cluster:

/usr/lib/postgresql/14/bin/pg_ctl -D /var/lib/postgresql/data/ start

On the source system:

On the target system system:

At any time on the target system we can create “/tmp/promote” file to promote it as primary system:

touch /tmp/promote

Now, I am able to create new database on “old” standby system:

At the same time, I created another database on the primary system:

Oh no! We broke our replication😊. Is it possible to continue to replication from last successful checkpoint and keep up with primary system? Yes, we have “pg_rewind” like a wizard😊.

First, on the target system we need to stop PostgreSQL cluster:

/usr/lib/postgresql/14/bin/pg_ctl -D /var/lib/postgresql/data/ stop

Then, we can continue to equalize source and target data directories by executing:

/usr/lib/postgresql/14/bin/pg_rewind — target-pgdata=/var/lib/postgresql/data/ — source-server=”host=172.17.0.4 port=5432 user=postgres password=mysecretpassword dbname=postgres”

Yet, configuration on the target system is same with the primary system. So, we need to configure our “postgresql.conf” file and create and empty “standby.signal”. We need to add following lines to at the end of “postgresql.conf” file:

restore_command = 'cp /var/lib/postgresql/pg_log_archive/replica1/%f %p'
recovery_target_timeline = 'latest'
primary_conninfo = 'user=repl host=172.17.0.4 port=5432 password=123123 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/pg_log_archive/replica1 %r'
promote_trigger_file='/tmp/promote'
primary_slot_name='air'

and

cd /var/lib/postgresql/data/
touch standby.signal

Let’ s start standy system one more time and try to create database on it:

/usr/lib/postgresql/14/bin/pg_ctl -D /var/lib/postgresql/data/ start
create database e;

At last, we can check status of the replication.

On the primary system:

On the standby system:

If there is something wrong please let me know!

Thanks!

--

--