PostgreSQL and SRE: Log Shipping Replication

Nitigya Sharma
4 min read · Jul 1, 2018


Courtesy: wikipedia

Data is a vital part of your business, and so is the database that stores it. Keeping data in only one place is risky: a single disaster can wipe out everything. One way to reduce that risk is to keep synchronised copies of the data at multiple locations.

PostgreSQL provides several ways to archive and replicate the primary database for availability, fault tolerance, or load balancing.

Terminologies used:

Database cluster: A collection of databases stored on disk and managed by a single instance of the database server. Each database cluster usually contains at least two databases: postgres and template1.

Primary server: Also called the master server. It is the active database server and serves both read and write operations.

Secondary server: Also called the slave server. It is a replica of the primary database and can act as a failover target for the primary server.

Hot standby server: A slave server that is also capable of serving read-only queries.

WAL (Write-Ahead Log): Log records describing every change to the database. They are flushed to permanent storage before the corresponding data changes, so the data files can always be reconstructed from them.

Data directory: The directory under which all data is stored; by default it is /var/lib/postgresql.

Replication slot: Before 9.4, when a standby went offline, the master kept a number of WAL files around until the standby came back online. To know how many WAL files to keep, the wal_keep_segments setting had to be tuned just right.

From 9.4 onwards, wal_keep_segments is no longer needed, because replication slots can be used instead. The behaviour is similar, but a replication slot guarantees that the master retains every WAL file the standby still needs, for as long as the standby needs it.
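In configuration terms, the pre-9.4 approach meant guessing a retention size in postgresql.conf (the value below is purely illustrative):

```
# postgresql.conf, pre-9.4 approach: guess how much WAL to retain.
# Each WAL segment is 16 MB, so 100 segments keeps roughly 1.6 GB.
# Too low and an offline standby falls irrecoverably behind;
# too high and disk is wasted. A replication slot removes the guesswork.
wal_keep_segments = 100
```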

Understanding Log Shipping Replication.

Log shipping refers to shipping WAL records directly from the master to the slave. It allows replication regardless of the physical location of the servers and is constrained only by network bandwidth. There are two log shipping methodologies:

  • File based log shipping
  • Record-based log shipping (Streaming Replication)

Log shipping is asynchronous in nature: WAL is shipped only after it has been written to disk on the master, so a crash can lose any changes that were not yet shipped, which increases the RPO. With streaming replication, WAL records are sent as they are generated, so this data loss window is minimised.
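As an illustration of the file-based approach, the primary can be told to archive each completed WAL segment and the standby to restore it. The archive directory and rsync destination below are hypothetical examples; %p and %f are the standard placeholders PostgreSQL substitutes with the segment's path and file name:

```
# postgresql.conf on the primary -- copy each finished WAL segment
# to an archive directory the standby can read (example path)
archive_mode = on
archive_command = 'rsync -a %p standby:/var/lib/postgresql/wal_archive/%f'

# recovery.conf on the standby -- replay segments from that archive
restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'
```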

When the admin starts a standby server, the standby performs the following steps in a loop: it first restores any WAL available via restore_command from the archive, then replays any WAL present in its own pg_xlog directory, and finally connects to the primary to stream new WAL, retrying the cycle until it is promoted.

Objective

Setting up streaming replication includes these steps at a high level:

  • Configure Primary database to allow standbys to connect.
  • Create a base backup of the database cluster using pg_basebackup.
  • Add a recovery.conf file in the Data directory of the Secondary.
  • Turn on standby mode.

Before we begin.

Ensure that the Primary and Secondary are similar in terms of mount paths, tablespaces, and hardware architecture.

Ensure that the Primary and Secondary are reachable from each other over the network.

Configurations on the Primary Server.

Create the replication user (repl): To configure replication, PostgreSQL requires a role with at least the REPLICATION privilege.

postgres@primary$ psql -c "CREATE USER repl REPLICATION LOGIN PASSWORD 'SuperSecretPassword';"

Create a replication slot: A replication slot for the standby server.

select * from pg_create_physical_replication_slot('standby');
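To confirm the slot exists, you can query the pg_replication_slots system view:

```sql
-- On the primary: list replication slots. "active" becomes true
-- once a standby is connected and consuming from the slot.
select slot_name, slot_type, active from pg_replication_slots;
```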

Edit pg_hba.conf: Allow the repl user to authenticate to the Primary server.

# Allow replication connections
host    replication    repl    <secondary-server-ip>/32    md5

Edit postgresql.conf: Add entries to enable replication. Set the wal_level configuration parameter to replica.

# In releases prior to 9.6, this parameter also allowed the values
# archive and hot_standby. These are still accepted but mapped to replica.
wal_level = replica

# maximum number of concurrent connections from secondaries
max_wal_senders = 3

Restart the primary server

root@primary# service postgresql restart

Backing up the primary server to the standby server.

Stop the secondary server.

root@secondary# service postgresql stop

Create an empty database cluster. pg_basebackup does not overwrite existing files, so take a backup of any existing data you want to keep. The following commands drop the existing database cluster and create a new one.

root@secondary# pg_dropcluster <version>  main
root@secondary# pg_createcluster --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8 -d <data_directory> <version> main

Execute pg_basebackup from secondary database.

$ sudo -u postgres pg_basebackup -h <master-server-ip>  -D <data_directory> -U repl -S standby -R -v --xlog-method=stream

Wait for the basebackup to finish.

Configurations on the Secondary Server.

Edit postgresql.conf: Add following line

hot_standby = on # It is ignored on master. 

Create a new configuration file named recovery.conf in the Data directory.

standby_mode = 'on'
primary_conninfo = 'user=repl password=SuperSecretPassword host=<primary-server-ip> port=5432 sslmode=prefer'
primary_slot_name = 'standby'
recovery_target_timeline = 'latest'

Start secondary server.

root@secondary# service postgresql start

Checking the replication status

On the Primary, the WAL sender process (and on the Secondary, the WAL receiver process) reports its current WAL position in the process title; the two positions should match:

ps aux | grep -E "sender|receiver"
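A more direct check is to query the pg_stat_replication view on the primary (the column names shown are the pre-10 spellings):

```sql
-- On the primary: one row per connected standby. When replication is
-- caught up, sent_location and replay_location report the same WAL position.
select client_addr, state, sent_location, replay_location
from pg_stat_replication;
```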

Troubleshooting:

Look at the Postgres log on each server.

  • Standby complains about missing logs: the master has deleted WAL files before the standby could process them.
  • pg_xlog directory filled up: move a few of the older WAL files to a temporary directory and restart PostgreSQL; it will fetch the missing logs from the master.
  • Authentication problems: check pg_hba.conf on the master.
