Step By Step: Manually process failover with PostgreSQL and pgpool

Thor Chen
3 min readMay 20, 2018

Assumption 1: We are based on the database setup in https://medium.com/@zzdjk6/step-by-step-use-pgpool-to-achieve-load-balance-separation-of-read-write-on-mac-e1b8b21af159

Assumption 2: We are simulate the deployed situation, which means the machine runs pgpool can’t access the data folder of database machine.

  1. What if Slave db down? Let’s try:
pg_ctl --pgdata=./data/db2 stoppsql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="SHOW POOL_NODES;"
# The "standby" node is shown as "down"
psql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="select * from test_record;"
# The query is working correctly
psql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="insert into test_record (text) values ('d')"
# Oh, the command wait forever!!!

After we hit Ctrl+C to interrupt the command, PostgreSQL tells us:

^CCancel request sentWARNING:  canceling wait for synchronous replication due to user requestDETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.INSERT 0 1

This is a downside of synchronous replica, see: https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous-replication

Note that a synchronous replication setup with only one standby, where the standby is down, is to be considered down or dysfunctional. That’s why you need at least two synchronous standby servers if you don’t want to reduce your availability.

So now we have to either set the the replica to be async or add another Slave replica. Here we choose to make another sync replica at 5435:

pg_basebackup --pgdata=./data/db3 --write-recovery-conf --slot="slot2" --host=localhost --port=5433 --username="postgres" --progresspg_ctl -D ./data/db3 -l ./log/db3 -o "-p 5435" start

Now the Master back to work correctly. We restart the db2, and now if we run show pool_nodes;, pgpool will still think the db2 is down… We need to pgpool stop and pgpool -D to restart it.

We can use pcp_attach_node to attach node, see: http://www.pgpool.net/mantisbt/view.php?id=259

Notice: If we want to use pcp_attach_node, we need to need to setup pcp.conf first, see: http://www.pgpool.net/docs/latest/en/html/configuring-pcp-conf.html

2. What if Master db down? Let’s try:

pg_ctl -D ./data/db1 stoppsql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="select * from test_record;"# psql: FATAL:  failed to create a backend connection
# DETAIL: executing failover on backend
# Retry it will work
psql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="select * from test_record;"
# It works
psql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="insert into test_record (text) values ('f')"
# ERROR: cannot execute INSERT in a read-only transaction

If we restart the Master and pgpool, everything will back to work. But what if the Master keeps down? We can promote a Slave to be new Master, and make the old Master to be a Slave.

pg_ctl promote -D ./data/db2

Here we also make db2 use async replica for easier test (otherwise we will meet the sync replica problem claimed before).

# code ./data/db2/postgresql.confsynchronous_commit = offsynchronous_standby_names = ''

Edit pgpool.conf:

backend_hostname0 = 'localhost'backend_port0 = 5434backend_weight0 = 1backend_flag0 = 'ALWAYS_MASTER'backend_hostname1 = 'localhost'backend_port1 = 5433backend_weight1 = 1backend_flag1 = 'ALLOW_TO_FAILOVER'

Restart pgpool:

pgpool stop
pgpool -D

Now test:

psql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="insert into test_record (text) values ('g')"psql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="select * from test_record;"# The new Master works

Then we make db1 as a Slave:

code ./data/db1/recovery.conf
# Create and edit "recovery.conf"

In that file:

standby_mode = 'on'primary_conninfo = 'user=postgres host=localhost port=5434'primary_slot_name = 'slot1'recovery_target_timeline = 'latest' # This is important!!!

In db2, create new slot because previous slots don’t exist in this new Master:

SELECT * FROM pg_create_physical_replication_slot('slot1');
# Create a slot named "slot1"
# Notice: this is the new Master!!

Now start db1 and restart pgpool, it will become a Slave:

pg_ctl -D ./data/db1 -l ./log/db1 -o "-p 5433" startpgpool stop
pgpool -D

--

--

Thor Chen

Passionate JavaScript/TypeScript Developer with a Full-stack Background