Step By Step: Manually process failover with PostgreSQL and pgpool
Assumption 1: We are based on the database setup in https://medium.com/@zzdjk6/step-by-step-use-pgpool-to-achieve-load-balance-separation-of-read-write-on-mac-e1b8b21af159
Assumption 2: We are simulate the deployed situation, which means the machine runs
pgpool
can’t access the data folder of database machine.
- What if
Slave
db down? Let’s try:
pg_ctl --pgdata=./data/db2 stoppsql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="SHOW POOL_NODES;"
# The "standby" node is shown as "down"psql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="select * from test_record;"
# The query is working correctlypsql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="insert into test_record (text) values ('d')"
# Oh, the command wait forever!!!
After we hit Ctrl+C
to interrupt the command, PostgreSQL tells us:
^CCancel request sentWARNING: canceling wait for synchronous replication due to user requestDETAIL: The transaction has already committed locally, but might not have been replicated to the standby.INSERT 0 1
This is a downside of synchronous replica, see: https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous-replication
Note that a synchronous replication setup with only one standby, where the standby is down, is to be considered down or dysfunctional. That’s why you need at least two synchronous standby servers if you don’t want to reduce your availability.
So now we have to either set the the replica to be async or add another Slave
replica. Here we choose to make another sync replica at 5435
:
pg_basebackup --pgdata=./data/db3 --write-recovery-conf --slot="slot2" --host=localhost --port=5433 --username="postgres" --progresspg_ctl -D ./data/db3 -l ./log/db3 -o "-p 5435" start
Now the Master
back to work correctly. We restart the db2
, and now if we run show pool_nodes;
, pgpool
will still think the db2
is down… We need to pgpool stop
and pgpool -D
to restart it.
We can use pcp_attach_node to attach node, see: http://www.pgpool.net/mantisbt/view.php?id=259
Notice: If we want to use
pcp_attach_node
, we need to need to setuppcp.conf
first, see: http://www.pgpool.net/docs/latest/en/html/configuring-pcp-conf.html
2. What if Master
db down? Let’s try:
pg_ctl -D ./data/db1 stoppsql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="select * from test_record;"# psql: FATAL: failed to create a backend connection
# DETAIL: executing failover on backend
# Retry it will workpsql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="select * from test_record;"
# It workspsql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="insert into test_record (text) values ('f')"
# ERROR: cannot execute INSERT in a read-only transaction
If we restart the Master
and pgpool
, everything will back to work. But what if the Master
keeps down? We can promote
a Slave
to be new Master
, and make the old Master
to be a Slave
.
pg_ctl promote -D ./data/db2
Here we also make db2
use async
replica for easier test (otherwise we will meet the sync replica
problem claimed before).
# code ./data/db2/postgresql.confsynchronous_commit = offsynchronous_standby_names = ''
Edit pgpool.conf
:
backend_hostname0 = 'localhost'backend_port0 = 5434backend_weight0 = 1backend_flag0 = 'ALWAYS_MASTER'backend_hostname1 = 'localhost'backend_port1 = 5433backend_weight1 = 1backend_flag1 = 'ALLOW_TO_FAILOVER'
Restart pgpool
:
pgpool stop
pgpool -D
Now test:
psql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="insert into test_record (text) values ('g')"psql --dbname=postgres --host=localhost --port=9999 --username=postgres --command="select * from test_record;"# The new Master works
Then we make db1
as a Slave
:
code ./data/db1/recovery.conf
# Create and edit "recovery.conf"
In that file:
standby_mode = 'on'primary_conninfo = 'user=postgres host=localhost port=5434'primary_slot_name = 'slot1'recovery_target_timeline = 'latest' # This is important!!!
In db2
, create new slot because previous slots don’t exist in this new Master
:
SELECT * FROM pg_create_physical_replication_slot('slot1');
# Create a slot named "slot1"
# Notice: this is the new Master!!
Now start db1
and restart pgpool
, it will become a Slave
:
pg_ctl -D ./data/db1 -l ./log/db1 -o "-p 5433" startpgpool stop
pgpool -D