Achieving Seamless PostgreSQL Upgrades from 10 to 12 on AWS: Lessons from Tata 1mg

Pankaj Pandey
Tata 1mg Technology
7 min readOct 16, 2023

Co-authors:

  1. Swati Grover ( Associate Technical Architect @ Tata 1mg)
  2. Prashant Mishra (Technical Architect @ Tata 1mg)

*Introduction

At Tata 1mg, we recently undertook the challenge of upgrading our PostgreSQL databases from version 10 to 12 on AWS. Through a Proof of Concept (POC) approach and meticulous planning, we encountered and resolved a host of challenges that we believe can benefit others in similar situations. In this article, we’ll detail the critical steps we took to ensure a seamless upgrade process, outlining our journey from preparation to post-upgrade tasks.

DB Upgrade Overall

*Problem Statement

Upgrading a critical component like a PostgreSQL database is a fundamental requirement for maintaining optimal performance, security, and accessing new features. However, the process of upgrading a database system introduces challenges, particularly when it comes to minimizing or eliminating downtime, which can significantly impact the availability and functionality of applications that rely on the database.

Traditional upgrade methods often involve taking the database offline during the upgrade process, resulting in service disruption, dissatisfied users, and potential revenue loss for businesses.

The overarching problem is to devise a strategy that allows for the seamless upgrade of PostgreSQL databases while minimizing or eliminating downtime, ensuring that applications can continue serving users uninterrupted.

*Pre-Requisites: Setting the Stage for Success

Before delving into the upgrade process, we established several essential prerequisites:

Create a Parameter Group

  • Create a parameter group named production-postgres-12.

Create and Promote Read Replica

  • Create a read replica with the same size as the master database.
  • Follow the naming convention: sourcesource-db (e.g., chronos-productionchronos-production-db).
  • Promote the read replica to master once it’s ready and available, Now this will be our target RDS.

Storage Considerations Source RDS

  • Make sure there’s ample available storage in the source RDS. Expand the size of the source RDS by adding a buffer of 30 to 50% of its current size. This precautionary step will prevent storage problems in the source RDS during the upgrade process.
  • Since the source RDS will be deleted after the upgrade, any cost increase due to the size expansion will be short-term and won’t have lasting cost implications.

Target RDS Settings

  • Apply newly created parameter group(production-postgres-12) to this RDS.
  • Apply various security and critical settings which were applied on source RDS.

Unsupported Usage checks on Target RDS.

Commit or roll back all open prepared transactions before attempting an upgrade.

SELECT count(*) FROM pg_catalog.pg_prepared_xacts;

Verify “unknown” data type does not exist in the Database schema

SELECT DISTINCT data_type FROM information_schema.columns WHERE data_type ILIKE 'unknown';

Truncate on Target RDS.

Be careful and doubly check that you are executing this command on target(new) RDS only.

7a) target-rds-master=> CREATE OR REPLACE FUNCTION truncate_tables(username IN VARCHAR) RETURNS void AS $$
DECLARE
statements CURSOR FOR
SELECT tablename FROM pg_tables
WHERE tableowner='username' AND schemaname='public';
BEGIN
FOR stmt IN statements LOOP
EXECUTE 'TRUNCATE TABLE ' || quote_ident(stmt.tablename) || ' CASCADE;';
END LOOP;
END;
$$ LANGUAGE plpgsql;
-------------------------------------
7b) target-rds-master=> SELECT truncate_tables('username'); // it should be particular DB user name

Access Control and Enable Logical Replication on Source RDS

Empower the master user with SELECT privileges across all tables in the public schema.

source-rds-master=>grant SELECT on all tables in schema public to dev;

Check for logical replication slot

source-rds-master=> select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)

Create a publication for all tables or selective table in the source database. See the following code

source-rds-master=> CREATE PUBLICATION alltables FOR ALL TABLES;
CREATE PUBLICATION

OR
source-rds-master=> CREATE PUBLICATION <pub_name> FOR table <tablename>;
CREATE PUBLICATION

source-rds-master=> select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)

SELECT * FROM pg_publication; // verify is publication created

Enable Subscription on Target RDS

Create a subscription on the target database. See the following code

target-rds-master=> CREATE SUBSCRIPTION sub CONNECTION 'dbname=<db to be replicated> host=<source-rds> user=<master-user> password=pasxxxxxx' PUBLICATION alltables;

eg -
patient_service_db=> CREATE SUBSCRIPTION sub CONNECTION 'dbname=patient_service_db host=patient.cwpw16dip7iz.ap-south-1.rds.amazonaws.com user=master password=pasxxxx' PUBLICATION alltables;
NOTICE: created replication slot "sub" on publisher
CREATE SUBSCRIPTION

Verify that subscription slots with temporary flag true are created

patient_service_db=> select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
----------------------+----------+-----------+--------+--------------------+-----------+--------+------------+------+--------------+--------------+---------------------
sub | pgoutput | logical | 16401 | patient_service_db | f | t | 25397 | | 57094487 | 456/254A52F0 | 456/254DE170
sub_50777_sync_16906 | pgoutput | logical | 16401 | patient_service_db | t | t | 26081 | | 57083701 | 455/DD8BEAE0 | 455/DD8BEB18
sub_50777_sync_16870 | pgoutput | logical | 16401 | patient_service_db | t | t | 25403 | | 57083682 | 455/DD5748F8 | 455/DD574930
(3 rows)

Enforcing Cohesive Replication Slots for Dependent Multi-Table Replication

In case of multiple tables replication ensure dependent tables are part of same slot.

Fine tuning

  • Fine-tune source RDS storage and IOPS for smooth replication.
  • Fine-tune Target RDS storage type, instance type, IOPS settings for faster replication. This step is very crucial if we see below message under “logs and events”
Log messages

*Executing the Upgrade: Guided Steps to Success

Systematically perform the upgrade process when replication slots are in a synchronized state (temporary flag in the replication slot is set to false). This activity is carried out when both the source RDS and target RDS are synchronized, meaning the latest changes from the source RDS have been successfully replicated to the target RDS.

Achieving this synchronization might take a variable amount of time, typically spanning from a few hours to several days, contingent upon the size of the databases. Therefore, ensure that you initiate the following activity only when both RDS instances are in this synchronized state:

  • Validate schema and index consistency on both RDS 10 and RDS 12.
  • Confirm data counts and parity between RDS 10 and RDS 12.

Source RDS activities

  1. With master user perform below commands
  REVOKE CONNECT ON DATABASE $DB_NAME from PUBLIC;

REVOKE CONNECT ON DATABASE $DB_NAME from $DB_USER_NAME;

Wait for few seconds generally ~5 seconds;

2. With DB_USER perform below commands

select pg_terminate_backend(pid) from pg_stat_activity where datname = '$DB_NAME' and usename = '$DB_USER_NAME' and pid <> pg_backend_pid();
select pid, query from pg_stat_activity where usename = '$DB_USER_NAME'; # should give 0 rows

3. We need to synchronize sequences between Source and target. Execute below command on Source RDS

select 'SELECT SETVAL(' ||quote_literal(sequence_name) || ',' ||nextval(sequence_name) || ');' from information_schema.sequences; 

##Copy full output from above command and keep it safe in notepad
  • Make sure that the terminal connection established with the source RDS remains open until the activity is finished. If the connection is accidentally closed, you will need to reconnect using the master user, grant connect access to the DB_USER, and then restart from step 1.

Target RDS Activities

  • Connect with DB_USER and execute sequences command saved in notepad.
  • Grant necessary permissions on the this RDS to fortify the environment.

*Post-Upgrade Tasks: Ensuring Stability and Excellence

After a successful upgrade, address the following:

1.Database Validation (within approximately 10 seconds) through parallel execution of activities :

  • Validate data counts and parity between RDS v10 and RDS v12 by executing pre-written queries.
  • Leverage CloudWatch to monitor RDS performance.
  • Ensure data integrity by executing sample queries.

2. DNS switch and Downtime

  • Adjust the DNS to point to new RDS and monitor application connectivity.
  • The entire process, including blocking new connections on the source RDS, updating sequences on the target RDS, performing Database Validation, and switching DNS, entails an estimated downtime of around 30 seconds.

2. Application Validation:

  • Confirm application connectivity with the upgraded database.
  • Rigorously test end-to-end service functionality.

*Rollback Strategy: Navigating Unexpected Terrain

GRANT CONNECT ON DATABASE $DB_NAME TO PUBLIC on Source RDS;

GRANT CONNECT ON DATABASE $DB_NAME to $DB_USER_NAME on Source RDS;

DNS Switch back from target to source RDS.

Migrating delta records from target to source RDS if any.

Delete Replication slots on both Source and Target RDS

source_rds_master=> DROP PUBLICATION alltables;  // You can check publication name using SELECT * FROM pg_publication;
target_rds_master=> DROP SUBSCRIPTION sub; // You can check subscription name using SELECT * FROM pg_subscription;

*Conclusion: Elevating Expertise in Database Management

Mastering PostgreSQL database upgrades is an intricate dance of planning, precision, and execution. This guide empowers adept database experts to navigate the landscape with confidence. Armed with precise steps and code snippets, you’re ready to embrace database evolution while safeguarding data integrity and application. Elevate your expertise and embark on a journey of seamless upgrades, ensuring a harmonious integration of technology and proficiency.

*References:

--

--

Pankaj Pandey
Tata 1mg Technology

Director of Engineering / Senior Architect at Tata 1mg