Aurora PostgreSQL Upgrade

Igor Colombi
Growens Innovation Blog
6 min read · Oct 22, 2022

Here at MailUp, we use a number of Aurora PostgreSQL instances to store statistics (opens & clicks, notifications and other data) for our customers’ platforms.

We started using Aurora PostgreSQL back in 2018. At that time PostgreSQL 10.x was the available version, and since then only minor upgrades had been applied to the instances.

That means in 2022 we were still using PostgreSQL 10.x.

In the meantime:

  • three major releases of PostgreSQL had become available, with many improvements
  • AWS released new instance types based on Graviton 2 processors, which offer a better cost/performance ratio than their x86 counterparts.

PostgreSQL 10 reaches its EOL in November 2022 (PostgreSQL: Versioning Policy), and only PostgreSQL >= 12.3 is supported on Graviton instances (https://aws.amazon.com/blogs/database/key-considerations-in-moving-to-graviton2-for-amazon-rds-and-amazon-aurora-databases/).

It was time to upgrade.

Scenario

We had 10 Aurora PostgreSQL 10.18 instances of type db.r5.2xlarge to upgrade, which means each instance hosts about 10% of our customers. Each customer has its own schema.

At the time of the migration the latest available version of Aurora PostgreSQL was 13.7, on the Graviton instance type db.r6g.2xlarge.

Our databases are write-intensive. Each customer’s schema ranges from a few MB to 300 GB.

Upgrade method choice

In-place upgrade

Ref: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_UpgradeDBInstance.PostgreSQL.html#USER_UpgradeDBInstance.Upgrading.Manual

A minor release upgrade usually takes a few minutes because no data changes are involved with PostgreSQL: it’s just a matter of bringing down the database service, replacing the binaries and starting PostgreSQL again.

But here we are talking about a major upgrade. This requires all data files to be updated to the new PostgreSQL version.

With our data size (from 1 to 3 TB per instance) we tested several in-place upgrades starting from snapshots, and the upgrade process usually took about 2 hours.

During that time the instance is completely unavailable.

That translates to 1/10 of our customers being unable to use their platform for about 2 hours.

That was unacceptable.

AWS Database Migration Service

Ref: https://aws.amazon.com/dms/

Provided by AWS itself, on paper it was the optimal solution: it helps migrate databases while the source database remains fully operational during the migration, minimizing downtime for the applications that rely on the database.

It continuously replicates data from any supported source to any supported target.

Eventually you have two identical, fully aligned databases, and you can point your application to the new destination database, with the new PostgreSQL version, without any downtime.

Unfortunately it has some limitations:

  • it does not support all data types and it does not handle secondary indexes or non-primary keys, so you cannot be sure that all data and objects will be migrated
  • it took about 10–20 hours to replicate the whole dataset for one instance. During that time the source instance works under an increased load that may affect performance for our customers
  • last but not least, we were never able to finish a full replication because of obscure errors that appeared at random phases of the synchronization.

After many hours of trial and error, and despite the help of a consultant, we gave up on that solution.

Finally, from AWS docs:

When you migrate from a database engine other than PostgreSQL to a PostgreSQL database, AWS DMS is almost always the best migration tool to use. But when you are migrating from a PostgreSQL database to a PostgreSQL database, PostgreSQL tools can be more effective.

Manual migration with PostgreSQL tools

Eventually we went with a solution based on the PostgreSQL tools pg_dump and pg_restore.

These utilities create a consistent backup even if the database is in use, and they don’t block other users (readers or writers) from accessing the database.

They can be used to dump and restore a whole database or single tables or schemas.

Of course, dumping and restoring 3 TB of data is extremely slow, and during the restore we have to block all writes to the source database, keeping in mind that our databases are write-intensive.

So we ended up migrating schema by schema.

That is the most expensive solution in terms of effort, but it is also the safest and gives us full control.

It requires creating a new Aurora PostgreSQL instance with the new PostgreSQL version, creating the database and the users/permissions, and then migrating schema by schema, switching each customer platform’s endpoint as we go to minimize the downtime.

The migration

Database creation

With Ansible, which we had already used to configure the source Aurora instances, we configured the new PostgreSQL 13.7 instances in a few minutes, using the Graviton 2 type equivalent to the x86 one: db.r5.2xlarge > db.r6g.2xlarge.

The configuration mainly consists of database creation, user/role creation and permission assignment.
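
As an illustration, the bootstrap that Ansible performs boils down to statements like the following minimal sketch (the endpoint, role and database names here are placeholders, not our real ones):

# Minimal sketch of the database/role bootstrap on the new instance
# ($NEW_ENDPOINT, app_user and statistics are placeholder names)
psql "host=$NEW_ENDPOINT user=postgres dbname=postgres" <<'SQL'
CREATE ROLE app_user WITH LOGIN PASSWORD 'change-me';
CREATE DATABASE statistics OWNER app_user;
GRANT ALL PRIVILEGES ON DATABASE statistics TO app_user;
SQL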

Data migration

We created and ran a bash script on an EC2 instance located in the same availability zone as the source and destination instances.

For what it’s worth, we used Ubuntu as the distribution, and we installed the PostgreSQL client tools matching the version of the destination server (ref. PostgreSQL: Linux downloads (Ubuntu)). In our case we installed the PostgreSQL 13 client tools, because we migrated from PostgreSQL 10 to 13.
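
If you need to do the same on Ubuntu, the setup looks roughly like this (commands adapted from the PGDG instructions linked above; adjust to your Ubuntu release):

# Add the PGDG repository and install the PostgreSQL 13 client tools
sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install -y postgresql-client-13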

We created a dedicated EBS volume to store the dumps, because some single schemas use more than 300 GB of data.
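
Preparing the volume is a one-off step; a sketch, assuming the new EBS volume shows up as /dev/nvme1n1 (device names vary by instance type) and an illustrative mount point:

# Format and mount the dedicated volume used to store the dumps
sudo mkfs -t ext4 /dev/nvme1n1
sudo mkdir -p /mnt/dumps
sudo mount /dev/nvme1n1 /mnt/dumps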

To dump and restore the data we used these options:

Schema dump

# Dump a single schema in directory format, 4 parallel jobs, no compression
# (connection parameters such as -h/-U/-d are omitted here)
pg_dump -j 4 -Fd -n "$SCHEMA" --quote-all-identifiers --compress=0 --blobs -f "$SCHEMA"

Schema restore

# Restore the schema dump into the new instance, 4 parallel jobs
# (connection parameters such as -h/-U are omitted here)
pg_restore -j 4 -d "$DATABASE" "$SCHEMA"

The script takes care to (a simplified sketch follows the list):

  • collect some data about the customer, starting from its ID: schema name, source database, endpoint
  • rename the source schema after the dump, to prevent data from being accidentally written to the old instance
  • clear our internal cache (Redis keys) that provides our applications with the location (endpoint) of each customer’s database
  • restore the dump to the new instance
  • update the endpoint to refer to the new database
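
Here is a simplified sketch of the per-customer part of the script. It assumes the variables ($CUSTOMER_ID, $SCHEMA, $DATABASE, $SRC_ENDPOINT, $DST_ENDPOINT, $REDIS_HOST) have been resolved by the lookup step; the dump path and the Redis key name are purely illustrative.

#!/usr/bin/env bash
set -euo pipefail

# 1. Collect customer data (schema name, source database, endpoint): details omitted;
#    $CUSTOMER_ID, $SCHEMA, $DATABASE, $SRC_ENDPOINT, $DST_ENDPOINT, $REDIS_HOST are assumed set.

# 2. Dump the schema from the source instance (directory format, 4 jobs, no compression)
pg_dump -h "$SRC_ENDPOINT" -d "$DATABASE" -n "$SCHEMA" \
        -j 4 -Fd --quote-all-identifiers --compress=0 --blobs -f "/mnt/dumps/$SCHEMA"

# 3. Rename the source schema so nothing can be written to the old instance anymore
psql -h "$SRC_ENDPOINT" -d "$DATABASE" \
     -c "ALTER SCHEMA \"$SCHEMA\" RENAME TO \"${SCHEMA}_migrated\";"

# 4. Clear the internal cache entry holding the customer endpoint (key name is illustrative)
redis-cli -h "$REDIS_HOST" DEL "customer:${CUSTOMER_ID}:endpoint"

# 5. Restore the dump into the new instance
pg_restore -h "$DST_ENDPOINT" -d "$DATABASE" -j 4 "/mnt/dumps/$SCHEMA"

# 6. Point the customer platform endpoint to the new database (application-specific, omitted)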

On the first day we migrated our DEV platforms, then we migrated our QA platform.

Then we started with our customers’ platforms: only 50 on the first day, increasing the number day by day.

Once we gained confidence we started migrating two different instances in parallel, without any limit on the number of customers per day. The whole process took about 2 weeks.

Results

Space utilization

That was an unexpected surprise: starting from scratch, we reduced space utilization by 50% to 75%.

That’s mainly due to the data fragmentation accumulated on instances created 4 years ago and, secondly, to the smaller indexes produced by PostgreSQL 13, which includes significant improvements to its indexing and lookup system that benefit large databases, with space savings and performance gains for indexes.

For the same reasons (smaller data size, no fragmentation, PostgreSQL 13 improvements) our IO has been reduced by ~40%.

Data size. Blue: PostgreSQL 13; Orange: PostgreSQL 10

CPU Utilization

We reduced CPU utilization by about 25% on average, and by 50% on peaks, again thanks to the PostgreSQL 13 improvements and to the Graviton 2 instances:

CPU utilization charts: PostgreSQL 10 vs PostgreSQL 13

Cost optimization

We started the migration on July 14 and ended it on July 28.

Aurora Storage Costs (daily)

At the end of the migration we more than halved the Storage Usage and nearly halved the Storage IO and backup storage.

On these three metrics, we saved about 37% of the variable costs every day.

The migration from the db.r5.2xlarge to the db.r6g.2xlarge instance type leads to a saving of 12% on instance reservation costs.

Due to the general reduction in resource usage, especially on the CPU front, we will probably be able to reduce the number of instances from 10 to 8 without any impact on performance, using the same tools we used for the migration and leading to additional savings in the future.
