How to replicate Postgres RDS functionality with a few scripts and a Cloudformation template

Max Kremer
12 min read · Mar 6, 2024


or How to build a Postgres rocket ship in your garage

Greetings, tech world denizens. In my last post, Why Postgres RDS didn’t work for us, I discussed some of the gotchas of running Postgres on Amazon RDS. I work with an application that collects a ton of online user behavior: every click, every field input, every action. Data is stored in Postgres, and end users, via our UI or raw SQL access, run models and reports that crunch the above (sometimes very large) data sets. Our application is of the multi-tenant SaaS variety, so data is being ingested and stored per tenant across hundreds of tenants. It needs to be scalable and performant, and it can’t cost an arm and a leg.

In this post, I’ll guide you through constructing your own high-performance PostgreSQL setup on EC2, based on our production environment. We’ll replicate 80% of RDS functionality — crafting both single-instance clusters and primary-plus-read-replica pairs — optimized for performance and cost. This approach not only gives you granular control of your database environment but also unlocks significant savings and performance gains.

Let’s start with the basics: RDS is slick but pricey. It gives you a turn-key way to spin up PostgreSQL DBs without worrying about infrastructure. It comes with charts, monitoring, performance graphs, click-to-backup, click-to-restore. It’s a complete DB-as-a-service solution. But that comes at a cost, and I’m not just talking moola. RDS offers very poor performance (and I/O transparency) compared with what Postgres is actually capable of given the right hardware (read my previous post for details on the issues we faced with RDS). So if you want to run high-performance PostgreSQL databases that scale with predictable cost, or you need more control than RDS gives you (e.g. you want to deploy Postgres extensions of your choice), then read on.

What is CloudFormation?

At its core, AWS CloudFormation is an automation and infrastructure-as-code service provided by Amazon Web Services (AWS). It allows users to define, deploy, and manage a collection of AWS resources (like EC2 instances, VPCs, EBS volumes, etc…) using simple JSON/YAML files or “templates”. These templates describe the desired state of your AWS “Stack”, as CloudFormation calls it, enabling you to automate the provisioning and updating processes in a predictable and repeatable way. Think Terraform’s clumsy little brother (I have some beefs with CloudFormation, but I’ll save those for another post).

Bottom line is that instead of manually setting up each component of your Postgres environment on EC2 — like configuring storage, network settings, security groups, and replication — CloudFormation lets you encapsulate all these steps within a template that groups a bunch of resources together into a “Stack”. In our case each stack will be a read/write Postgres primary with an optional read-only replica. Along with that we can automate the creation of the bells and whistles that are included with RDS such as metrics, alerts, access control and security, logging, replication and backups.

Inventory of moving parts

Let’s start by mapping out an inventory of moving parts.

AMI

Our AMI, or base image, is based on the Amazon Linux 2 image with a few added yum installs:

sudo amazon-linux-extras install epel
sudo yum install -y kernel-devel
sudo yum install -y epel-release
sudo yum install https://zfsonlinux.org/epel/zfs-release-2-2.el7.noarch.rpm
sudo sed -i 's/\$releasever/7/g' /etc/yum.repos.d/zfs.repo
reboot
sudo yum install -y zfs
sudo /sbin/modprobe zfs

The above installs the ZFS filesystem and its dependencies. ZFS allows us to use compression, in essence trading a bit of CPU time for higher I/O throughput. ZFS, along with HugePages (which we’ll discuss later), are the parts of this recipe that give your Postgres instances an added performance boost. Spicy!
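
To give a sense of what that looks like in practice, here’s a minimal sketch of enabling compression on the ZFS pool; the pool name pgtank matches the one the template creates later, and the exact properties the template sets may differ:

# Enable lz4 compression on the pgtank pool and skip access-time updates
sudo zfs set compression=lz4 pgtank
sudo zfs set atime=off pgtank

# Once data is flowing, check how much I/O the compression is saving
zfs get compressratio pgtank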

sudo amazon-linux-extras install postgresql14 
sudo yum install -y postgresql-server postgresql-contrib postgresql-server-devel.x86_64
sudo yum update

Of course, we’ll need to install PostgreSQL version 14 and some of its libraries.

sudo yum install -y perl-Switch \
perl-DateTime \
perl-Sys-Syslog \
perl-LWP-Protocol-https \
perl-Digest-SHA.x86_64

curl https://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.2.zip -O
unzip CloudWatchMonitoringScripts-1.2.2.zip && \
rm CloudWatchMonitoringScripts-1.2.2.zip && \
cd aws-scripts-mon

Some handy Perl utils and the AWS CloudWatch monitoring scripts. Remember, we’re starting with an Amazon Linux 2 image, so it comes with the AWS CLI, which makes accessing other AWS resources from the command line a breeze.

wget https://github.com/wal-g/wal-g/releases/download/v2.0.1/wal-g-pg-ubuntu-18.04-amd64.tar.gz
tar -xvf wal-g-pg-ubuntu-18.04-amd64.tar.gz
mv wal-g-pg-ubuntu-18.04-amd64 wal-g
sudo chown root:root wal-g
sudo mv wal-g /usr/local/bin/
sudo yum install -y lzo
sudo yum install -y nvme-cli

Here we’re installing WAL-G and a utility for querying the metadata of local NVMe drives (more on that below in EC2 Instances). For EC2 instance types that use the Nitro Hypervisor, volumes that use the NVMe driver appear as nvmeXnX volumes. We need to read those names dynamically so we’ll call the nvme cli in a script below.

WAL archiving is performed by WAL-G via PostgreSQL’s archive_command, and streaming replication is supported by PostgreSQL’s protocol. These functions are distinct but closely related. WAL-G handles backups and point-in-time recovery (PITR), while streaming replication supports the hot standby server. However, if the standby falls behind the primary and misses a WAL segment, it can also retrieve it from S3 using WAL-G.
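
As a rough illustration, here are the postgresql.conf lines that wire this up, assuming WAL-G’s S3 prefix and AWS credentials are visible to the postgres user (in our case via the instance profile); the data directory path matches the /pgtank layout used later in this post:

# On the primary: ship each completed WAL segment to S3 via WAL-G
sudo -u postgres tee -a /pgtank/db/postgres/postgresql.conf <<'EOF'
archive_mode = on
archive_command = 'wal-g wal-push %p'
EOF

# On the replica: fall back to S3 if a needed WAL segment is no longer on the primary
sudo -u postgres tee -a /pgtank/db/postgres/postgresql.conf <<'EOF'
restore_command = 'wal-g wal-fetch %f %p'
EOF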

EC2 Instances

We are going to be working with 2 types of instances here and 3 sizes. This does not mean that you are limited to the instance types I use here, just that these are the ones I’ve found work well for our analytics workload.

For our read-write primary our Cloudformation script will be configured to use one of 3 instance types:

  • t3a.medium — a general-purpose instance, great for one or two smaller tenants as well as QA workloads. We don’t pair this instance type with a read replica, but that doesn’t stop you from doing so
  • r5a.xlarge — a memory-optimized instance so we can give Postgres more memory
  • r5a.2xlarge — same as above but more! For our enterprise tenants

Note: r5b instances are the newest memory optimized instances that increase EBS performance by 3x compared to same-sized R5 instances so you may want to explore those.

For our read replicas we use the i3en instance type for its lightning-fast NVMe drives.

  • i3en.xlarge — pairs with the r5a.xl master
  • i3en.2xlarge — pairs with the r5a.2xl master

But we can’t have our cake and eat it too… i3en instances come with up to 60TB of NVMe storage, but the catch is that the drives are ephemeral. Ephemeral storage is transient; data is lost if the instance is stopped, terminated, or if the underlying physical drive fails. This contrasts with EBS volumes, where data persists independently of the instance’s life. So the high I/O of our instance-attached NVMe drives comes with potential data volatility. We’re using this for our read replicas, so it’s fine: if a drive fails or an instance terminates, we can easily recreate it using data from the master.

WAL-G

WAL-G is an archival and restoration tool for PostgreSQL. It pushes Postgres transaction log files (WALs) to S3 from the primary, and on the replica it reads those WALs from S3 to bring the replica up to date. It also functions as our backup utility, taking scheduled full backups via cron.

S3

Amazon’s simple storage service we all know and love. We are going to be using S3 for 3 key purposes:

  • Each instance type is going to pull its postgresql.conf file from an S3 path based on the instance type (see the sketch after this list). For example, the config file for our primary r/w r5a.2xlarge instance type will be in S3 under the following URL in my cf-pg-configs bucket: https://cf-pg-configs.s3.us-west-2.amazonaws.com/rw/r5a.2xlarge.conf
  • S3 will also be used by WAL-G to push/pull WAL files
  • WAL-G will also push nightly backups to S3 via a shell script and cron job
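
As an illustration of the first point, here’s a minimal sketch of how an instance can fetch its own tuned config at boot; the bucket layout matches the example above, but the exact logic in the template may differ (and IMDSv2-only instances would need a metadata token first):

# Ask the EC2 instance metadata service what instance type we are running on
INSTANCE_TYPE=$(curl -s http://169.254.169.254/latest/meta-data/instance-type)

# Pull the matching tuned postgresql.conf from the config bucket (rw/ prefix for primaries in this example)
aws s3 cp "s3://cf-pg-configs/rw/${INSTANCE_TYPE}.conf" /pgtank/db/postgres/postgresql.conf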

CloudWatch

CloudWatch is Amazon’s monitoring service. Remember that CloudWatchMonitoringScripts utility we installed on the AMI? It will enable us to send metrics to CloudWatch so we can put them on dashboards and create alarms. We’ll be watching the free space available on the drives and setting alarms to alert us when they reach 80% capacity.
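
For example, the monitoring scripts can report disk usage for the ZFS mount point on a schedule; here’s a minimal sketch of the crontab entry (the aws-scripts-mon path is an assumption based on where we unzipped it):

# Report memory and pgtank disk utilization to CloudWatch every 5 minutes
*/5 * * * * /home/ec2-user/aws-scripts-mon/mon-put-instance-data.pl --mem-util --disk-space-util --disk-path=/pgtank --from-cron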

Log forwarding agent

It’s very handy to be able to access Postgres cluster logs in one central place so you’re not SSHing into every server. Here we use Splunk, which includes a small daemon that runs on each instance and forwards specified log data (I’ve omitted that installation from my base AMI image instructions on GitHub). You can use any number of tools to accomplish this, for example the Elastic Stack (ELK Stack): comprising Elasticsearch, Logstash, and Kibana, it’s a popular open-source choice for searching, analyzing, and visualizing log data in real time. Splunk is not free, but I do love it.

Shell Scripts

We are going to be using 3 short scripts to wire things up. Some of the scripts are generated dynamically because the CloudFormation stack name needs to be part of the script (although there are other ways around this via environment variables). All three of the scripts listed below are embedded within the CloudFormation template, so look for their content there (see the GitHub repo link at the end). The scripts are all short enough not to warrant being version-controlled assets separate from the CloudFormation template, so from a devops perspective I felt confident including them within it.

push-backup.sh As the name suggests, this script invokes the wal-g backup-push command to push a full backup to S3. We will invoke this script via cron and at the time of Postgres primary instance creation so that the replica has something to restore.
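
A minimal sketch of push-backup.sh, assuming it runs as the postgres user; the bucket and prefix are placeholders (the real script in the repo embeds the stack name), and the retention step is optional:

#!/bin/bash
# push-backup.sh (sketch): take a full base backup and push it to S3 with WAL-G
export WALG_S3_PREFIX="s3://my-walg-bucket/my-stack-name"   # placeholder bucket/prefix
export AWS_REGION="us-west-2"

wal-g backup-push /pgtank/db/postgres

# Optionally prune old backups, keeping the last 7 full backups
wal-g delete retain FULL 7 --confirm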

poll-pg_hba_update.sh The purpose of this script is to poll CloudFormation from the Postgres primary (via the AWS CLI) to get the internal IP of the replica. It’s a shame that CloudFormation doesn’t provide any facility for this: there is no way to create resources and then use their details when creating other resources earlier in your template, and circular dependencies are not allowed. Nonetheless, we need the IP of the replica on the primary because pg_hba.conf on the primary needs a line to authenticate the replica, like so:

host replication repuser [REPLICA_IP]/32 scram-sha-256

Now if you want to play fast and loose with security and rely on your VPC security groups to keep your instance safe, then you don’t need to specify the replica’s IP and can eliminate this script. You can do this instead:

host replication repuser 0.0.0.0/0 scram-sha-256
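
If you do keep the script, the polling logic looks roughly like this; the logical resource ID PsqlServerRO comes from the template, while the stack name, pg_hba.conf path, and reload step are assumptions about how the real script is wired up:

#!/bin/bash
# poll-pg_hba_update.sh (sketch): look up the replica's private IP and authorize it for replication
STACK_NAME="my-pg-stack"   # the real script has the stack name baked in at stack creation time

# Resolve the replica's EC2 instance ID from the stack, then its private IP
INSTANCE_ID=$(aws cloudformation describe-stack-resources --stack-name "$STACK_NAME" --logical-resource-id PsqlServerRO --query 'StackResources[0].PhysicalResourceId' --output text)
REPLICA_IP=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" --query 'Reservations[0].Instances[0].PrivateIpAddress' --output text)

# Once we have an IP, append the auth rule and reload Postgres to pick it up
if [[ "$REPLICA_IP" =~ ^[0-9] ]]; then
  echo "host replication repuser ${REPLICA_IP}/32 scram-sha-256" >> /pgtank/db/postgres/pg_hba.conf
  sudo -u postgres psql -c "SELECT pg_reload_conf();"
fi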

zpool-nvme.sh When creating the replica instance using i3en instance types, the NVMe drives aren’t predictably named; furthermore, depending on the instance size we get a different number of drives. We need the names of the attached NVMe drives to create our ZFS file system, which will span the drives. To do that we use the nvme CLI utility we installed on the AMI to query the NVMe drive names and create our ZFS file system.
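
A minimal sketch of that idea follows; the real script in the repo may select and label the devices differently, so treat the filtering here as illustrative:

#!/bin/bash
# zpool-nvme.sh (sketch): find the instance-store NVMe devices and build a pool across them.
# On i3en instances the ephemeral drives report a model of "Amazon EC2 NVMe Instance Storage",
# while the EBS root volume reports "Amazon Elastic Block Store", so we filter on that.
DEVICES=$(sudo nvme list | awk '/Instance Storage/ {print $1}')

# Create the pgtank pool spanning all instance-store drives and apply the same ZFS settings
sudo zpool create pgtank $DEVICES
sudo zfs set compression=lz4 pgtank
sudo zfs set atime=off pgtank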

We end up with something like this; Splunk can be replaced with any other log-monitoring solution.

The read-only replica is similar, except that WALs and backups move in the opposite direction (from S3).

Putting it all together

The CloudFormation template file is a YAML or JSON text file that lets you define resources to create a Stack. Your template can contain parameters whose values the user of your template supplies via an AWS screen at stack launch time. In our case the inputs are the AMI (there’s only one, but I decided to make it a parameter for flexibility), the KeyPair name for instance access, the Availability Zone, the primary (or RW) instance type, the replica (or RO) instance type, the IP you wish to SSH from, and the DB password for the replica user. The stack name itself is another implicit and required parameter; it is used as a namespace when creating other things, and references to the stack name can be found throughout the template. The Parameters section looks like this:

 "Parameters": {
"BaseImage": {
"Type": "String",
"Default": "ami-005973cfb88e3d3af",
"AllowedValues": [
"ami-005973cfb88e3d3af"
]
},
"KeyName": {
"Description": "",
"Type": "AWS::EC2::KeyPair::KeyName",
"ConstraintDescription": "The name of an existing EC2 KeyPair."
},
"AvailabilityZone": {
"Description": "",
"Type": "AWS::EC2::AvailabilityZone::Name"
},
"InstanceTypeRW": {
"Description": "PostgreSQL EC2 instance type",
"Type": "String",
"AllowedValues": [
"r5a.2xlarge",
"r5a.xlarge",
"t3a.medium"
],
"ConstraintDescription": "must be a valid EC2 instance type."
},
"InstanceTypeRO": {
"Description": "PostgreSQL EC2 instance type",
"Type": "String",
"Default": "not_set",
"AllowedValues": [
"not_set",
"i3en.2xlarge",
"i3en.xlarge"
],
"ConstraintDescription": "must be a valid EC2 instance type."
},
"SSHLocation": {
"Description": " The IP address range that can be used to SSH to the EC2 instances",
"Type": "String",
"MinLength": "9",
"MaxLength": "18",
"Default": "0.0.0.0/0",
"AllowedPattern": "(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})/(\\d{1,2})",
"ConstraintDescription": "must be a valid IP CIDR range of the form x.x.x.x/x."
},
"DBPassword": {
"NoEcho": "true",
"Description": "Password for database access",
"Type": "String",
"MinLength": "1",
"MaxLength": "41",
"AllowedPattern": "[a-zA-Z0–9]*",
"ConstraintDescription": "must contain only alphanumeric characters."
}
}

The template then goes on to create the resources, in our case these are:

InstanceProfile — To pass an IAM role to the EC2 Instances we create

SG — The VPC security group we are creating for this stack (you can use existing security groups if you want)

SGinSelfPG, SGinPG — ingress rules for the above SG

PsqlServer — this is our PostgreSQL RW primary

PsqlServerPgTankAlarm — Our CloudWatch alarm for the ZFS zpool named “pgtank”. It fires when pgtank is over 80% full

PsqlServerRO — Our read-only replica (this is optional; we can run a single r/w instance)

PsqlServerROPgTankAlarm — a similar cloudwatch alarm for the replica

disk1, disk2 — EBS volumes for the RW primary

PrimaryDNSRecord, ReadOnlyDNSRecord — DNS records in a .local hosted zone. This is handy so that the application can refer to the cluster without using raw IPs, but it is not strictly necessary

Rather than paste the entire CloudFormation template here, I will provide a GitHub repo. Annotating the template line by line would be painful, so here is the gist of the recipe that spins up the Postgres cluster — you’ll find the details in the template:

PostgreSQL Read/Write Primary

  1. Create the zfs zpool called pgtank
  2. Set some details on the zpool (compression etc…)
  3. Do some housekeeping (directories, permissions, users)
  4. Run initdb to initialize the database
  5. Update sysctl.conf to set the number of huge pages. Huge pages are a way for a process to make more efficient use of memory on your system. If you’re interested in the details of how this applies to Postgres, read this, but in a nutshell it allows Postgres to work with larger chunks of memory (see the sketch after this list)
  6. Set up a bunch of CloudFormation scripts for CloudFormation hooks
  7. Write a pg_hba.conf file
  8. Create a postgresql.conf file by pulling the right one from S3 (I’ve already done all the tuning so that memory settings, etc match the instance type — they are in the Github repo)
  9. Create a postgresql.service file so that Postgres can be run by the systemd service daemon
  10. Create the push-backup.sh and poll-pg_hba_update.sh scripts I mentioned above
  11. Call psql and do an ALTER USER on postgres to set the password given in the params
  12. Run systemctl restart postgresql to restart the service
  13. Add the archive_command to postgresql.conf
  14. Add cron jobs for push-backup.sh and the cloudwatch monitoring scripts (to send metrics)
  15. Add cron job for poll-pg_hba_update.sh to begin polling for replica’s IP
  16. Restart and run the push-backup.sh script to push the 1st backup to s3 so it will be available to the replica
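
As promised in step 5, here’s a minimal sketch of the huge pages sizing; the shared_buffers value is an assumption (in practice it comes from the tuned per-instance-type config), and postgresql.conf should also have huge_pages set to on or try:

# Size huge pages to cover shared_buffers (assumed 16 GB here) plus a little headroom
SHARED_BUFFERS_MB=16384
HUGEPAGE_SIZE_KB=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)   # typically 2048 (2 MB)
NR_HUGEPAGES=$(( SHARED_BUFFERS_MB * 1024 / HUGEPAGE_SIZE_KB + 64 ))

# Apply immediately and persist across reboots
sudo sysctl -w vm.nr_hugepages=$NR_HUGEPAGES
echo "vm.nr_hugepages = $NR_HUGEPAGES" | sudo tee -a /etc/sysctl.conf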

PostgreSQL Read-only Replica

  1. Run zpool-nvme.sh to query nvme drive names and create a zpool
  2. Do some housekeeping (directories, permissions, users)
  3. Perform a WAL-G backup-fetch to get the latest backup from s3 (pushed by primary via push-backup.sh)
  4. Create a postgresql.conf file by pulling the right one from S3
  5. Append the restore_command to postgresql.conf to begin the restore (see the sketch after this list)
  6. touch /pgtank/db/postgres/standby.signal to let Postgres know this is a replica
  7. Add the details of the primary to postgresql.conf by appending primary_conninfo
  8. Update sysctl.conf to set the number of huge pages (just like the primary)
  9. Set up a bunch of CloudFormation scripts for CloudFormation hooks
  10. Create a postgresql.service file so that Postgres can be run by the systemd service daemon
  11. Run systemctl restart postgresql to restart
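
And as promised in step 5, here’s a minimal sketch of steps 3 and 5 through 7 on the replica, assuming WAL-G’s S3 prefix is configured for the postgres user; the primary’s hostname and the replication user’s password are placeholders (in the template they come from the internal DNS record and the DBPassword parameter):

# Step 3: fetch the latest base backup the primary pushed to S3 (WALG_S3_PREFIX must be set for the postgres user)
sudo -u postgres wal-g backup-fetch /pgtank/db/postgres LATEST

# Steps 5 and 7: tell Postgres where to find archived WALs and how to reach the primary
sudo -u postgres tee -a /pgtank/db/postgres/postgresql.conf <<'EOF'
restore_command = 'wal-g wal-fetch %f %p'
primary_conninfo = 'host=primary.mystack.local port=5432 user=repuser password=CHANGEME'
EOF

# Step 6: mark this node as a standby
sudo -u postgres touch /pgtank/db/postgres/standby.signal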

Conclusion

And there you have it, folks — your very own Postgres rocket ship, built from scratch in the comfort of your digital garage. You’ve just strapped a turbocharger onto Postgres, giving it the freedom to tear through data on the open road of EC2, unshackled from the terrestrial limitations of RDS.

What’s next on this wild ride? Well, you’re now (or at least after you’ve read the READMEs in the repo below) the proud pilot of a setup that’s not just a cost-effective alternative to RDS but a performance beast ready to scale to the moon and back.

Remember, with great power comes great responsibility. Keep an eye on those CloudWatch alarms, and don’t let your backups gather dust.

So, go ahead, give it a spin, and let me know how it goes. Did your Postgres setup fly high, or did it need a few tweaks to get off the ground? Share your tales of triumph or troubleshooting in the comments. I’m all ears and always up for a good cloud adventure story.

Github Repo: https://github.com/maxkremer/cloudformation_pg_cluster


Max Kremer

I’m a programmer and software entrepreneur, currently working as CTO of Lassoo.io.