The biggest reason to run your database out of EC2 would be cost savings. As of writing this article, the cheapest/smallest RDS deployment costs around $30.88 USD/month. In contrast, we can run a t2.micro EC2 instance with 8GB of storage for around $6 USD/month. This cost can be further reduced if you opt for reserved billing.
Caveats / When You Should Opt for RDS
As a rule of thumb, if budget allows, always opt for RDS. It provides replication via a standby instance in a different availability zone and handles automatic fail-over. This is desirable for two reasons:
- Amazon guarantees 99.99999% up-time for regions, not availability zones. So having a standby replica in a different zone will ensure that even if the primary zone where your master database is hosted experiences an outage, your application will be able to fall back to the standby replica in a different (and presumably unaffected) zone.
- Your application does not need to be able to detect database failure and handle fail-over. This is handled by Amazon automatically. They will point your database host URL to the standby database if/when your primary database fails. After the crisis has been resolved, they will automatically catch up the recovered master database and promote it.
Now that we have that out of the way, let's get started. In case you’d prefer to consume this guide in video format:
Set Up the Infrastructure
Jump to the AWS EC2 console and provision a Linux server to host our DB. I’ll be choosing an Ubuntu 20.04 server because that is what I’m most accustomed to.
The rest of the configuration isn’t that important, but at the tags section, we will create a tag for identifying this DB instance. This will be used when we set up backups later.
Next we will configure the security group to allow SSH and database access. I will open up the default Postgres port 5432 in anticipation of running the Postgres service there. For added security, I am limiting access to my public IP address. For production use, you should only allow database connections from the security group of your application server.
Once the instance has been launched, take note of the public IP address and public DNS name. We will be using the public IP address to SSH into the server for setting up Postgres. After that, we can use the DNS name to connect to the Postgres server.
This will be a good time to pause and add the newly created server to your SSH configuration so you can connect in a more ergonomic way.
We will be using the following steps as a guide for configuring our Postgres server.
Let's break down the commands outlined above. The first two lines will refresh the packages in Ubuntu and install the latest version of Postgres.
sudo apt-get update -y && sudo apt-get upgrade -y
sudo apt install postgresql -y
Next we will log in as the default Postgres user (imaginatively named
postgres) and create the user roles we will be using. I will simply name my user
ubuntu and allow it to login and create databases.
sudo su postgres
psql -U postgres -c "CREATE ROLE ubuntu;"
psql -U postgres -c "ALTER ROLE ubuntu WITH LOGIN;"
psql -U postgres -c "ALTER USER ubuntu CREATEDB;"
psql -U postgres -c "ALTER USER ubuntu WITH PASSWORD 'ubuntu';"
Log out from the
postgres user account and go back to your default user by entering the following:
postgresql.conf file (it’s usually in
/etc/postgresql/12/main/postgresql.conf). In case you are not sure, you can use a handy Bash command to find it by entering:
sudo find / -name "postgresql.conf"
Open the file with your favourite text editor (yes, mine just so happens to be Nano #sorryNotSorry). We will find the configuration entry
listen_addresses and change it from the default setting to
‘*’. This will allow the Postgres server to listen on the DNS name of the EC2 instance.
After that, find the
pg_hba.conf and open it.
sudo nano /etc/postgresql/12/main/pg_hba.conf
We will allow authentication from remote connections by adding the following lines to the end of the file. This will allow connections from any IP address. Please note that in production, it would be a good idea to restrict connections to only the ones you need (a well configured EC2 security group should be sufficient, but for peace of mind you should add the IP address of your application server instead of 0.0.0.0/0)
host all all 0.0.0.0/0 md5host all all ::/0 md5
For the new configuration to take effect, we will restart the Postgres daemon by running:
sudo systemctl restart postgresql
Let’s Take It for a Test Drive
For trying out our shiny new Postgres database, we will use PgAdmin. Download and install it from here. After installation, fire it up and add the server.
Enter the DNS name from the EC2 console and the credentials we set up from before.
If the server is added, great! If you face a timeout error, make sure to double-check your security groups in EC2 and the Postgres configuration we covered before.
I’m going to create a test database and a users table, and insert some dummy data for testing.
Configure Automatic Backups
Now that we have our database up and running, let's ensure that the DB instance is backed up on a regular basis. The goal here is to back up the entire instance (along with the data and configuration) so that in case of a failure, we will simply spin up a new EC2 instance from the backed-up image (AMI).
To do this, go to the EC2 console, select Data Lifecycle Manager, and click on Create Lifecycle Policy.
Remember that tag we associated with this EC2 server? We will tell DLM to back up targets tagged with the tag
Next we will define the backup policy and schedule. I usually opt for a daily backup, and I expire backed up snapshots after a five-day retention period.
It is worth noting that this backup strategy will cause a bit of downtime due to how Amazon handles image creation. Your database server will go down for a few seconds during this backup window.
If this is downtime is unacceptable, you can use
pg_dump along with a Bash script and the AWS SDK to glue together a routine that periodically dumps the database and uploads it to S3. This script can be invoked in Unix-like systems using cron. While this is beyond the scope of this article, this gist should get you started:
That’s it for now. Enjoy being a DB admin, and best of luck!