Automated Backups with the Ruby Backup Gem and Amazon S3
This is a follow-up to the Phoenix deployment guide I recently published.
The importance of a good backup system is difficult to overstate. In January 2017, GitLab experienced a catastrophic incident that resulted in the loss of production data affecting an estimated 5,000 projects, 5,000 comments and 700 new user accounts. They weren’t aware that their backups were failing until it was too late. Human error must be expected in any project, and it is critical that we understand and test our backup and recovery procedures to minimise data loss and application downtime.
We’ll be setting up automated, scheduled backups for our PostgreSQL database and storing them in Amazon S3. You can also use this guide to create backups for any other framework backed by a Postgres database (Rails, Django, Node.js, etc.).
If you’ve followed the guide, this is what we have so far:
- Ubuntu 16.04 on DigitalOcean’s 1GB RAM plan
- A simple Phoenix application
- A PostgreSQL database
What We’ll Need
- An Amazon S3 bucket and IAM user with permissions to that bucket.
Why the Backup Gem?
It’s a great gem with a lot of options for storage, compression and encryption. Most importantly for me, it’s been tried and tested in production and has never failed me.
If you’re dealing with a larger database (>50GB), the backup gem may not be the most efficient solution, and you should look into WAL-E, a continuous archiving tool, instead. The backup gem uses pg_dump, which is simple but requires more resources on the server to run, and does not support point-in-time recovery. Nonetheless, it’s a quick and straightforward solution for most projects.
Installing rbenv, Ruby and The Backup Gem
First, let’s install Ruby. We’ll use rbenv so that we can easily manage our Ruby version in the future. We begin by SSH’ing into our server:
sudo apt-get update
sudo apt-get install autoconf bison build-essential libssl-dev libyaml-dev libreadline6-dev zlib1g-dev libncurses5-dev libffi-dev libgdbm3 libgdbm-dev
git clone https://github.com/rbenv/rbenv.git ~/.rbenv
echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(rbenv init -)"' >> ~/.bashrc
We’ve just added rbenv to our PATH on the server. Reload the shell, then check that rbenv was set up properly:
source ~/.bashrc
type rbenv
# rbenv is a function
# rbenv ()
# {
#     local command;
# ...
Next, we need ruby-build, which we can install as an rbenv plugin.
git clone https://github.com/rbenv/ruby-build.git ~/.rbenv/plugins/ruby-build
Let’s list out all the available Ruby versions we can install.
rbenv install -l
# Available versions:
We’ll install Ruby 2.3.4 (the current stable version as of this writing is 2.4.1, but the backup gem does not support Ruby 2.4 yet), and once the installation is complete, we’ll set it as the global default. Installing Ruby takes a while, so go ahead and grab a cup of coffee while you wait.
rbenv install 2.3.4
rbenv global 2.3.4
ruby -v # 2.3.4p301 ...
Since we won’t need the local documentation for the gems we’re installing, let’s set Rubygems to not include the docs.
echo "gem: --no-document" > ~/.gemrc
Now we can install the backup gem. Note: as of this writing, there’s a bug in version 4.4.0 that affects the cycling of backups, so we’ll use 4.2.0 instead.
gem install backup -v 4.2.0
Let’s generate our Backup model file.
backup generate:model --trigger deploy_phoenix_prod_backup --archives --storages='s3' --databases='postgresql' --compressor='gzip'
This generates a Backup model file with helpful instructions. Edit the file so that it looks like the following snippet:
vim ~/Backup/models/deploy_phoenix_prod_backup.rb # or nano
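This is a sketch of a minimal model file for our setup, not the definitive configuration: the database name, username and password are placeholders for your own values, and the S3 credentials and bucket come from the IAM user and bucket we set up in the next section.

```ruby
# ~/Backup/models/deploy_phoenix_prod_backup.rb
Model.new(:deploy_phoenix_prod_backup, 'deploy_phoenix Production Backup') do
  # Dump the Postgres database with pg_dump
  database PostgreSQL do |db|
    db.name     = "deploy_phoenix_prod"   # placeholder
    db.username = "deploy_phoenix"        # placeholder
    db.password = "your-db-password"      # placeholder
    db.host     = "localhost"
    db.port     = 5432
  end

  # Upload to S3, keeping the last 24 backups (automatic cycling)
  store_with S3 do |s3|
    s3.access_key_id     = "YOUR_ACCESS_KEY_ID"      # from the IAM user
    s3.secret_access_key = "YOUR_SECRET_ACCESS_KEY"  # from the IAM user
    s3.region            = "us-east-1"
    s3.bucket            = "deploy-phoenix"          # your bucket name
    s3.path              = "prod/hourly"
    s3.keep              = 24
  end

  # Gzip the dump before upload
  compress_with Gzip
end
```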
Amazon S3 IAM User
We need an S3 bucket and an IAM user with permissions for just that bucket. First, we create an Amazon S3 bucket and take note of the region selected. For this guide we’ll select US East (N. Virginia), which has the region code us-east-1. If you’ve chosen a different region, you can find the corresponding region code in the AWS regions documentation.
Next, visit the IAM Management Console within AWS, and navigate to the Users panel. We’ll create a new user with the same name as the bucket for easier management and check Programmatic access.
We will skip the second step for now. Ignore the warning “This user has no permissions” and create the user.
We’ll download the credentials and store them for later.
Back at the Users page, we click on the user name we just created. We’re going to create a policy that only allows the user to access the bucket we’ve created.
Now, we’ll select “Custom Policy”.
Paste the following, add a descriptive policy name like “only_deploy-phoenix_bucket” and click “Validate policy”. Remember to replace deploy-phoenix with your own bucket name.
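As a sketch, a policy that restricts the user to a single bucket can look like this (deploy-phoenix is a placeholder for your own bucket name):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::deploy-phoenix",
        "arn:aws:s3:::deploy-phoenix/*"
      ]
    }
  ]
}
```

Note that both Resource entries are needed: the bare bucket ARN covers bucket-level actions like listing, while the /* ARN covers the objects inside it.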
Now that we’ve got our S3 bucket and IAM user set up, let’s go back to the Backup model file and update it with the credentials we created earlier.
Triggering a Backup
It looks like we’re ready to trigger a backup. Let’s give it a try on our VPS.
backup perform --trigger deploy_phoenix_prod_backup
# Performing Backup for 'deploy_phoenix Production Backup (deploy_phoenix_prod_backup)'!
# Backup for 'deploy_phoenix Production Backup (deploy_phoenix_prod_backup)' Completed Successfully
You’ll be able to see the backup created in S3. The file will be placed at prod/hourly/deploy_phoenix_prod_backup/<timestamp>/deploy_phoenix_prod_backup.tar. If you download and extract the tar archive, it will reveal a folder containing the gzipped dump file. Extracting that in turn gives us a PostgreSQL.sql dump file that we can view in a text editor.
Now that we know how to trigger a single backup, let’s schedule it to run hourly.
Scheduling the Backups
We’ll use the whenever gem, which gives us an elegant syntax for managing the crontab.
gem install whenever
Since whenever expects a config/schedule.rb file, we’ll generate one inside the Backup directory:
cd ~/Backup
wheneverize .
# [add] writing `./config/schedule.rb'
# [done] wheneverized!
Edit the schedule.rb file that has just been created.
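As a sketch, an hourly schedule for the trigger we created might look like the following. This only covers hourly backups; weekly and monthly runs would follow the same pattern with their own triggers.

```ruby
# ~/Backup/config/schedule.rb
# Run the backup gem's trigger at the top of every hour.
every 1.hour do
  command "backup perform --trigger deploy_phoenix_prod_backup"
end
```

whenever translates this DSL into a crontab entry when we update the crontab.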
Next, we’ll update our crontab with:
whenever --update-crontab
# to view the updated crontab, use `crontab -l`
Check back again in an hour and you’ll see the backup created automatically by the cron entry. Nice!
Optional: Encryption and Decryption with GPG
Our backups now give us the peace of mind that we can restore our database easily in the event that our server dies for whatever reason. However, our work here is not yet done. We live in a world where some of the most advanced tech giants have been pwned, so we should prepare for the scenario where one of your database dumps ends up in malicious hands. It could be due to an act of carelessness resulting in an accidental exposure of your IAM credentials, an Amazon data breach, or even a rogue employee. Let’s put on our paranoid hats. Onward!
GPG ensures our data will be strongly encrypted in transit and at rest, rendering our database dumps useless to anyone that doesn’t have the private key.
First, we need to generate a new GPG key. Use your development machine, not the VPS:
gpg --gen-key
Follow the prompts. You should use 4096 bits; other than that, the defaults are usually fine. Next, copy the public key you’ve just created.
[EMAIL] is the email you specified when generating the keys.
gpg -a --export [EMAIL]
Paste the key into your Backup model file. Remember to update the email address firstname.lastname@example.org to the one you used when generating the keys.
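As a sketch, the encryptor block in the Backup model might look like the following, with the placeholder email and key replaced by your own:

```ruby
# Inside the Model.new block in ~/Backup/models/deploy_phoenix_prod_backup.rb
encrypt_with GPG do |encryption|
  encryption.keys = {}
  # Paste the output of `gpg -a --export [EMAIL]` here
  encryption.keys['firstname.lastname@example.org'] = <<-KEY
    -----BEGIN PGP PUBLIC KEY BLOCK-----
    ...your exported public key...
    -----END PGP PUBLIC KEY BLOCK-----
  KEY
  # Encrypt each backup to this recipient's public key
  encryption.recipients = 'firstname.lastname@example.org'
  encryption.mode = :asymmetric
end
```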
Now, trigger a backup from the VPS:
backup perform --trigger deploy_phoenix_prod_backup
When we visit S3 and download our latest backup, the filename now ends with .tar.gpg instead of .tar. Let’s try decrypting it on our development machine.
gpg -o mybackup.tar -d deploy_phoenix_prod_backup.tar.gpg
The -o flag lets us name the output file. After keying in the passphrase, we have a decrypted tar archive that we can extract as we did before.
We now have a complete backup solution for our Phoenix application that is backed up hourly, weekly and monthly, with automatic cycling. Lastly, we have encrypted our backups so they are safe from prying eyes even if our dumps somehow fall into the wrong hands.
Remember to try out a database restore: psql -U <username> -d <dbname> -1 -f <filename>.sql.
Credit goes to Ben Dixon (Reliably Deploying Rails Applications) — this solution borrows many ideas from his book.