Some bugs can only be reproduced, and therefore fixed, with the actual production data. When you've just launched your application, you can do this simply by downloading the entire database and importing it locally. However, as your application (hopefully) becomes successful and the size of the database grows, this quickly becomes infeasible. In this post I will explain how we solved this problem by maintaining a backup of the production database on a cheap Amazon RDS instance and accessing it through a special debug environment in Rails.

Amazon RDS

RDS is Amazon's relational database service. They offer several types of instances as can be seen on the pricing page, differentiated by performance characteristics. As we're only going to use the database as a debugging environment, performance is not important so the smallest type (db.t2.micro) will do.

It's straightforward to set up a PostgreSQL instance by going to the RDS console and following the steps there. The only parameter that matters at this point is the allocated storage, as it contributes to the price. Therefore, pick a value that reflects the actual size of your database and don't over-allocate (see the FAQ for more information on how the monthly price is calculated).
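To pick that value, it helps to measure the database first. The following is a minimal sketch rather than part of our setup: it assumes you already have the database size in bytes (for example from PostgreSQL's pg_database_size function), and the helper name and the 20% headroom are hypothetical choices you should adjust.

```ruby
BYTES_PER_GIB = 1024**3

# Hypothetical helper: turn a measured database size (in bytes) into a
# whole number of GiB to allocate, leaving some headroom for growth.
# The 1.2 default (20% headroom) is an assumption, not an RDS rule.
def storage_to_allocate_gib(database_bytes, headroom: 1.2)
  (database_bytes * headroom / BYTES_PER_GIB).ceil
end

# Example: a database that currently occupies 12 GiB.
puts storage_to_allocate_gib(12 * BYTES_PER_GIB)
```

Rounding up keeps the allocation from landing just below the real size while still avoiding a large over-allocation.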

Setting up the Rails environment

Now that your RDS instance is ready, we're going to access it through a new Rails environment named debug. The goal is to be able to connect to the RDS instance simply by starting the Rails server with:

RAILS_ENV=debug rails s

Start by creating the environment in config/environments/debug.rb. The easiest approach is to have it inherit from the development environment:

# Based on development defaults
require Rails.root.join("config/environments/development")

The next step is to configure config/database.yml for the new environment.

debug:
  adapter: postgresql
  encoding: unicode
  database: <%= ENV['DEBUG_DB_DATABASE'] %>
  username: <%= ENV['DEBUG_DB_USERNAME'] %>
  password: <%= ENV['DEBUG_DB_PASSWORD'] %>
  host: <%= ENV['DEBUG_DB_HOST'] %>
  port: <%= ENV['DEBUG_DB_PORT'] %>
  sslmode: verify-full
  sslrootcert: config/ca/rds-combined-ca-bundle.pem

The two settings that need explaining here are sslmode and sslrootcert. As we are going to transfer real production data, which may be sensitive, between your computer and AWS, we want to make sure that we do so over SSL. Setting sslmode to verify-full encrypts the connection and also verifies the server's certificate and host name. As explained in the RDS documentation, you need to download the certificate bundle stored at https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem and save it in your project, in this case in config/ca/rds-combined-ca-bundle.pem.
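Because every connection setting in this configuration comes from an environment variable, a forgotten variable only shows up later as a confusing connection error. As a minimal sketch (the constant and method names are hypothetical, not something Rails provides), a small check could report which of the DEBUG_DB_* variables are unset before you try to connect:

```ruby
# Variables referenced by the debug entry in config/database.yml.
REQUIRED_DEBUG_DB_VARS = %w[
  DEBUG_DB_DATABASE
  DEBUG_DB_USERNAME
  DEBUG_DB_PASSWORD
  DEBUG_DB_HOST
  DEBUG_DB_PORT
].freeze

# Hypothetical helper: returns the names of required variables that are
# unset or blank in the given environment (ENV by default).
def missing_debug_db_vars(env = ENV)
  REQUIRED_DEBUG_DB_VARS.select { |name| env[name].to_s.strip.empty? }
end

# Possible usage, for example near the top of config/environments/debug.rb:
#   missing = missing_debug_db_vars
#   abort "Missing settings: #{missing.join(', ')}" unless missing.empty?
```

Accepting the environment as an argument keeps the check easy to try out with a plain hash.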

Transferring a backup of the database to RDS

Finally, we need to populate the debug database. At HSTRY, we already store a nightly backup on AWS S3 using the pgbackups-archive gem. We therefore wrote a job that we run nightly to restore the latest backup from S3 to the database on RDS. It does three things:

  1. Determine the latest backup.
  2. Download the latest backup to a temporary file.
  3. Import the backup into the RDS database using pg_restore.

Feel free to adapt it for your needs.
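The three steps can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual job: the Backup struct and all method names are hypothetical, and listing and downloading from S3 are passed in as callables because they depend on the S3 client you use. The pg_restore options and the PGPASSWORD, PGSSLMODE and PGSSLROOTCERT variables are standard PostgreSQL ones.

```ruby
require "tempfile"

# Hypothetical description of one backup object stored on S3.
Backup = Struct.new(:key, :last_modified)

# Step 1: the latest backup is the one modified most recently.
def latest_backup(backups)
  backups.max_by(&:last_modified)
end

# Step 3: build the pg_restore argument list from the DEBUG_DB_* settings.
def pg_restore_command(dump_path, env)
  [
    "pg_restore", "--clean", "--if-exists", "--no-owner", "--no-acl",
    "--host", env.fetch("DEBUG_DB_HOST"),
    "--port", env.fetch("DEBUG_DB_PORT", "5432"),
    "--username", env.fetch("DEBUG_DB_USERNAME"),
    "--dbname", env.fetch("DEBUG_DB_DATABASE"),
    dump_path
  ]
end

# Ties the steps together. `list` returns an array of Backup objects and
# `download` writes the object for a key into the given IO; both are
# assumptions standing in for your S3 client calls.
def restore_latest(list:, download:, env: ENV, runner: method(:system))
  backup = latest_backup(list.call)
  raise "no backups found" if backup.nil?

  # Step 2: download to a temporary file that is removed afterwards.
  Tempfile.create(["backup", ".dump"]) do |file|
    file.binmode
    download.call(backup.key, file)
    file.flush

    pg_env = {
      "PGPASSWORD" => env.fetch("DEBUG_DB_PASSWORD"),
      "PGSSLMODE" => "verify-full",
      "PGSSLROOTCERT" => "config/ca/rds-combined-ca-bundle.pem"
    }
    ok = runner.call(pg_env, *pg_restore_command(file.path, env))
    raise "pg_restore failed" unless ok
  end
end
```

Passing the command to system as separate arguments avoids shell quoting problems, and supplying the password through PGPASSWORD keeps it off the command line.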

Conclusion

That's it! By simply starting the Rails server with RAILS_ENV=debug, we can switch to an environment where we have access to real production data. We found the two main benefits to be:

  • bug fixing: It greatly improved our ability to quickly fix bugs that our users were experiencing, as there's only so much you can do with seed data.
  • performance: It also lets us identify and fix N+1 queries on real production data. We use New Relic to detect especially long requests and then reproduce and fix them locally.