Distributing Database Reads Across Replicas with Rails 6 and ActiveRecord

Jose Rosello
Grailed Engineering
7 min read · Mar 31, 2020

Now that Rails 6 has the ability to manage multiple databases and gems like Octopus are no longer necessary — or even compatible — here at Grailed we decided to take full advantage of our 2 database instances. I’ll try to be as detailed as possible, but I do recommend you familiarize yourself with this ActiveRecord capability by taking a look at the Rails documentation first.

Background

We currently run a large primary PostgreSQL instance with an attached read-only replica, and prior to Rails 6, somewhere around 99% of our queries were sent to our primary, with the remaining handful sent manually to the replica by passing them to an Octopus block:

Octopus.using(Octopus.followers) do
  MyModel.where(...) # This would be sent to the replica
end

Given the boilerplate and mental overhead of invoking this block, we mostly reserved it for known long-running queries, which meant that our replica was underutilized.

With ActiveRecord 6, we wanted to find out if it was possible to seamlessly distribute our read queries across our two database instances, such that roughly ~50% of queries were issued to our replica. With a few caveats, this turned out to be possible!

Configuring Multiple Databases on ActiveRecord 6

The Rails 6 documentation has a good example on how to setup config/database.yml with a static configuration, but we can take advantage of ERB to dynamically configure this depending on our environment. At Grailed, we have various staging environments on Heroku along with production, and not all of them have multiple databases configured, so our database.yml looks like this:

default: &default
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV.fetch("PG_POOL_SIZE") %>

heroku: &heroku
  primary:
    <<: *default
    url: <%= ENV["DATABASE_URL"] %>
  <% DbHelper.follower_urls.each_with_index do |url, i| %>
  <%= "follower_#{i}:" %>
    <<: *default
    url: <%= url %>
    replica: true
  <% end %>

development:
  primary:
    <<: *default
    database: grailed_development

staging:
  <<: *heroku

production:
  <<: *heroku

A few things to note about this:

  • We keep a common default YAML alias that we share across all of our databases
  • We use the heroku alias for staging and production, but not development
  • We’ll go into the details of DbHelper.follower_urls later, but it returns the available follower URLs as an array, which we iterate over to declare N followers, each with its URL and replica: true. In practice, Grailed only ever keeps one permanent follower unless we’re upgrading PostgreSQL versions or making other large infrastructure changes

The ERB processed file would look like this:

# Showing only production for brevity
production:
  primary:
    adapter: postgresql
    encoding: unicode
    pool: 1 # We use pgbouncer and Unicorn
    url: "postgres://our_url"
  follower_0:
    adapter: postgresql
    encoding: unicode
    pool: 1
    url: "postgres://our_follower_url"
    replica: true

Configuring ApplicationRecord

ActiveRecord 6 allows different models to read from different databases, but since all of our data is replicated, we want to let all models be queried from either database instance. Here’s the setup for that:

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to(
    database: {
      writing: :primary,
      reading: DbHelper.reading_role,
    }
  )
end

writing and reading are the names Rails expects for our primary and read-only follower/replica. Since we don’t always have a follower available at Grailed, the helper DbHelper.reading_role is defined as follows:

# @return [Symbol] The first available follower, else :primary. Rails 6 does not
#   allow us to load-balance across various followers.
def self.reading_role
  configurations.find(&:replica?)&.spec_name&.to_sym || :primary
end

def self.configurations
  ActiveRecord::Base.configurations.configs_for(env_name: Rails.env, include_replicas: true)
end

This lets us inspect our database.yml configuration to retrieve the first replica available (for us this would be :follower_0), and default to :primary if no replicas are configured — all this means is that we would always write to and read from our primary if there are no replicas. Note that there’s currently no way to specify more than one reading role per model, so we can’t distribute queries across various replicas.
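
As for the DbHelper.follower_urls helper referenced in database.yml earlier, a minimal sketch could look like the one below, assuming the follower connection strings are exposed through a single comma-separated environment variable (the FOLLOWER_DATABASE_URLS name is purely illustrative; on Heroku you would instead read whichever config vars your follower databases provide):

# A minimal sketch, not our exact implementation. FOLLOWER_DATABASE_URLS is a
# hypothetical environment variable holding a comma-separated list of replica
# connection strings; adapt the lookup to your own infrastructure.
module DbHelper
  def self.follower_urls
    ENV.fetch("FOLLOWER_DATABASE_URLS", "").split(",").map(&:strip).reject(&:empty?)
  end
end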

Invoking the Reader Role Manually

At this point, we should have our database connections working, and we can test queries to our replica by issuing:

ActiveRecord::Base.connected_to(role: :reading) do
  yield
end

This tells ActiveRecord to issue any queries in the block to the instances mapped to the reading role. We noticed an issue with this while running our RSpec test suite, however. Internally, ActiveRecord keeps two separate connection pools for the reading and writing roles, even if both point at the same database instance. Since the test suite runs each example inside a single database transaction, and this block causes queries to go through a different connection (one from the reading pool), those queries would return incorrect values because the transaction was still uncommitted. To fix this, we added the following helper method to DbHelper to avoid using an additional connection when there are no followers available:

def self.with_follower
  if reading_role == :primary
    yield
  else
    ActiveRecord::Base.connected_to(role: :reading) do
      yield
    end
  end
end

We can use this the same way we use the ActiveRecord connected_to call:

DbHelper.with_follower do
  yield
end
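
For instance, a known long-running query can be pushed to the follower like this (MyModel and the query itself are just placeholders):

# Example: route an expensive read to the follower when one is configured;
# with_follower falls back to the primary otherwise.
DbHelper.with_follower do
  MyModel.where("created_at >= ?", 1.year.ago).count
end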

Issuing Reads to the Replica Automatically

The Rails documentation points us toward a setup that automatically sends queries occurring within the context of a controller’s GET request to the reader role (copied below):

# Contains core logic to determine which database (primary or replica)
# should receive the query
config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver

# The resolver above accepts a `delay` option to indicate
# how long it should wait before reading from the follower
# after a write has taken place from within a session
config.active_record.database_selector = { delay: 2.seconds }

# Wraps a Rails session object, and stores the last time we issued a
# write query to the database. Used by the Resolver.
config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session

The database_resolver and database_resolver_context work in conjunction to determine which database each query will be issued to, and I highly encourage you to take a look at the code here, as the interface is quite simple. The flow begins with ActiveRecord’s DatabaseSelector middleware, whose critical piece of code looks like this:

# Middleware that determines which database connection to use
# in a multiple database application.
def call(env)
  request = ActionDispatch::Request.new(env)
  select_database(request) do
    @app.call(env)
  end
end

def select_database(request, &blk)
  context = context_klass.call(request)
  resolver = resolver_klass.call(context, options)

  if reading_request?(request)
    resolver.read(&blk)
  else
    resolver.write(&blk)
  end
end

Behind the scenes, resolver is initialized with an instance of Resolver::Session (which is what context_klass points to above, as per our configuration), and it implements read(&blk) and write(&blk) methods (where &blk would be our queries) which “resolve” what database role we’ll use to issue our queries. The session object is initialized with a Rails request object, and it uses the Rails session to let the resolver store some metadata; most importantly, if a write is performed, the resolver stores the timestamp in the session, and it will avoid issuing any read queries to the reading role (replica) for the amount of time specified by the delay option above, as long as the request is coming from the same session. This prevents a user (session) from reading stale values from a replica that isn’t caught up yet.
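
To make the mechanics concrete, the resolver’s read and write paths boil down to roughly the following. This is a simplified paraphrase of the Rails 6 source, not a verbatim copy, omitting instrumentation and a few details; read_from_primary? is the method we’ll override in the next section:

# Simplified paraphrase of ActiveRecord::Middleware::DatabaseSelector::Resolver.
# `context` is the Resolver::Session wrapper around the Rails session.
def read(&blk)
  if read_from_primary?
    ActiveRecord::Base.connected_to(role: :writing, &blk)
  else
    ActiveRecord::Base.connected_to(role: :reading, &blk)
  end
end

def write(&blk)
  ActiveRecord::Base.connected_to(role: :writing) do
    begin
      blk.call
    ensure
      # Remember when this session last wrote, so its subsequent reads
      # stay on the primary until the replica has had time to catch up
      context.update_last_write_timestamp
    end
  end
end

def read_from_primary?
  # True while we are still within `delay` seconds of this session's last write
  Time.now - context.last_write_timestamp < send_to_replica_delay
end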

Distributing Reads

By default, these implementations will route any possible queries that occur within a GET request to the replica (unless there’s been a recent write as determined by our delay property), which is not what we want, as it basically just inverts the situation — instead of most queries being sent to the primary, they’re now all sent to the replica instead, leaving the primary instance underutilized. Thankfully we can create our own resolver, as long as we match its interface. Here’s our resolver:

class CustomResolver < ActiveRecord::Middleware::DatabaseSelector::Resolver
  # The parent method returns true only if the session
  # must read from the primary, but we would also like to
  # read from the primary about half the time in order to distribute
  # the load among our two instances
  def read_from_primary?
    super || rand(2) == 0
  end
end

We extend the default resolver and override the read_from_primary? method, which normally returns true only when a read must go to the primary, so that it also sends reads to the primary about half the time at random.

We can then change our configuration to use this as the resolver instead:

config.active_record.database_resolver = CustomResolver
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
Reads issued to Primary (in blue) and Replica (in green) per minute after balancing. The two lines overlap almost completely, indicating reads are split nearly evenly between the instances.

Caveats

Although we now have a system which sends roughly half of our read queries to the replica, there are a few things to keep in mind.

It only works for controllers

Since this relies on the DatabaseSelector middleware, this logic is only executed within the context of a request. To issue queries to the replica from Rake tasks or jobs, we still need to manually invoke the ActiveRecord::Base.connected_to method (or better yet, use the with_follower helper we wrote earlier).
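
For example, a background job that wants its read-heavy query to hit the replica has to opt in explicitly; the job and model names below are just placeholders:

# Hypothetical background job: jobs never pass through the DatabaseSelector
# middleware, so we choose the follower explicitly via our helper.
class ExpensiveReportJob < ApplicationJob
  def perform(user_id)
    DbHelper.with_follower do
      # Falls back to the primary automatically when no follower is configured
      MyModel.where(user_id: user_id).count
    end
  end
end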

It breaks any write we issue from the context of a GET

Perhaps the most important caveat is what happens when we try to issue a write from the context of a GET. Although these requests are meant to be read-only in principle, we’re often forced to violate this for certain routes, especially those that involve sending magic links — password reset tokens, confirmation tokens, etc. The current setup would not allow for that. In order to circumvent this, we need to override the effect of the middleware’s read call to the resolver. Thankfully, we can just invoke ActiveRecord::Base.connected_to, since ActiveRecord will use the most recently set context. Doing this manually for each individual query called within a controller would be painful though, so we can write a hook to simply wrap any of our offending actions:

module ForceDbWriterRole
  extend ActiveSupport::Concern

  # This is intended to be used as an around_action hook for GET
  # endpoints that need to perform writes to ActiveRecord
  def force_writer_db_role
    ActiveRecord::Base.connected_to(role: :writing) do
      yield
    end
  end
end

Which we can invoke from our controller:

class MyController < ApplicationController
  include ForceDbWriterRole

  around_action :force_writer_db_role, only: :get_with_side_effects

  def get_with_side_effects
    # our DB writes
  end
end

You may be reading stale values

Finally, although the 2-second delay is fairly conservative for our setup, if your database replica takes a little longer to catch up and avoiding stale reads is critical to your use case, you should increase the delay.
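
That tweak is a one-line change to the selector options; the 10 seconds below is only an example value, so pick one based on your observed replication lag:

# Example only: give the replica more time to catch up after a write
config.active_record.database_selector = { delay: 10.seconds }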

Conclusion

With this setup, we at Grailed have been able to effectively balance the load on our databases, which has in turn allowed us to comfortably scale out to more servers during moments of peak traffic (like marketing events). Hopefully you’ll be able to derive similar benefits from this. If you’re interested in this kind of work, be sure to check out our jobs page!
