Tackling one of the two hardest things in Computer Science

Owen Tran
Points San Francisco

--

During today’s production push, DarthCharles and I experienced why cache invalidation is such a wicked beast. We have a typical Rails 4 stack with some multi-tenancy magic sprinkled in with the apartment gem.

Darth has been working on a new translation tool for managing tenant-specific translation overrides. The overall idea is the following:

  • There’s a default set of translation YAML files
  • Tenants can override the default translations using the translation tool
  • When someone clicks publish, it will generate a new set of YAML files by merging the default YAML with overrides from the database
  • Reload the I18n backend by reading the new YAML files
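
The merge step in the publish flow can be sketched in plain Ruby. This is only an illustration, not the actual tool: the sample YAML, the TENANT_OVERRIDES hash (standing in for rows from the database), and the deep_merge helper are all made up for the example.

```ruby
require "yaml"

# Hypothetical default translations, standing in for a default YAML file.
DEFAULT_YAML = <<~YAML
  en:
    greeting: "Hello"
    nav:
      home: "Home"
      logout: "Log out"
YAML

# Hypothetical tenant overrides, standing in for rows loaded from the database.
TENANT_OVERRIDES = {
  "en" => { "nav" => { "logout" => "Sign out" } }
}

# Recursively merge two nested hashes; values in `overrides` win.
def deep_merge(defaults, overrides)
  defaults.merge(overrides) do |_key, default_value, override_value|
    if default_value.is_a?(Hash) && override_value.is_a?(Hash)
      deep_merge(default_value, override_value)
    else
      override_value
    end
  end
end

merged = deep_merge(YAML.safe_load(DEFAULT_YAML), TENANT_OVERRIDES)

# This string is what would be written out as the tenant's new YAML file.
published_yaml = merged.to_yaml
puts published_yaml
```

Only the overridden key changes; everything else falls through from the defaults, so a tenant never has to copy the whole file to tweak one label.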

So, this works wonderfully if you have only one instance of Rails. The fun begins when you have a bunch of Puma web servers that all need to be notified that translations have been updated. Thus enters the dreaded cache invalidation problem. Let’s talk about our options:

  1. The server that gets the update could notify all the other servers but that requires knowing about all the other servers. (FAIL)
  2. We could use an asynchronous job that checks if the server translations are out of date, but that requires DelayedJob/Sidekiq/Redis. (NYAH)
  3. Check if the translations are out of date with every request. If we use a shared data store like memcached, this could be lightweight and written in a few lines of code. (ALRIGHT!)

Every time we reload the I18n YAML files, we need to store two timestamps in the shared cache: the last time the translations were updated overall, and the last time they were loaded on each individual machine. We can write the following…

def i18n_backend_reload!
  time_now = Time.now
  Rails.logger.warn("Reloading on #{I18n.host_name} at #{time_now}")
  I18n.backend.reload!
  # Global timestamp, shared by every server
  Rails.cache.write(I18N_UPDATED_AT, time_now)
  # Per-server timestamp, keyed by host name
  Rails.cache.write("I18N_UPDATED_AT.#{I18n.host_name}", time_now)
end

And now let’s create a method to see if the timestamp on our server is stale…

def reload_i18n_if_necessary!
  updated_at = Rails.cache.fetch(I18N_UPDATED_AT)
  local_updated_at = Rails.cache.fetch("I18N_UPDATED_AT.#{I18n.host_name}")
  if i18n_stale?(updated_at, local_updated_at)
    i18n_backend_reload!
  end
end

def i18n_stale?(updated_at, local_updated_at)
  # Stale if either timestamp is missing or this server loaded before the last update
  updated_at.nil? || local_updated_at.nil? || local_updated_at < updated_at
end
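
The staleness rule is easy to poke at in isolation. Here is a minimal sketch in plain Ruby (no Rails needed) that reuses the same condition as i18n_stale? and runs it against a few timestamp combinations; the sample times are arbitrary.

```ruby
# Same rule as i18n_stale?: stale when either timestamp is missing
# or the local load is older than the global update.
def i18n_stale?(updated_at, local_updated_at)
  updated_at.nil? || local_updated_at.nil? || local_updated_at < updated_at
end

now = Time.now

p i18n_stale?(nil, nil)        # => true  (nothing recorded yet, so load anyway)
p i18n_stale?(now, nil)        # => true  (this server has never loaded)
p i18n_stale?(now, now - 60)   # => true  (server loaded before the last update)
p i18n_stale?(now, now)        # => false (server loaded exactly this update)
p i18n_stale?(now - 60, now)   # => false (server loaded after the last update)
```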

We write the tests to make sure i18n_stale? works with all sorts of dates, and verify that the cache is written to and read from in the other methods. It works locally on our dev machines and in the test environments, so let’s push this bad boy to production and tail some logs…

app01
Reloading I18n.backend at on app01 at 2016-01-07 19:49:04 +0000
Reloading I18n.backend at on app01 at 2016-01-07 19:49:08 +0000
Reloading I18n.backend at on app01 at 2016-01-07 19:49:11 +0000
app02
Reloading I18n.backend at on app02 at 2016-01-07 19:49:05 +0000
Reloading I18n.backend at on app02 at 2016-01-07 19:49:07 +0000
Reloading I18n.backend at on app02 at 2016-01-07 19:49:10 +0000
Reloading I18n.backend at on app02 at 2016-01-07 19:49:14 +0000

What madness is this? We immediately double-checked the logs on staging, which also runs multiple servers, and lo and behold, we saw the constant reloading there as well.

  1. Could the value stored into Rails.cache not be correct? Nope, we verified on Rails console that it’s a Time object.
  2. Are any of the values nil or not being set correctly? Checked memcached and pulled out the keys and there are valid Time objects in them.
  3. Is DarthCharles really a Sith Lord intending to doom our application? Nope, because we know there can only be two Sith.

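The answer: every time we call i18n_backend_reload!, we overwrite the shared timestamp with the current time. That is incorrect. The shared timestamp should record when the translations were originally refreshed, not when some other server got around to reloading them. As written, each reload on one server makes every other server look stale, which triggers another reload, which bumps the shared timestamp again, and so on forever.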

To avoid the cascading timestamp, we just need to be sure to pass the timestamp into the i18n_backend_reload! method.

def i18n_backend_reload!(time_now = Time.now)
  # Handles if time_now is nil or empty string
  time_now = Time.now if time_now.blank?
  # ... rest of the method is unchanged
end

def reload_i18n_if_necessary!
  ...
  i18n_backend_reload!(updated_at)
  ...
end
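
To see the cascade and the fix side by side without a real cluster, here is a rough simulation in plain Ruby. Everything in it is an assumption made for the sketch: FakeSharedCache stands in for memcached, the Server class stands in for one app server checking staleness on a request, and the pass_timestamp flag switches between the original behaviour (stamp the current time) and the fixed one (reuse the shared timestamp).

```ruby
# In-memory stand-in for the shared memcached store (an assumption for this sketch).
class FakeSharedCache
  def initialize
    @store = {}
  end

  def read(key)
    @store[key]
  end

  def write(key, value)
    @store[key] = value
  end
end

GLOBAL_KEY = "I18N_UPDATED_AT"

# Hypothetical model of one app server checking staleness on each request.
class Server
  attr_reader :reloads

  def initialize(name, cache, pass_timestamp:)
    @name = name
    @cache = cache
    @pass_timestamp = pass_timestamp
    @reloads = 0
  end

  def handle_request(now)
    updated_at = @cache.read(GLOBAL_KEY)
    local_updated_at = @cache.read("#{GLOBAL_KEY}.#{@name}")
    return unless updated_at.nil? || local_updated_at.nil? || local_updated_at < updated_at

    # Fixed version reuses the shared timestamp; buggy version stamps the current time.
    stamp = @pass_timestamp && updated_at ? updated_at : now
    @reloads += 1
    @cache.write(GLOBAL_KEY, stamp)
    @cache.write("#{GLOBAL_KEY}.#{@name}", stamp)
  end
end

# Alternate ten requests between two servers and count reloads per server.
def simulate(pass_timestamp:)
  cache = FakeSharedCache.new
  servers = %w[app01 app02].map { |name| Server.new(name, cache, pass_timestamp: pass_timestamp) }
  clock = Time.at(0)
  10.times do |i|
    clock += 1
    servers[i % 2].handle_request(clock)
  end
  servers.map(&:reloads)
end

buggy_reloads = simulate(pass_timestamp: false)
fixed_reloads = simulate(pass_timestamp: true)
puts "buggy: #{buggy_reloads.inspect}"  # buggy: [5, 5]
puts "fixed: #{fixed_reloads.inspect}"  # fixed: [1, 1]
```

In the buggy run every single request triggers a reload, which matches the ping-pong pattern in the production logs. In the fixed run each server loads once and then stays quiet until the shared timestamp actually changes.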

Moral of the story: cache invalidation isn’t so bad, just avoid pair programming with a Sith Lord.

--