TL;DR: Always use an absolute path to Unicorn in Capistrano scripts. Otherwise, after a number of deploys, your application server will fail to restart, because Capistrano's release cleanup removes the directory it was originally started from.
Last month, we came across an interesting problem: sometimes, after a successful deploy of one of our Ruby on Rails 4 applications, the Unicorn server could not be restarted with the zero-downtime restart signal. This affected every environment we run the application in, unfortunately including production. As a result, we got annoying exceptions such as ActionView::MissingTemplate in production, and only a hard restart of Unicorn would get rid of them.
We had no idea what was causing the issue, so we decided to add a check to our monitoring system (Zabbix) that would catch such incidents immediately and let us act on them.
Here is a simple Rails code snippet we've used to detect if Unicorn was properly restarted:
Note that the REVISION file is written by Capistrano on every deploy, so we can simply compare the contents of the file with the current application revision we have in memory.
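A minimal sketch of such a check, written as a Rack-style endpoint (any object responding to `call(env)`): the revision is read once at boot and compared with the REVISION file on every request. The class name `RevisionCheck` and the file location are assumptions. The file has to be read through Capistrano's `current` symlink (for example /home/project/current/REVISION) rather than from the running release's own directory, because the running release's REVISION always matches the revision in memory.

```ruby
# Hypothetical revision check endpoint (not our original code).
# Assumption: revision_file points through the `current` symlink, so it
# always reflects the most recently deployed release.
class RevisionCheck
  def initialize(revision_file)
    @revision_file = revision_file
    # Captured once, when the app boots: the revision this process runs.
    @booted_revision = read_revision
  end

  # Returns 200 when the running code matches the deployed REVISION,
  # 500 otherwise (including when the file is missing).
  def call(_env)
    deployed = read_revision
    if deployed && deployed == @booted_revision
      [200, { "content-type" => "text/plain" }, ["OK #{deployed}\n"]]
    else
      body = "MISMATCH running=#{@booted_revision.inspect} deployed=#{deployed.inspect}\n"
      [500, { "content-type" => "text/plain" }, [body]]
    end
  end

  private

  # Reads the revision string, or returns nil if the file does not exist.
  def read_revision
    File.read(@revision_file).strip
  rescue Errno::ENOENT
    nil
  end
end
```

In a Rails app it could be mounted with a route such as `get "/revision_status", to: RevisionCheck.new("/home/project/current/REVISION")` (URL and path hypothetical). A Unicorn master that failed to restart keeps serving the old revision, so the endpoint returns 500 until the restart succeeds.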
And here is a snippet of Zabbix configuration we've used:
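One way to feed such an item, assuming a Zabbix agent UserParameter that runs an external command (for example a hypothetical `UserParameter=revision.status,/usr/local/bin/revision_probe.rb http://127.0.0.1/revision_status`), is a small Ruby probe that prints the HTTP status code of the check endpoint. The script name, item key and URL are illustrative assumptions, not our actual setup.

```ruby
#!/usr/bin/env ruby
# Hypothetical probe for a Zabbix agent UserParameter: prints the HTTP status
# code returned by the revision check endpoint, or 0 if the request fails.
require "net/http"
require "uri"

def revision_status_code(url, timeout: 5)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: timeout, read_timeout: timeout) do |http|
    http.request(Net::HTTP::Get.new(uri.request_uri)).code.to_i
  end
rescue StandardError
  # Refused connections, timeouts and DNS errors all become 0, so the
  # trigger sees a non-200 value instead of missing data.
  0
end

# Only hit the network when run as a script with a URL argument.
puts revision_status_code(ARGV[0]) if $PROGRAM_NAME == __FILE__ && ARGV[0]
```

Mapping every failure to 0 keeps the item numeric, so a single "not equal to 200" condition covers both a mismatched revision and an unreachable server.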
Next, we created a Zabbix agent item (“Revision status check”) and used it in a trigger:
Now, how exactly does this trigger work? It fires on a non-200 HTTP response only if that response persists for a certain period, which prevents false alerts during deploys and restarts. If you're interested in the details, search for “hysteresis in Zabbix”.
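The trigger's behavior can be modeled in a few lines of Ruby. This is not Zabbix trigger syntax, only a sketch of the same idea under assumed rules: the trigger enters the problem state only after non-200 responses have persisted for the whole window, and returns to OK on the first 200. The class name and the window length are hypothetical.

```ruby
# Hypothetical model of a trigger with hysteresis. Entering the PROBLEM
# state requires failures to persist for `window` seconds; leaving it only
# requires a single successful (200) check. Times are plain numbers.
class RevisionTrigger
  attr_reader :problem

  def initialize(window:)
    @window = window
    @failing_since = nil
    @problem = false
  end

  # Feeds one check result; returns true while the trigger is in PROBLEM.
  def record(time, status_code)
    if status_code == 200
      @failing_since = nil
      @problem = false
    else
      @failing_since ||= time
      @problem = true if time - @failing_since >= @window
    end
    @problem
  end
end
```

With a 300-second window, a short mismatch while Unicorn is restarting never raises an alert, while a master that failed to re-exec keeps returning non-200 and trips the trigger once the window has elapsed.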
The first alert we got led us to the following output in the Unicorn log file:
At that moment, I started to get a clear understanding of what was going on.
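Here is how a zero-downtime Unicorn deploy works: on each deploy, the running master process is signaled (USR2) to re-execute the command it was originally started with, which spawns a new master running the new code, and the old master is then shut down. This means every new master is launched with the same binary path that the very first master used.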
On the first deploy, the bundle exec unicorn_rails command was resolved to /home/project/releases/1/vendor/bundle/ruby/2.1.0/bin/unicorn_rails, and Unicorn started properly. But later, say on the 100th deploy, the running Unicorn still pointed to that binary in /home/project/releases/1, the directory of the first release. By default, Capistrano 3 keeps only the last five releases and removes older ones (the deploy:cleanup task runs automatically), so /home/project/releases/1 no longer existed and the re-exec failed.
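The failure mode can be reproduced with a small simulation. It assumes a simplified layout: releases are numbered 1, 2, 3 and so on (real Capistrano releases are named by timestamp), the binstub lives at a shortened bin/unicorn_rails path, cleanup keeps the newest five releases, and the master re-executes the path recorded when it first started. The helper names `deploy!` and `original_binary_survives?` are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: creates releases/<number> with a fake unicorn_rails
# binstub, then mimics deploy:cleanup by keeping only the newest
# `keep_releases` release directories (Capistrano 3 defaults to 5).
def deploy!(deploy_to, number, keep_releases: 5)
  releases = File.join(deploy_to, "releases")
  release = File.join(releases, number.to_s)
  FileUtils.mkdir_p(File.join(release, "bin"))
  File.write(File.join(release, "bin", "unicorn_rails"), "#!/bin/sh\n")

  all = Dir.children(releases).sort_by(&:to_i)
  stale = all.first([all.size - keep_releases, 0].max)
  stale.each { |old| FileUtils.rm_rf(File.join(releases, old)) }
  release
end

# Starts "Unicorn" from release 1, deploys `deploys` times in total, and
# reports whether the binary path recorded at the first start still exists,
# i.e. whether a USR2 re-exec from that path could still succeed.
def original_binary_survives?(deploys, keep_releases: 5)
  Dir.mktmpdir do |deploy_to|
    first = deploy!(deploy_to, 1, keep_releases: keep_releases)
    recorded = File.join(first, "bin", "unicorn_rails")
    (2..deploys).each { |n| deploy!(deploy_to, n, keep_releases: keep_releases) }
    File.exist?(recorded)
  end
end
```

With the default of five releases, `original_binary_survives?(5)` is true, but from the sixth deploy on it is false: the restart keeps working for a while and then breaks, which matches the "sometimes" in the symptoms.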
Since /home/project/releases/1/vendor/bundle is just a symlink to /home/project/shared/vendor/bundle, we had to configure Bundler to map commands to the shared directory, which persists across deploys.
Here is the problem: BUNDLE_PATH is relative, so it resolves to /home/project/releases/1/vendor/bundle, which is removed a few deploys later.
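The difference between the two paths can be illustrated with plain file operations, assuming a relative BUNDLE_PATH is expanded against the release directory (as described above) and that vendor/bundle is a symlink into shared/. The method names are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Resolves a relative BUNDLE_PATH against the release directory the command
# was run from: the path Unicorn ends up remembering.
def resolve_bundle_path(release_dir, bundle_path = "vendor/bundle")
  File.expand_path(bundle_path, release_dir)
end

# Follows the vendor/bundle symlink to the shared directory, which is not
# touched by release cleanups.
def shared_bundle_path(release_dir, bundle_path = "vendor/bundle")
  File.realpath(resolve_bundle_path(release_dir, bundle_path))
end
```

Both paths point at the same gems while the release exists; once the release directory is removed, only the symlink-resolved path under shared/ is still valid, which is why the fix points Bundler there directly.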
And the fix:
In Capistrano 3, this patch can be applied by setting the `bundle_path` option:
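A minimal sketch of that setting in config/deploy.rb, assuming the capistrano-bundler plugin (which provides the `bundle_path` option) and the shared/vendor/bundle layout described above; adjust the directory to match your own linked directories.

```ruby
# config/deploy.rb (excerpt)
# Assumes capistrano-bundler is loaded via `require "capistrano/bundler"`
# in the Capfile. shared_path is absolute, so Bundler records an absolute
# path that survives release cleanups.
set :bundle_path, -> { shared_path.join("vendor", "bundle") }
```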
Despite my history of contributions to Capistrano and Bundler, this was one of the most interesting bugs I’ve run into recently.
We've decided to keep monitoring Unicorn revisions to catch any failed restarts in the future.