Image credit http://compsci.ca/blog/ruby-best-introductory-programming-language/

Ruby in Production: Lessons Learned

Having deployed a variety of Ruby apps (Rails and non-Rails) over the course of many years, I’ve learned some lessons about keeping things afloat. Tools like Mina and Capistrano already do most of these (more on that further down), but it’s good to have a first-hand understanding of what needs to happen.

All these instructions are available as a ready-to-use repo:
ruby-deploy-kickstart
Ruby/Rails deployment template with .env, Foreman, Ansible, Docker & Vagrant.
Wow, never expected this to get so much visibility. Thanks to @yukihiro_matz @mperham and many others for the tweets!

TL;DR -

  1. Dependencies: Use nodejs+execjs for asset compilation, instead of therubyracer+libv8. This will make deployments much faster. Treat NodeJS just as you would treat GCC — a compile time dependency. Also install XML, XSLT, ZLib and OpenSSL development libraries.
  2. Installing Ruby: Use BrightBox’s Ruby packages for Debian and Ubuntu; rbenv or RVM for others
  3. User/Group: Create separate User and Group for your app. Give it a home folder. Give a fixed UID/GID to the user and group.
  4. Source Code: Clone the repository under <app_home>/repo
  5. Bundle Install: Make sure /etc/gemrc contains “gem: --no-ri --no-rdoc” to avoid compiling documentation in production. Create <app_home>/gems and symlink <app_home>/repo/vendor/bundle to it, so that your gem installations survive even if your repo is freshly checked out.
  6. Persistent folders: Any folders that need to survive across releases should be created under <app_home> and symlinked into <app_home>/repo. App code should only look inside the repo, and should not access external folders directly. Remove any configuration variables that dabble with folder names, and instead symlink whatever you want into standard locations within the repo.
  7. Environment: Use one environment “Production” for every instance. Environment=behavior, Configuration=values. Environment is the same, and only configuration values will vary across instances, so don’t create separate environments like QA, UAT, etc for every instance.
  8. Configuration: Use dotenv. A “Production” environment should use different environment variables for different servers (QA, UAT, etc).
  9. Process Management: Avoid daemons, PID files, nohup and &. Use Foreman to install a system service, and control your app as a service.
  10. Logging: Avoid puts, use Logger. Log everything to STDOUT using rails_12factor. Make Rails logs sane using Lograge. Make sure every request has a unique ID.
  11. Tools: Mina, Capistrano, Docker, Ansible, Chef. Check out the sample project repository.

1. Dependencies

  • Regular dependencies: git and development tools (build-essential in apt, or the “Development Tools” group in yum)
  • NodeJS. Yes, NodeJS. If your app compiles JS and CSS assets, you need either nodejs+execjs or therubyracer+libv8. libv8 downloads and compiles the entire V8 engine (the JavaScript engine behind Chromium and NodeJS) during bundle install, which takes a lot of time and resources. Installing nodejs through your system package manager downloads precompiled binaries instead. Recent versions of Rails actually comment out “therubyracer” in the Gemfile for this reason: they want you to make a choice instead of defaulting to the slower option.
    My experience: Simply treat NodeJS as you would treat GCC. It’s just a build dependency, not a runtime one. Compared to therubyracer/libv8, it will save you tons of bundle install time, make deployments much faster and more reliable, and reduce your application’s installed footprint. Just use whatever nodejs or iojs version apt-get or yum provides. execjs will pick it up automatically.
  • XML, XSLT, ZLib and OpenSSL development headers — These will resolve many gem installation errors, including nokogiri, puma, etc. Recent versions of nokogiri are intelligent enough to compile XML/XSLT whenever system versions conflict, so you have nothing to lose by having these pre-installed.

2. Installing Ruby

If you’re on Debian systems, prefer using BrightBox’s Ruby Packages. They’re precompiled, stable, up-to-date, have the latest bug fixes and can run multiple supported ruby versions.

Why not RVM or rbenv? Because they introduce subtle quirks — shell initialization (both RVM & rbenv), login shell, overriding core functions and autolibs installation (RVM), rehashing shims (rbenv), etc. These are fine for development. But for production, system-managed packages are better. RVM/rbenv should only be a fallback where proper system packages are not available. That said, both RVM and rbenv offer extensive guides if you’re going to use them in production, and both are amazing pieces of software with a large installation base. So you won’t go wrong with these either.

3. Create Separate User & Group

  • Create a separate user and group for your app. I use the name of my app (let’s call it myapp for now), but feel free to use whatever you want. Lock down the user: no password, no other groups, /bin/false as shell.
  • Create a home folder (/home/myapp). This is where your code and everything else is going to reside. Home folders (despite the name) don’t necessarily have to be under /home — you can also have it under /var, /srv, /opt or /usr/local.
  • Why? Using a generic user like www or nobody, with app under /srv might sound more unix-y. But in my experience, applications are not just blobs of files — they live and breathe. They have SSH configs and keys, environment files, cron jobs, background workers and a lot more. Simply treating every app as www or every system service as nobody is not the right way, and that also introduces other problems like having to reset folder permissions all the time when your sudo users don’t run as nobody/www, etc. So plan for the long run, create one user/group per app and give it a home folder.
  • Bonus: Give your app user and group a fixed UID/GID. In Unix, the user and group names you see are actually stored as numeric UIDs and GIDs, and by default these differ from machine to machine. If you freeze them and then mount shared data folders across multiple machines (say EBS volumes or Docker volumes), file ownership stays consistent and your app avoids permission issues. Just pick a 4-digit number, like gid=6156 and uid=6157.
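To catch a machine where the IDs have drifted, the app can compare its own identity with the expected numbers at boot. This is a minimal sketch using only the standard library. The values 6157/6156 are the example numbers from above, and the method names are hypothetical.

```ruby
require 'etc'

# Look up the numeric IDs for a user name; returns [uid, gid] or nil
# when the user does not exist on this machine.
def ids_for(username)
  entry = Etc.getpwnam(username)
  [entry.uid, entry.gid]
rescue ArgumentError
  nil
end

# Compare actual IDs with the expected fixed ones and describe any drift.
def identity_mismatches(expected_uid, expected_gid,
                        actual_uid: Process.uid, actual_gid: Process.gid)
  problems = []
  problems << "uid is #{actual_uid}, expected #{expected_uid}" unless actual_uid == expected_uid
  problems << "gid is #{actual_gid}, expected #{expected_gid}" unless actual_gid == expected_gid
  problems
end

if $PROGRAM_NAME == __FILE__
  problems = identity_mismatches(6157, 6156)
  warn "Identity drift: #{problems.join('; ')}" unless problems.empty?
end
```

A warning like this is much easier to act on than a permission error that shows up later on a shared volume.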

4. Checkout the Repository

The repository should not be checked out at the base of /home/myapp. It should go as a subfolder. I call it /home/myapp/repo — but just pick any folder name. SSH keys required for arbitrary purposes (e.g. cloning the repo) can go under /home/myapp/.ssh as usual.

Some tools (Mina, Capistrano) create different folders for different releases and swap links between them. I find it easier to switch using “git checkout”. Anything is fine.

5. Bundle Install

RubyGems generates documentation (both RI and RDoc) by default. That takes a long time, so disable it in both development and production. Create the following file to avoid it:

# /etc/gemrc
# On RubyGems 2.0 and later, the equivalent is: gem: --no-document
gem: --no-ri --no-rdoc

Next up, create a folder: /home/myapp/gems and symlink /home/myapp/repo/vendor/bundle to /home/myapp/gems
The repo folder can be cleaned up for every release. But using this symlink, gems installed once will persist, and they won’t be downloaded again and again for every release.

And make sure to run bundler with the following flags at minimum:

bundle install --deployment --clean --without development test

6. Persistent Folders

/home/myapp/repo should be your code’s entire world.

Every folder/file that your app needs to run must be created, copied or symlinked inside this. Your application code shouldn’t be accessing anything outside this folder (directly). So if you’re specifying different folders as configurations, get rid of those.

E.g. for Paperclip, create a folder /home/myapp/uploads, and symlink it to /home/myapp/repo/public/uploads. This way, your app always reads/writes to public/uploads when running, but the real files are safe somewhere the app doesn’t need to know about.

Some people will think about symlinking Log and PID folders right now, but wait. More coming up on this.

7. Configuration vs. Environment

Rails has Environments. Many people misuse them, and the concept is poorly defined. I can’t stress this enough:

Don’t create an Environment for every Instance!

For example, you may have a QA, a UAT, and a Production instance. Do not create 3 environments!
  • Environment = Runtime Behavior/Mode
    e.g. cache_classes, serve_static_files, threadsafe, etc.
    e.g. You know that in production, cache will be enabled.
  • Configuration = A bunch of values
    e.g. same behavior, but different URL or endpoint, like database, cache, etc.
    e.g. Cache server may change in QA/UAT/Prod, but behavior is always on.

You’re running the app in Production mode, with serverX's Configuration, where serverX is QA/UAT/Production/etc. It’s important to understand this distinction. You can keep switching Configuration files, while still running the app in Production mode across servers. Start looking at Production mode as “Live Mode” and things will make sense. Your QA, UAT and Production should be running as RACK_ENV=production but just with different configuration files. The ideal app will have only 3 environments: development, test and production.
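One way to enforce the “only 3 environments” rule is a small guard at boot that rejects instance names used as modes. This is just a sketch. The runtime_mode method and its error message are made up for illustration.

```ruby
# Return the runtime mode, refusing instance names like "qa" or "uat".
# `env` can be ENV or any Hash, which keeps the guard easy to test.
def runtime_mode(env = ENV, allowed: %w[development test production])
  mode = env.fetch('RACK_ENV', 'development')
  unless allowed.include?(mode)
    raise ArgumentError,
          "RACK_ENV=#{mode.inspect} looks like an instance name; " \
          "use one of: #{allowed.join(', ')}"
  end
  mode
end

puts "Running in #{runtime_mode} mode" if $PROGRAM_NAME == __FILE__
```

With this in place, a QA server started with RACK_ENV=qa fails immediately and pushes the difference into configuration values, where it belongs.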

For example, you may use SendGrid/Mandrill/etc to send emails, but you don’t want real mails to go out in QA and UAT. Simply create test accounts over there, and set those details in your QA/UAT configuration files. Now you can still run QA/UAT with RACK_ENV=production, but your configuration will ensure that email delivery is using a dummy account. What if your entire Mailing logic differs from QA/UAT to Production, say you’re using a different transport mechanism altogether? Remember: the whole point of QA/UAT is to test end-to-end production behavior. If you have a payment gateway, you should be hitting a dummy/stub server with the same code path. Otherwise it’s not a proper test.

Having Rails-style multiple environment files for each environment also poses other difficulties: Whenever you change a certain behavior (say cache_classes, serve_static_files, threadsafe, etc) you have to remember to change every environment file. When upgrading Rails, many of these behaviors change. Syncing them across all environment files is not trivial, and must be enforced with diligence.

8. Configuration

Now that we’re clear on Environment (Mode/Behavior) vs Configuration (Values), let’s focus on the latter. You will have different Configuration values (database connection, external URLs, email credentials, etc) for different servers (QA, UAT, etc), which are all running in the same Live (production) mode. These values should be fed into the application as configuration variables.

  • BAD: One single YAML file with keys/values for all possible servers.
# BAD
qa:
  database_url: ...
  google_client_id: ...
uat:
  database_url: ...
  google_client_id: ...
production:
  database_url: ...
  google_client_id: ...
# etc
  • BAD: Creating different environment files (config/qa.rb, uat.rb, etc) and putting configuration values as code in them. As we already discussed, having different environments is bad enough; mixing configuration variables with behavior is even worse.
  • GOOD: Make YAML files load configuration from environment variables, and use different “.env” files for each server. When deploying, you would copy the server-specific .env file into /home/myapp/repo/.env.
# GOOD
production:
  database_url: <%= ENV['MYAPP_DATABASE_URL'] %>
  google_id: <%= ENV['MYAPP_GOOGLE_ID'] %>
  google_secret: <%= ENV['MYAPP_GOOGLE_SECRET'] %>
# NOTE: This can be single or multiple YAML files
# (database.yml, secrets.yml, etc)

# qa.env
MYAPP_DATABASE_URL=postgres://...
...
# us-east-server.env:
MYAPP_DATABASE_URL=postgres://...
...
# During deployment, copy us-east-server.env into .env
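To see how a YAML file with ERB tags picks up those variables, here is a minimal standalone sketch using only the standard library. It renders the template through ERB first and then parses the result as YAML, which is roughly what Rails does for database.yml. The method names (config_template, load_settings) and the inline template are assumptions for the example. In a real app the template would live in a file under config/.

```ruby
require 'erb'
require 'yaml'

# Inline stand-in for a config/*.yml file.
def config_template
  <<~TEMPLATE
    production:
      database_url: <%= ENV['MYAPP_DATABASE_URL'] %>
      google_id: <%= ENV['MYAPP_GOOGLE_ID'] %>
  TEMPLATE
end

# Render ERB first (substituting environment variables), then parse YAML
# and pick the section for the given mode.
def load_settings(template, mode)
  rendered = ERB.new(template).result
  YAML.safe_load(rendered).fetch(mode)
end

if $PROGRAM_NAME == __FILE__
  settings = load_settings(config_template, 'production')
  puts settings.inspect
end
```

The YAML file itself never changes between servers. Only the environment variables fed into it do.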

My experience: Use dotenv. It’s becoming a standard and is available across languages (nodejs, python, go, etc). Since it’s just a collection of environment variables, it’s very flexible: you can load it in Bash shells and profiles, provide it to Docker (which supports loading environment variables from a .env file), etc.

I also use git-crypt (or chef data bags, ansible vault, etc) to save all possible server configurations as encrypted files directly inside the repo. This way, everything I need is versioned and secured inside one single repository, without having to hunt anywhere else.

8.1. Mandatory Configuration

You must have the following variables in every configuration file:

  • RACK_ENV: Prefer RACK_ENV over RAILS_ENV
  • PORT: Port on which your rack server should listen (for web apps)
  • TZ: Timezone of that server. Some servers provide this at the system level, so you may not have to specify it, but make sure it’s available. Your Rails app must have config.time_zone set to this value, and all date/time fields in your database should be stored with time zone and saved in UTC. Otherwise expect a lot of pain when international users start using your application, or when you switch servers.
  • All other configuration for your app should be prefixed with “MYAPP_”. This makes it easy to identify when inspecting and comparing environment variables across servers, and the namespacing ensures that your environment variables won’t clash with system variables.

9. Running App and Services Reliably

Okay, so you have set up the system, created users/groups, checked out the repo, symlinked folders and copied server-specific configuration. Now you need to reliably run the app and any background workers.

Do not use Daemons, PIDs and Zombies (nohup)

This is a bad practice. Do not use gems or scripts that manually background a process and save a PID file. Avoid nohup and &! These are all bad ways to run a process, since essentially they create unsupervised, unattached processes. If they crash, nobody will know. They should be run under supervision.

You must define your app as a proper System Service, just like a database. This way, it will actually start when the system boots, and in many cases, will be restarted if it crashes as well.

It’s not that difficult really. Simply use Foreman to install a system service! Foreman is a must-have job runner for both development and production. It’s also cross-platform (has ports in Ruby, NodeJS, etc) and reads “.env” files directly, so your environment variables are available without even using the dotenv gem to load them manually.

Simply create a Procfile:

# Procfile
web: bundle exec puma -C config/puma.rb
jobs: bundle exec rake jobs:work

And export it as a system service (check this for more examples):

# Add myapp/repo folder to PATH
export PATH=$(pwd):$PATH
# Install service
bundle exec foreman export -a myapp -u myapp <type> <location>
# e.g. Ubuntu:
bundle exec foreman export -a myapp -u myapp upstart /etc/init/

And then you can control your app using:

sudo service myapp restart           # ubuntu
sudo /etc/init.d/myapp restart       # SysV init
sudo systemctl restart myapp.service # systemd

Now your app is on par with a core system service. You don’t have to dabble with PID files; your system handles that. Any modern OS will automatically restart your service if it crashes, rotate your logs, and more.

You can go one step further, and use Foreman to start God.rb. God will give you even more control over processes, can send alerts when they crash, exceed Memory/CPU usage, etc — while Foreman simply installs the System service and loads the environment.

The good thing about Foreman is that you can also use it in development. No more opening multiple windows with commands in development mode. Just use foreman to start and stop your entire app.

9.1. App Server

Some people use Nginx with passenger, and they may think — why should I even install a separate system service?

Your app should run with or without nginx. Most Ruby app servers — Passenger, Puma, Unicorn, etc are all available as Gems. Even Passenger works as a gem with the latest 5.0 (Raptor) release, as does TorqueBox.

So you simply run your app directly from the Procfile:

# Procfile
web: bundle exec passenger start # passenger
web: bundle exec puma -C config/puma.rb # puma
web: bundle exec unicorn -c config/unicorn.rb # unicorn

And if you really want Nginx, install it separately, and proxy requests to your app. This way, your app is future-proof and self-contained inside the Procfile. It can integrate with EC2 ELBs just as well as with Nginx.

TL;DR: Do not make Nginx a core part of your app. It is just an external proxy that you may or may not choose to use.

I don’t want to get into which app server to use, because the number of Ruby app servers keeps growing. I’ll just note that Puma is the recommended default on Heroku as of writing. That can change anytime, so please do your homework, choose the right app server, and create a proper configuration file for it with workers/threads/whatever.
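If you do go with Puma, a configuration file driven entirely by environment variables might look like the sketch below. The port, environment, threads, workers and preload_app! calls are part of Puma’s configuration DSL. The MYAPP_MAX_THREADS and MYAPP_WEB_WORKERS variable names and the default numbers are my own assumptions, so tune them for your app.

```ruby
# config/puma.rb (illustrative sketch; the MYAPP_* names are hypothetical)

# Mode and port come from the mandatory configuration variables.
environment ENV.fetch('RACK_ENV', 'production')
port        Integer(ENV.fetch('PORT', '3000'))

# Threads per worker process (min, max).
max_threads = Integer(ENV.fetch('MYAPP_MAX_THREADS', '5'))
threads max_threads, max_threads

# Number of forked worker processes; load the app before forking.
workers Integer(ENV.fetch('MYAPP_WEB_WORKERS', '2'))
preload_app!
```

This keeps the file identical across QA, UAT and Production, with only the .env values changing per server.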

10. Logging

Logs are your Insight. You should log all app requests, external service requests and everything else succinctly. So make sure everything gets logged. For example:

begin
  # Do something, call some service, etc
  render :ok
rescue => e
  # Don't silently swallow exceptions:
  # log the error before rendering the failure page
  logger.error "#{e.class}: #{e.message}"
  render 'some failure page'
end

But the problem is, when there are many workers or threads running, the logs will get mixed up. So make sure every request gets a Unique Request ID, and print every log with the Request ID so that you can trace things back.
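Outside Rails (which can tag logs for you via config.log_tags = [:request_id]), the same idea can be sketched with Ruby’s built-in Logger. The example below stores the request ID in a thread-local variable and has a custom formatter prepend it to every line. The helper names (build_logger, with_request_id) are assumptions made for this example.

```ruby
require 'logger'
require 'securerandom'
require 'time'

# Logger whose formatter stamps every line with the current request ID.
def build_logger(io = $stdout)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, _progname, msg|
    request_id = Thread.current[:request_id] || '-'
    "#{time.getutc.iso8601} #{severity} request_id=#{request_id} #{msg}\n"
  end
  logger
end

# Run a block with a request ID set for the current thread, restoring
# the previous value afterwards.
def with_request_id(id = SecureRandom.uuid)
  previous = Thread.current[:request_id]
  Thread.current[:request_id] = id
  yield id
ensure
  Thread.current[:request_id] = previous
end

if $PROGRAM_NAME == __FILE__
  logger = build_logger
  with_request_id { logger.info 'service=some_service time=200ms' }
end
```

In a Rack app, with_request_id would typically wrap the call inside a small middleware, so every log line written while handling that request carries the same ID.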

Rails should not be managing the logs — the OS or container should be. So Rails should write all logs to STDOUT, instead of a logs folder. Use the rails_12factor gem which does that.

If you are using rails_12factor with Foreman system services, then the system will manage your logs under /var/log. The exact location will differ, e.g. Ubuntu stores them under /var/log/upstart, but they’re over there. Any modern OS (Ubuntu or RedHat 7+) will automatically rotate the logs as well.

Production logs should be clean: one request per line, with only exception traces spanning more than a line. If you’re using Rails, the Lograge gem is a must-have for production, and it’s very easy to configure too. Whenever you need to log multiple values, don’t resort to formats that need crazy regexes to parse. Just make simple key=value log statements:

# BAD:
logger.info "[GET] [some_service] with {user 1} took 200ms"
# GOOD:
logger.info "service=some_service time=200ms method=GET user=1"
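A tiny helper keeps these key=value lines consistent across the codebase. This is only a sketch. The kv_line name is made up, and quoting values that contain whitespace is my own convention rather than anything Lograge requires.

```ruby
# Build a "key=value key=value" string from a Hash, quoting values that
# contain whitespace so the line stays easy to split.
def kv_line(fields)
  fields.map do |key, value|
    text = value.to_s
    text = text.inspect if text.match?(/\s/)
    "#{key}=#{text}"
  end.join(' ')
end

if $PROGRAM_NAME == __FILE__
  puts kv_line(service: 'some_service', time: '200ms', method: 'GET', user: 1)
end
```

You would then write logger.info kv_line(service: 'some_service', time: '200ms') instead of assembling strings by hand.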

Do not use puts for logging. Your code should use a logger, so that in production, unnecessary debug statements are excluded.

# BAD:
puts "my log"
# GOOD:
Rails.logger.info "my log" # rails
request.logger.info "my log" # sinatra
# Or create and use Ruby's built-in logger:
require 'logger'
$stdout.sync = true # flush every line immediately instead of buffering
logger = Logger.new($stdout)
logger.info "my log"

Once you have proper logging, you can create many dashboards out of it. For example, you can filter certain external service requests and graph their response time. Logging is an extremely powerful tool, so make sure you do it right.

11. Tools

Capistrano and Mina are awesome tools. They can do most of this, and they are also highly extensible, so for any extra step here (user creation, service installation, etc) you can always write tasks that do it.

I’m more of a systems person, and I prefer to use a full-fledged infrastructure automation tool like Chef, Puppet, Ansible or Salt.

12. Conclusion

These are my learnings, and following them has helped me create flexible apps that work well in a variety of use cases: They can run as Docker containers, in front of EC2 Load Balancers or Nginx, in Heroku, Vagrant and in normal run-of-the-mill servers. Here is a repository with all these put together, along with Ansible and Docker scripts:

GitHub: ruby-deploy-kickstart
Ruby/Rails deployment template with .env, Foreman, Ansible, Docker & Vagrant.

If you’ve come this far, thanks. Please share your feedback and improvements as comments and pull requests.