Upgrading From REE 1.8.7 to Ruby 1.9.3
By Jeff Yip
In a previous blog post, we discussed our path to upgrading to Rails 3.0 from Rails 2.3. At the time, a number of comments asked about our upgrade path from Ruby 1.8.7 to 1.9.3. We waited until the Rails 3.0 upgrade was complete and in production before beginning the Ruby upgrade. That was probably a good thing, since upgrading our Ruby version required significantly more work than we had anticipated.
We were really excited about the potential performance improvements that a number of other companies have reported after upgrading to Ruby 1.9.3. Harvest, Zendesk, UserVoice, New Relic, and Ngin have all published great blog posts reporting significant performance gains after making the upgrade.
The first major milestone was getting our Rails app to start locally in Ruby 1.9.3. We had to upgrade a number of our gems (e.g. Zookeeper, libxml-ruby, hpricot) so that they would work in Ruby 1.9.3. Some gems were only needed in one environment. Gemfiles have a useful feature that lets you specify the platforms on which a particular gem should be installed, like so:
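As a rough illustration of the platform feature, here is a Gemfile fragment using Bundler's `:platforms` option and `platforms` block. The gems shown are assumptions chosen for the example, not our actual Gemfile, and the `:ruby_18` / `:ruby_19` platform names are the ones Bundler offered at the time:

```ruby
# Gemfile (illustrative fragment; the gem choices are assumptions)
source "https://rubygems.org"

# Installed only when bundling under MRI 1.8.x
gem "fastercsv", :platforms => :ruby_18

# The block form groups several gems under one platform
platforms :ruby_19 do
  gem "debugger"
end
```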
Currently, Airbnb utilizes Ruby on Rails’ Cookie Based Session Store. By default, the cookie based session store serializes data from the session using Ruby’s Marshal. While this provides you with the ability to store complex objects in the session, it limits the portability of that data. For example, a Date object serialized by Ruby 1.8.7’s Marshal will throw an exception if you try to deserialize it using Ruby 1.9.3.
To make the session cookie portable between Ruby versions, we monkey patched the code that serializes the session to use JSON instead. Interestingly, the MessageVerifier class in Ruby on Rails 3.2.3 provides support for specifying the serializer; however, ActionDispatch::Cookies::SignedCookieJar does not. So we pulled the MessageVerifier from Rails 3.2.3 into our Rails 3.0 app, and monkey patched ActionDispatch::Cookies::SignedCookieJar to use JSON as the serializer. To minimize session resets during the transition period while we rolled this out, we try to infer whether a cookie was serialized with Marshal or JSON by reading its first couple of characters before loading the session from it. The code is included in this Gist for what we call our “Ruby on Rails JSON Cookie Session Store.”
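The format sniffing can be sketched roughly as follows. This is not the code from the Gist: `SessionPayload` and its methods are hypothetical names, and the sketch assumes the cookie's signature has already been verified, since `Marshal.load` must never see untrusted input. It relies on the fact that Marshal output always starts with its two-byte format version (4.8).

```ruby
require "json"

# Hypothetical helper, not the code from the Gist. It picks a deserializer
# by looking at the first two bytes of an already-verified cookie payload.
module SessionPayload
  # Marshal output always begins with its format version: major 4, minor 8.
  MARSHAL_HEADER = "\004\010"

  # New cookies are always written as JSON.
  def self.dump(session)
    JSON.generate(session)
  end

  # Old cookies are still readable during the transition period.
  def self.load(raw)
    if raw[0, 2] == MARSHAL_HEADER
      Marshal.load(raw) # legacy cookie written by the old serializer
    else
      JSON.parse(raw)
    end
  end
end
```

One consequence of this design is that symbol keys and rich objects do not survive the round trip through JSON, so the session should contain only strings, numbers, arrays, and hashes.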
It’s worth noting that we initially wrote this code so that we could share the session between services. Like many Ruby on Rails apps that reach some amount of scale, we’re moving towards a Service Oriented Architecture. When we launched Airbnb’s “Communities” feature, which was built as its own service on Rails 3.2.3 and Ruby 1.9.3, it shared the session with Airbnb’s main monolithic Rails application (a.k.a. monorail) which was running Rails 3.0 and Ruby 1.8.7 at the time. Using JSON to serialize the session will allow us to share the session with services written in other frameworks and languages altogether, like Node.js.
Rather than dealing with sharing data between Ruby versions in memcached, we set up a completely separate memcached cluster for the Ruby 1.9.3 servers. In general, this worked out pretty well for us.
One rather obscure issue that created some major headaches for us involved the fact that data serialized using Ruby Marshal apparently takes up more space in Ruby 1.9.3 than in Ruby 1.8.7. The default maximum object size in memcached is 1MB, and some data that we were serializing in memcached no longer fit when we switched to Ruby 1.9.3. Code that once cached values suddenly failed silently when we switched to Ruby 1.9.3.
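A simple guard against this kind of silent failure is to measure the marshaled size before writing to the cache. The sketch below is an assumption about how such a check could look, not something we shipped: the helper names are hypothetical, and the real memcached limit also counts the key and some per-item overhead, so a production check would want some headroom.

```ruby
# Default memcached item size limit (1MB), as mentioned above.
MEMCACHED_ITEM_LIMIT = 1024 * 1024

# Size in bytes of the value once serialized with Marshal.
def marshaled_size(value)
  Marshal.dump(value).bytesize
end

# True when the serialized value is no larger than the limit.
def fits_in_memcached?(value, limit = MEMCACHED_ITEM_LIMIT)
  marshaled_size(value) <= limit
end
```

Logging or raising when `fits_in_memcached?` returns false turns a silent cache miss into something visible.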
Ruby Syntax Upgrade Guide
We gradually updated our codebase so that it would work in both Ruby 1.8.7 and Ruby 1.9.3. The following is a guide on how to write code that works in both environments:
As many people have pointed out, encodings will be the biggest pain point when upgrading to Ruby 1.9.3 from 1.8.7. You’ll have to add the “magic encoding comment” at the top of every file that contains UTF-8 encoded characters.
# encoding: utf-8
Ruby 1.8 supports the American style date format, MM/DD/YYYY, so calling Date.parse on the string “10/11/2012” will return a Date object representing October 11th, 2012. But in Ruby 1.9.3, American style dates are no longer supported, and Ruby 1.9 appears to parse them in the European format of DD/MM/YYYY:
We use Jeremy Evans’ American Date gem to keep this functionality consistent between Ruby 1.8.7 and Ruby 1.9.3.
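If adding a gem is not an option, an alternative is to avoid Date.parse for ambiguous input and state the format explicitly. The sketch below uses Date.strptime from the standard library; `parse_us_date` is a hypothetical helper name, and the sketch assumes every input really is MM/DD/YYYY:

```ruby
require "date"

US_DATE_FORMAT = "%m/%d/%Y"

# Parses "MM/DD/YYYY" without relying on Date.parse's guesswork.
def parse_us_date(string)
  Date.strptime(string, US_DATE_FORMAT)
end
```

Unlike Date.parse, this raises an ArgumentError for input that does not match the format, which is usually preferable to silently swapping day and month.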
Checking an object’s methods
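Calling .methods on an object in Ruby 1.8.7 returns an array of strings, while in Ruby 1.9.3 it returns an array of symbols. Instead of testing whether that array includes a particular name, ask the object directly with respond_to?, which behaves the same in both versions.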
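A minimal sketch of the two version-independent options follows. The helper names are hypothetical; respond_to? accepts either a string or a symbol, and mapping the names through to_s normalizes the list when it is genuinely needed:

```ruby
# Preferred: ask the object directly.
def has_method?(object, name)
  object.respond_to?(name)
end

# When the list itself is needed, normalize it to strings.
def method_names(object)
  object.methods.map { |m| m.to_s }
end
```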
There is a very subtle difference in how Ruby 1.8.7 and Ruby 1.9.3 handle regular expressions with UTF-8 encoded strings. In the example below, we attempt to write a regular expression that can isolate the name part from the greeting of a message “Hello Chloë,”:
As you can see, the third approach is the only version that works consistently between Ruby 1.8.7 and Ruby 1.9.3.
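One way to sidestep the differences in how word-character classes treat non-ASCII letters is to avoid them and match everything up to the delimiter instead. The sketch below is illustrative and not necessarily one of the approaches discussed above; `name_from_greeting` is a hypothetical helper, and the assumption is that the name never contains a comma. The /u flag marks the pattern as UTF-8.

```ruby
# encoding: utf-8

# Capture everything between "Hello " and the next comma.
GREETING = /Hello ([^,]+),/u

def name_from_greeting(message)
  match = GREETING.match(message)
  match && match[1]
end
```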
The meaning of the POSIX character class [:punct:] is subtly different between Ruby 1.8.7 and Ruby 1.9.3. In the following example, we attempt to replace all of the punctuation characters with a Unicode snowman:
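When the exact set of characters matters, one option is to spell the set out rather than rely on [:punct:]. The sketch below assumes that "punctuation" means the 32 ASCII punctuation characters, which may or may not be the right definition for a given application; the constant and method names are hypothetical.

```ruby
# encoding: utf-8

# The 32 ASCII punctuation characters, listed explicitly.
ASCII_PUNCTUATION = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

# Build a character class from the list, escaping regexp metacharacters.
PUNCTUATION_PATTERN = Regexp.new("[" + Regexp.escape(ASCII_PUNCTUATION) + "]")

def snowmanize(text)
  text.gsub(PUNCTUATION_PATTERN, "☃")
end
```

Because the set is explicit, characters such as + and = are replaced regardless of how a particular Ruby version classifies them.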
The String class no longer supports the #each method. In Ruby 1.8.7, this method would allow you to iterate on each line of a string. This (odd) functionality was dropped in Ruby 1.9.3.
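String#each_line exists in both versions and makes the intent explicit. A minimal sketch, with `lines_of` as a hypothetical helper name:

```ruby
# Collects the lines of a string without their trailing newlines.
def lines_of(text)
  lines = []
  text.each_line { |line| lines << line.chomp }
  lines
end
```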
Hash#select returns an array of arrays in Ruby 1.8.7, but a proper Hash in Ruby 1.9.3. You can write code that is compatible with both 1.8.7 and 1.9.3 by wrapping the call in Hash like so:
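A minimal sketch of the wrapping technique, using a hypothetical helper called `select_as_hash`. Hash[] accepts an array of key/value pairs (the 1.8.7 result) as well as a hash (the 1.9.3 result), so the return value is a Hash either way:

```ruby
# Filters a hash and always returns a Hash, whatever Hash#select returns.
def select_as_hash(hash)
  Hash[hash.select { |key, value| yield(key, value) }]
end
```

For example, `select_as_hash(scores) { |name, score| score > 1 }` keeps only the entries whose value is greater than 1.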
Colons are no longer valid after “when” in a case statement. We prefer to use “then” or a newline instead.
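Both accepted forms are shown in this small sketch; `describe_count` is a hypothetical method used only for illustration:

```ruby
def describe_count(n)
  case n
  when 0 then "none"   # "then" keeps the branch on one line
  when 1..9            # a newline also separates condition and body
    "a few"
  else
    "many"
  end
end
```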
In Ruby 1.9.3, $LOAD_PATH no longer includes the current directory (.) because it was deemed a security risk. You can explicitly add it when requiring files, use absolute paths, or use require_relative.
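Since require_relative is not available in Ruby 1.8.7, code that must run on both versions can build an absolute path from the current file instead. A sketch under that assumption, where `path_relative_to` and the `lib/helper` file are hypothetical:

```ruby
# Resolves a path relative to the directory containing the given file.
def path_relative_to(file, relative)
  File.expand_path(relative, File.dirname(file))
end

# Typical use at the top of a script (hypothetical file name):
#   require path_relative_to(__FILE__, "lib/helper")
#
# On Ruby 1.9 and later only, the equivalent is:
#   require_relative "lib/helper"
```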
In Ruby 1.9.3, Range#member? and Range#include? behave differently for ranges that are defined by begin and end strings. In Ruby 1.9.3, those methods only return true if an exact match is in the range, not just a prefix of the string.
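To get the same answer on both versions, it helps to say explicitly which question is being asked. The sketch below uses two hypothetical helpers and assumes an inclusive range: one compares against the endpoints, the other looks for an exact element.

```ruby
# True when the value sorts between the endpoints of an inclusive range.
def between_endpoints?(range, value)
  range.first <= value && value <= range.last
end

# True only when the value is one of the elements the range enumerates.
def exact_member?(range, value)
  range.to_a.include?(value)
end
```

For the range "a".."z", the string "bb" sorts between the endpoints but is not one of the 26 elements, so the two helpers disagree, which is exactly the distinction described above.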
In order to have consistent behavior between Ruby 1.8.7 and Ruby 1.9.3, we created a class called CSVBridge, and use that instead of CSV or FasterCSV:
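The idea can be sketched as a constant that points at whichever library provides the FasterCSV-style API on the running Ruby. This is a simplified assumption about the approach rather than our actual class: it relies on the fastercsv gem being installed under 1.8.7 and on the standard csv library under 1.9.3.

```ruby
# Pick the CSV implementation that offers the FasterCSV-style API.
if RUBY_VERSION < "1.9"
  require "fastercsv" # gem; the 1.8.7 standard library CSV has a different API
  CSVBridge = FasterCSV
else
  require "csv"       # in 1.9 the standard CSV library is based on FasterCSV
  CSVBridge = CSV
end
```

Calling code then uses `CSVBridge.parse(data)` or `CSVBridge.generate` without caring which Ruby it runs on.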
Miscellaneous other changes:
- ‘retry’ is no longer supported in iterators and loops
- Symbol#to_i is no longer supported (http://pragdave.blogs.pragprog.com/pragdave/2008/05/ruby-symbols-in.html)
- In Ruby 1.8.7, you could find out the current Ruby version with VERSION and RUBY_VERSION. In Ruby 1.9.3, VERSION is now gone, but RUBY_VERSION is still supported.
- Object#type has been removed; use Object#class instead (for example, obj.class.name for the class name as a string)
We created a small class called RubyBridge to encapsulate a bunch of helper methods that we found ourselves using repeatedly to make our code compatible:
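To give a flavor of what such a class can contain, here is a sketch with a few helpers. The method names and the selection are assumptions made for illustration, not the contents of our RubyBridge:

```ruby
class RubyBridge
  # True on the 1.8 series, false on 1.9 and later.
  def self.ruby_18?
    RUBY_VERSION < "1.9"
  end

  # Returns a copy tagged as UTF-8 on 1.9+, where strings carry an encoding.
  # On 1.8.7 strings have no encoding, so the input is returned unchanged.
  def self.utf8(string)
    return string unless string.respond_to?(:force_encoding)
    string.dup.force_encoding("UTF-8")
  end

  # Replacement for the removed Object#type.
  def self.class_name(object)
    object.class.name
  end
end
```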
Force UTF-8 Params
The following is a method that we added to application_controller as a before_filter for all actions to ensure that params were encoded with UTF-8:
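A sketch of such a filter follows. It is a reconstruction under assumptions rather than our exact code: `force_utf8!` is a hypothetical name, it walks nested hashes and arrays, and it only retags strings (it does not transcode them), which assumes browsers actually send UTF-8.

```ruby
# Recursively tags every string in a params-like structure as UTF-8.
def force_utf8!(value)
  case value
  when Hash
    value.each_value { |v| force_utf8!(v) }
  when Array
    value.each { |v| force_utf8!(v) }
  when String
    # force_encoding does not exist on 1.8.7, and frozen strings cannot be retagged.
    if value.respond_to?(:force_encoding) && !value.frozen?
      value.force_encoding("UTF-8")
    end
  end
  value
end

# Hypothetical wiring in a Rails 3.0 controller:
#   class ApplicationController < ActionController::Base
#     before_filter { |controller| force_utf8!(controller.params) }
#   end
```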
More Monkey Patches
Some additional monkey patches related to handling data serialization in ActiveRecord and Thrift are included in this gist.
In line with the experience of others, the bulk of the problems that we encountered with upgrading to Ruby 1.9.3 involved encodings. Once we got all of our specs passing, we needed to test the app with production traffic to uncover the more insidious encoding problems. We configured our build server so that we could maintain builds for both Ruby 1.8.7 and Ruby 1.9.3 at the same time. Rather than making the switch all at once, we deployed the Ruby 1.9.3 build to a handful of instances in our cluster, added them to the load balancer so that they would receive production traffic, and then watched for exceptions. We’d then take the instances out of the load balancer, fix the errors, and repeat.
With over 100,000 lines of code in our main Rails app and support for 21 different end-user languages, upgrading Airbnb to Ruby 1.9.3 was a significant undertaking.
- our test suite runs 2–3 times faster
- we can use the latest gems
- general performance improvements, so we now need fewer servers
- roughly 6 months of development time on and off
- lots of very subtle bugs and syntax changes
Was it worth it? We were hoping to see the type of performance gains that Zendesk and Harvest reported after they upgraded. While Zendesk reported 2–3x improvement in response time, we saw only a 20% improvement.
However, in the past couple of months, we have been able to tune our application in numerous ways (which we hope to document in a future blog post). As a result, our performance has improved by a margin more in line with what we had hoped for.
Originally published at nerds.airbnb.com on December 11, 2012.