Rails Internationalization in the real world

Or also: How to “i18n-ify” an already started project

It’s a great idea, when starting a new Rails project, to set up i18n, even if the project is not supposed to support multiple languages.

The advantages of this approach are clear: less repetition, better maintainability… and, of course, if you (or your boss) change your mind about internationalization, the project will be ready to go.

This is not always possible, for many reasons:

  • Deadlines. Deadlines are the ruin of all code and of all projects.
  • Not all Rails projects are developed by Rails developers. In a real-world team, templates are also created and modified by pure front-end developers, UI/UX designers or simply new Rubyists. Rails I18n is not that simple to understand and use.
  • Procrastination. “Oh well, I’ll simply move that text to locales later.”

For these or other reasons, you’ll find yourself with hundreds of untranslated HTML partials.

An approach could be to edit every file manually, or pay someone to do this extremely boring work for you.

Well… no. We are developers! We love spending most of our time automating our lives!

There are a few tools that make a developer’s life simpler with i18n. The best of them is the i18n-tasks gem, which provides a set of tools to find missing keys, remove unused ones, build locale trees and Google-translate other languages starting from an existing one.

But… this does not help us very much. We don’t have any locale files yet; we need a tool that can extract text from ERB templates and move it into our en.yml.

Well, if you use HAML as your templating system, this gem looks great. It seems kind of unmaintained, but I don’t think HAML has changed that much in the last 3 years.

However, this is not my case. What about HTML? Isn’t there really anything for HTML? Well, I searched a lot, and finally found out that I had to write my own implementation.

So I tried to create a simple rake task that could both identify and move all the text that needs to be translated to locale files.

The first step was to identify those strings. A *.html.erb file can contain strings in several ways:

  • As inner text of a tag:
  • As value or placeholder of input fields and textareas
<input placeholder="something" value="something else">
  • As an argument to a Rails view method, like `link_to`, `label`, `submit` and many others.
<%= link_to 'something', title: 'something else' %>

For the first two cases, I could use Nokogiri and a few CSS/XPath selectors. The third, instead, could cause a few problems.

The first approach I tried was to use ERB, the templating library that ships with MRI, in combination with a `method_missing` trick, to capture all view method calls and deal with every possible case.

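require 'erb'

matched_strings = []
# The template is evaluated against this object, so every unknown helper call lands in method_missing.
collector = Object.new
collector.define_singleton_method(:method_missing) do |method_name, *args, &block|
  case method_name
  when :link_to, :label, :submit
    matched_strings << args.first if args.first.is_a?(String)
  end
  '' # render nothing in place of the helper call
end
erb = ERB.new("<%= link_to 'my text' %>")
erb.result(collector.instance_eval { binding })
# Here's my matched_strings with all strings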

It works reasonably well, but only for finding strings, not for replacing them.

So I tried a more sophisticated approach.

First of all, I opened the partial files with Nokogiri. Before opening them, I moved all the ERB code out of each file, keeping a reference to its position in the original file using a unique identifier and a hash map. (Here’s the code for this)
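As a rough illustration only (this is not the linked code), the masking step might look like the sketch below. The helper names `mask_erb` and `unmask_erb` and the `__ERB_n__` placeholder format are assumptions made for this example, and the simple pattern would be confused by a literal `%>` inside a Ruby string.

```ruby
# Hypothetical helper: swap every ERB tag for a unique placeholder and
# remember the original tag in a hash keyed by that placeholder.
def mask_erb(source)
  snippets = {}
  masked = source.gsub(/<%.*?%>/m) do |tag|
    key = "__ERB_#{snippets.size}__"
    snippets[key] = tag
    key
  end
  [masked, snippets]
end

# Hypothetical helper: put the original ERB tags back in place.
def unmask_erb(masked, snippets)
  masked.gsub(/__ERB_\d+__/) { |key| snippets.fetch(key) }
end

template = %(<p>Hello <%= link_to 'my text', root_path %></p>)
masked, snippets = mask_erb(template)
# masked now contains no ERB tags, so an HTML parser can read it
restored = unmask_erb(masked, snippets)
```

Because the placeholders are plain text, they survive a round trip through an HTML parser and can be swapped back once the HTML strings have been handled.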

Then, using the great parser gem from whitequark, I could build an AST from the Ruby code inside the <% %> tags and inspect it to identify the strings. The problem here was the same as before: how could I replace that text after identifying it?
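To give an idea of what inspecting the syntax tree looks like, here is a minimal sketch that uses Ripper, the parser bundled with Ruby's standard library, as a stand-in for the parser gem so that it runs without extra dependencies. The method name `string_literals` is made up for this example. It also collects strings that are not user-facing text (for example string hash keys), so the results would still need filtering.

```ruby
require 'ripper'

# Walk the S-expression produced by Ripper and collect every plain string
# literal together with its [line, column] position in the snippet.
def string_literals(ruby_code)
  found = []
  walk = lambda do |node|
    next unless node.is_a?(Array)
    if node.first == :@tstring_content
      found << { text: node[1], position: node[2] }
    else
      node.each { |child| walk.call(child) }
    end
  end
  walk.call(Ripper.sexp(ruby_code))
  found
end

literals = string_literals("link_to 'something', title: 'something else'")
texts = literals.map { |literal| literal[:text] }
```

Each entry carries the position of the string in the source, which is the piece of information a replacement step would need in order to edit the original text without regenerating the surrounding code.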

I tried the unparser gem, which can convert the AST generated by parser back to Ruby code. Well… It’s a great gem, really, but… it obviously changes the Ruby code style, and I didn’t want that! Moreover, I ran into a few issues with it, so I dropped it for good.

Finally, I implemented a really raw and terrible solution, based on regular expressions. It works, though I think I could do much better than this.
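The sketch below shows the general shape a regex-based replacement could take. It is not the expression used in the project: the pattern, the method name `extract_helper_strings` and the way keys are derived from the text are all assumptions made for illustration. It only handles a quoted first argument without interpolation.

```ruby
require 'yaml'

# Hypothetical helper: replace the quoted first argument of a few view
# helpers with a t(...) call and collect the extracted text by key.
def extract_helper_strings(source, scope)
  translations = {}
  pattern = /\b(link_to|label|submit)(\s+|\()(['"])((?:(?!\3|\#\{).)+)\3/
  rewritten = source.gsub(pattern) do
    helper, separator, text = $1, $2, $4
    # Build a key such as "sign_in" from the text "Sign in"
    key = text.downcase.gsub(/[^a-z0-9]+/, '_').gsub(/\A_+|_+\z/, '')
    translations[key] = text
    "#{helper}#{separator}t('#{scope}.#{key}')"
  end
  [rewritten, translations]
end

source = %(<%= link_to 'Sign in', login_path %>)
rewritten, translations = extract_helper_strings(source, 'sessions.new')

# Nest the collected keys under the scope and the locale for en.yml
nested = 'sessions.new'.split('.').reverse.inject(translations) { |inner, part| { part => inner } }
locale_yaml = { 'en' => nested }.to_yaml
```

Working on the raw text keeps the rest of the template byte-for-byte identical, which is the property the unparser round trip could not give; the price is that every unusual call style needs its own pattern.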

However, I successfully used this method to extract 70–80% of the unmanaged locale data from my projects. It’s a great result, for me.

You can find the code on GitHub. I implemented an interactive mode that asks for confirmation before replacing data. This avoids lots of bugs, so I could use it on my project while I was still improving it.
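A confirmation loop of this kind can be quite small. The following is a sketch under assumptions, not the code from the repository: the method name `confirm_replacements` and its keyword arguments are invented here, and the input and output streams are injectable so the loop can be tried without a terminal.

```ruby
require 'stringio'

# Hypothetical helper: ask before applying each proposed replacement;
# only an answer of "y" applies it, anything else leaves the text alone.
def confirm_replacements(source, replacements, input: $stdin, output: $stdout)
  replacements.reduce(source) do |text, (from, to)|
    output.puts "Replace #{from.inspect} with #{to.inspect}? [y/N]"
    answer = input.gets.to_s.strip.downcase
    answer == 'y' ? text.sub(from) { to } : text
  end
end

proposals = { "'Sign in'" => "t('.sign_in')", "'Sign up'" => "t('.sign_up')" }
page = %(<%= link_to 'Sign in', login_path %> <%= link_to 'Sign up', signup_path %>)

# Simulate a user answering "y" to the first question and "n" to the second
answers = StringIO.new("y\nn\n")
result = confirm_replacements(page, proposals, input: answers, output: StringIO.new)
```

Defaulting to "no" means a mistaken match from an imperfect pattern is skipped unless someone explicitly accepts it.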

I’m open to suggestions, help, issues, insults, and more!

What about you? Did you encounter a similar issue?