Using XPath to rewrite Ruby code with ease

Tools that rewrite Ruby code, such as RuboCop, do so by using the excellent parser gem. The parser gem parses your Ruby code into an AST (abstract syntax tree). For a primer on this topic, see the introduction to the parser gem.

While building textractor we often found ourselves writing code to query and filter ASTs to find the exact node to modify. For example, to programmatically turn <%= f.text_field :name, placeholder: "Your name" %> into <%= f.text_field :name, placeholder: t('.your_name') %>, we need to find the node holding the value of the placeholder key, in a hash that happens to be an argument to a text_field call.

It turns out there is already an excellent query language for searching tree structures: XPath! All we have to do is turn an AST into an XML tree, run the XPath query, and find the original AST node belonging to the matches.

TL;DR: This post shows you how to turn this:
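Without a query language, that lookup tends to be a hand-rolled traversal. As a rough illustration (not textractor's actual code), here is a sketch of one. It assumes a stand-in Node struct and an s helper in place of the parser gem's node class, so that it runs without the gem; placeholder_strings is a hypothetical name.

```ruby
# Stand-in for Parser::AST::Node (an assumption, so this runs without the gem):
# the only interface relied on is #type and #children.
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Collect the string nodes used as the value of a `placeholder:` key in a
# hash argument of any text_field call.
def placeholder_strings(node, found = [])
  return found unless node.is_a?(Node)

  if node.type == :send && node.children[1] == :text_field
    hashes = node.children.drop(2).select { |arg| arg.is_a?(Node) && arg.type == :hash }
    hashes.flat_map(&:children).each do |pair|
      next unless pair.type == :pair

      key, value = pair.children
      found << value if key.type == :sym && key.children == [:placeholder] && value.type == :str
    end
  end

  # Keep descending so nested calls are inspected as well.
  node.children.each { |child| placeholder_strings(child, found) }
  found
end

ast = s(:send, s(:send, nil, :f), :text_field,
        s(:sym, :name),
        s(:hash, s(:pair, s(:sym, :placeholder), s(:str, "Your name"))))

result = placeholder_strings(ast)
```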

Into this:

*/send/hash/pair[sym[symbol-val/@value="placeholder"]]/str

All right, let’s get started!

So what does the AST for our example input <%= f.text_field :name, placeholder: "Your name" %> look like?
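Leaving the ERB tags aside, the Ruby expression inside should parse to a tree roughly like the one built below. This is a sketch: Node and s are stand-ins for the parser gem's node class and its s-expression notation (an assumption, so the snippet runs without the gem), and to_sexp is a hypothetical helper that prints the tree in a similar style.

```ruby
# Stand-in for Parser::AST::Node (assumption): a type plus an array of children.
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Render a tree in s-expression style; literals are shown with #inspect.
def to_sexp(node)
  return node.inspect unless node.is_a?(Node)

  "(#{[node.type, *node.children.map { |child| to_sexp(child) }].join(' ')})"
end

# f.text_field :name, placeholder: "Your name"
ast = s(:send,
        s(:send, nil, :f),   # the receiver: a call to `f`
        :text_field,         # the method name
        s(:sym, :name),      # first argument
        s(:hash,             # second argument: the options hash
          s(:pair, s(:sym, :placeholder), s(:str, "Your name"))))

puts to_sexp(ast)
```

Note that the children of a node are a mix of other nodes and plain Ruby values (nil, symbols, strings), which matters when converting to XML.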

We need to recursively convert that data structure into XML. Here’s a short class that does exactly that:
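A minimal sketch of such a class follows. It assumes nodes respond to type and children (as the parser gem's nodes do) and uses a stand-in Node struct so it runs without the gem. Naming literal children after their Ruby class with a -val suffix is inferred from the symbol-val element in the XPath query above; the rest of the layout is an assumption.

```ruby
require "rexml/document"

# Stand-in for Parser::AST::Node (assumption, so this runs without the gem).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

class XMLAST
  attr_reader :doc

  def initialize(ast)
    @doc = REXML::Document.new("<root/>")
    populate(@doc.root, ast)
  end

  private

  # Mirror the AST under `parent`: one element per node, named after its type.
  def populate(parent, node)
    if node.respond_to?(:type) && node.respond_to?(:children)
      element = parent.add_element(node.type.to_s)
      node.children.each { |child| populate(element, child) }
    else
      # Literal child (symbol, string, nil, ...): <symbol-val/>, <string-val/>, ...
      parent.add_element("#{node.class.name.downcase}-val")
    end
  end
end

ast = s(:send, s(:send, nil, :f), :text_field,
        s(:sym, :name),
        s(:hash, s(:pair, s(:sym, :placeholder), s(:str, "Your name"))))

xml = XMLAST.new(ast)
puts xml.doc
```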

We use REXML because it comes with the Ruby standard library. So far performance has been good, but if XML/XPath processing becomes your bottleneck, it’s easy enough to replace with nokogiri.

Let’s see it in action:

However, if we want to be able to query on the values of literals, we’ll also need to add a value attribute:
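One way to sketch this, written as a standalone function for brevity: append_xml is a hypothetical name, and Node is a stand-in for the parser gem's node class. The value is stored via to_s, which is an assumption about how literals should be serialized.

```ruby
require "rexml/document"

# Stand-in for Parser::AST::Node (assumption, so this runs without the gem).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Append `node` (an AST node or a literal) under the REXML element `parent`.
def append_xml(parent, node)
  if node.respond_to?(:type) && node.respond_to?(:children)
    element = parent.add_element(node.type.to_s)
    node.children.each { |child| append_xml(element, child) }
  else
    element = parent.add_element("#{node.class.name.downcase}-val")
    # Expose the literal so queries can match on it, e.g. @value="placeholder".
    element.add_attribute("value", node.to_s)
  end
end

doc = REXML::Document.new("<root/>")
append_xml(doc.root, s(:pair, s(:sym, :placeholder), s(:str, "Your name")))

puts doc
```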

Now our XML looks like this:

Time to try out some XPath. First, we add a convenience method to our XMLAST class:
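A sketch of what that can look like, assuming the class layout used so far (a doc reader holding a REXML document with a root element). REXML::XPath.match is REXML's own query entry point; the method name xpath and the stand-in Node struct are assumptions.

```ruby
require "rexml/document"

# Stand-in for Parser::AST::Node (assumption, so this runs without the gem).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

class XMLAST
  attr_reader :doc

  def initialize(ast)
    @doc = REXML::Document.new("<root/>")
    populate(@doc.root, ast)
  end

  # Run an XPath query against the document and return the matching elements.
  def xpath(query)
    REXML::XPath.match(@doc, query)
  end

  private

  def populate(parent, node)
    if node.respond_to?(:type) && node.respond_to?(:children)
      element = parent.add_element(node.type.to_s)
      node.children.each { |child| populate(element, child) }
    else
      element = parent.add_element("#{node.class.name.downcase}-val")
      element.add_attribute("value", node.to_s)
    end
  end
end

ast = s(:send, s(:send, nil, :f), :text_field,
        s(:sym, :name),
        s(:hash, s(:pair, s(:sym, :placeholder), s(:str, "Your name"))))

matches = XMLAST.new(ast).xpath('*/send/hash/pair[sym[symbol-val/@value="placeholder"]]/str')
matches.each { |element| puts element }
```

The leading * matches the root element, so the query is anchored at the outermost send node.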

Let’s try it:

Pretty neat! But we’re not quite there yet. If we want to do anything useful with the results, we’ll need the original Ruby objects representing AST nodes.

We could cheat and convert the results XML into a new AST, but that would almost certainly break the rewriter library built into the parser gem. Not to mention being horribly inefficient.

So instead we will add a bit of metadata to our XML tree, specifically the Ruby object IDs of the original nodes. Fortunately this is as easy as node.object_id:
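A sketch of that change, again as a standalone function with a stand-in Node struct (both assumptions). The attribute name id is a choice made for this example; the object ID is stored as a string because XML attribute values are text.

```ruby
require "rexml/document"

# Stand-in for Parser::AST::Node (assumption, so this runs without the gem).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

def append_xml(parent, node)
  if node.respond_to?(:type) && node.respond_to?(:children)
    element = parent.add_element(node.type.to_s)
    # Remember which Ruby object this element was generated from.
    element.add_attribute("id", node.object_id.to_s)
    node.children.each { |child| append_xml(element, child) }
  else
    element = parent.add_element("#{node.class.name.downcase}-val")
    element.add_attribute("value", node.to_s)
  end
end

str_node = s(:str, "Your name")
ast = s(:pair, s(:sym, :placeholder), str_node)

doc = REXML::Document.new("<root/>")
append_xml(doc.root, ast)

puts doc
```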

Which results in the following XML:

Now that we have the original object IDs in our XML output, we can walk the tree to find the original nodes. The implementation below is not very efficient, since it walks the whole tree for every lookup, but it is very short. Optimizing the performance of a recursive tree walk is left as an exercise for the reader.

First, we need a way to recursively add all nodes to an array:
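A minimal sketch, assuming nodes respond to type and children; all_nodes is a hypothetical name and Node is a stand-in for the parser gem's node class. Literal children (symbols, strings, nil) are skipped because only nodes get an ID in the XML.

```ruby
# Stand-in for Parser::AST::Node (assumption, so this runs without the gem).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Depth-first walk that collects every node (but no literal) into `acc`.
def all_nodes(node, acc = [])
  return acc unless node.respond_to?(:type) && node.respond_to?(:children)

  acc << node
  node.children.each { |child| all_nodes(child, acc) }
  acc
end

ast = s(:send, s(:send, nil, :f), :text_field,
        s(:sym, :name),
        s(:hash, s(:pair, s(:sym, :placeholder), s(:str, "Your name"))))

node_types = all_nodes(ast).map(&:type)
```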

Then, we can use this to find our matching object ID:
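Putting the pieces together, a round trip from XPath match back to the original node might look like the sketch below. The helper names (append_xml, all_nodes, find_node), the id attribute, and the stand-in Node struct are all assumptions made for this example.

```ruby
require "rexml/document"

# Stand-in for Parser::AST::Node (assumption, so this runs without the gem).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Build the XML mirror, tagging each element with its node's object ID.
def append_xml(parent, node)
  if node.respond_to?(:type) && node.respond_to?(:children)
    element = parent.add_element(node.type.to_s)
    element.add_attribute("id", node.object_id.to_s)
    node.children.each { |child| append_xml(element, child) }
  else
    parent.add_element("#{node.class.name.downcase}-val").add_attribute("value", node.to_s)
  end
end

# Flatten the AST into an array of nodes.
def all_nodes(node, acc = [])
  return acc unless node.respond_to?(:type) && node.respond_to?(:children)

  acc << node
  node.children.each { |child| all_nodes(child, acc) }
  acc
end

# Linear search for the node whose object ID matches.
def find_node(ast, id)
  all_nodes(ast).find { |node| node.object_id == id }
end

str_node = s(:str, "Your name")
ast = s(:send, s(:send, nil, :f), :text_field,
        s(:sym, :name),
        s(:hash, s(:pair, s(:sym, :placeholder), str_node)))

doc = REXML::Document.new("<root/>")
append_xml(doc.root, ast)

# Map each matching XML element back to the Ruby object it came from.
originals = REXML::XPath.match(doc, "//pair/str").map do |element|
  find_node(ast, element.attributes["id"].to_i)
end
```

Because the lookup returns the very same Ruby object that was in the tree, the result can be handed to code that depends on the original nodes.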

And here we are, a very quick and expressive way to juggle your ASTs:

See the complete source at the bottom of this post.

If you want to further shorten your XPaths you could add more metadata to your XML tree. For example, in textractor, if we encounter a send node (a method call) we automatically add message="method_name" to the XML element. This allows us to write XPath such as send[@message="form_for"].
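As a sketch of that idea (not textractor's actual implementation): in the parser gem's AST the second child of a send node is the method name, so it can be copied into a message attribute while building the XML. The function name append_xml and the stand-in Node struct are assumptions.

```ruby
require "rexml/document"

# Stand-in for Parser::AST::Node (assumption, so this runs without the gem).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

def append_xml(parent, node)
  if node.respond_to?(:type) && node.respond_to?(:children)
    element = parent.add_element(node.type.to_s)
    # For method calls, children are [receiver, method_name, *arguments].
    element.add_attribute("message", node.children[1].to_s) if node.type == :send
    node.children.each { |child| append_xml(element, child) }
  else
    parent.add_element("#{node.class.name.downcase}-val").add_attribute("value", node.to_s)
  end
end

ast = s(:send, s(:send, nil, :f), :text_field,
        s(:sym, :name),
        s(:hash, s(:pair, s(:sym, :placeholder), s(:str, "Your name"))))

doc = REXML::Document.new("<root/>")
append_xml(doc.root, ast)

calls = REXML::XPath.match(doc, '//send[@message="text_field"]')
```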

We are currently developing multiple products using this library. Once the XML format stabilizes, we plan to extract the library from our product and release a gem. If you are interested in using these techniques in your project, we’d love to help! Send us an email at info@snootysoftware.com.

At Snooty Software, we develop tools that programmatically modify code. Our first product, Textractor, takes an existing Rails project and prepares your ERB views for translation by replacing string literals with t() calls.

Complete source: