Generating XML in Ruby: Builder vs Nokogiri vs Libxml vs Ox

katharine.tamashiro
Published in Neuronio, Sep 13, 2018

This is a translation of an article originally posted by Rafael Barbolo.

Over the past few days, I've been optimizing a Ruby on Rails API application that responds to requests in JSON or XML.

I've identified memory bloat in some rare cases of XML generation, which is why I'm investigating efficient ways to generate JSON and XML. Currently, this API uses the RABL gem to render the response documents.

I've already written about my first mission: finding the most efficient way to render JSON in this application. Now the goal is to find the most efficient way to render XML.

I’m willing to sacrifice simplicity, aesthetics, standardization and code maintainability. The API being optimized gets tens of millions of requests per month, so performance is key here.

TL;DR

The best solutions I've found are the Ox and Libxml libraries. Both are very fast and light on memory. Ox is up to 25% faster than Libxml at generating XML, but allocates about twice as much memory.

Performance to render XML

Rendering XML typically means applying a set of conversion rules (a template) that map an input Ruby object to an output XML document.

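The assessed rendering options were:

  • RABL
  • toXML
  • Builder
  • Nokogiri
  • StringC (string concatenation)
  • StringI (string interpolation)
  • Libxml
  • Ox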

In the analysis, the Ruby object to be rendered represents an Article (post) with multiple Comments. Three articles of different sizes were evaluated:

  1. post 1 → article with approximately a thousand characters and 10 simple comments.
  2. post 2 → article with approximately 1 million characters and 200 comments, each with more than a thousand characters.
  3. post 3 → article with approximately 10 million characters and 5 thousand comments, each with more than 10 thousand characters.
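As a rough illustration, inputs of these sizes could be built as below. This is only a sketch: the `Post` and `Comment` structs, their attribute names, the `build_post` helper and the exact comment lengths are assumptions, not the fixtures used in the original analysis.

```ruby
# Hypothetical fixture types; the attribute names are assumptions.
Comment = Struct.new(:author, :text)
Post    = Struct.new(:title, :body, :comments)

# Builds a post whose body has `body_chars` characters and which carries
# `comment_count` comments of `comment_chars` characters each.
def build_post(body_chars:, comment_count:, comment_chars:)
  comments = Array.new(comment_count) do |i|
    Comment.new("user#{i}", "c" * comment_chars)
  end
  Post.new("Sample post", "a" * body_chars, comments)
end

# Sizes follow the three scenarios; the comment length for post 1 is a guess
# at what "simple" means.
SIZES = {
  post1: { body_chars: 1_000,      comment_count: 10,    comment_chars: 50 },
  post2: { body_chars: 1_000_000,  comment_count: 200,   comment_chars: 1_100 },
  post3: { body_chars: 10_000_000, comment_count: 5_000, comment_chars: 10_100 },
}.freeze

post1 = build_post(**SIZES[:post1])
puts "post 1: #{post1.body.size} chars, #{post1.comments.size} comments"
```

Building post 3 this way allocates tens of megabytes of strings, so it is best created only when that scenario is actually being measured.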

Results

To evaluate each option, a script measured how long it takes to render each article to XML and how much memory is allocated during rendering.
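A measurement helper along these lines can be sketched with the standard library alone. The `measure` method and its return keys are hypothetical, and it counts allocated objects through `GC.stat` as a stand-in for allocated bytes, which would need a dedicated memory profiler.

```ruby
require "benchmark"

# Runs the block `runs` times and reports the average wall-clock time per
# run (in milliseconds) and the average number of objects allocated per run.
# Assumption: object counts from GC.stat stand in for allocated bytes.
def measure(runs: 1_000)
  GC.start
  objects_before = GC.stat(:total_allocated_objects)
  seconds = Benchmark.realtime { runs.times { yield } }
  objects_after = GC.stat(:total_allocated_objects)

  {
    avg_ms: seconds * 1_000.0 / runs,
    avg_objects: (objects_after - objects_before) / runs.to_f,
  }
end

# Example: measure a trivial interpolation-based render.
result = measure(runs: 100) { "<post>#{'a' * 1_000}</post>" }
puts format("%.4f ms, %.1f objects per run", result[:avg_ms], result[:avg_objects])
```

Averaging over many runs smooths out garbage collector pauses and other noise that would dominate a single sub-millisecond measurement.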

╔════════╦══════════╦═════════╦═════════╗
║ Post   ║ Renderer ║ Time    ║ Memory  ║
╠════════╬══════════╬═════════╬═════════╣
║ post 1 ║ RABL     ║ 3.29 ms ║ 345 K   ║
║        ║ toXML    ║ 2.58 ms ║ 270 K   ║
║        ║ Builder  ║ 0.93 ms ║ 99 K    ║
║        ║ Nokogiri ║ 0.78 ms ║ 75 K    ║
║        ║ StringC  ║ 0.23 ms ║ 36 K    ║
║        ║ StringI  ║ 0.19 ms ║ 26 K    ║
║        ║ Libxml   ║ 0.19 ms ║ 12 K    ║
║        ║ Ox       ║ 0.16 ms ║ 24 K    ║
╠════════╬══════════╬═════════╬═════════╣
║ post 2 ║ RABL     ║ 98.0 ms ║ 9.18 M  ║
║        ║ toXML    ║ 88.2 ms ║ 8.13 M  ║
║        ║ Builder  ║ 58.9 ms ║ 5.01 M  ║
║        ║ Nokogiri ║ 33.1 ms ║ 2.76 M  ║
║        ║ StringC  ║ 16.6 ms ║ 1.10 M  ║
║        ║ StringI  ║ 12.7 ms ║ 0.89 M  ║
║        ║ Libxml   ║ 9.6 ms  ║ 0.21 M  ║
║        ║ Ox       ║ 10.6 ms ║ 0.40 M  ║
╠════════╬══════════╬═════════╬═════════╣
║ post 3 ║ RABL     ║ 3.42 s  ║ 291.3 M ║
║        ║ toXML    ║ 3.32 s  ║ 265.5 M ║
║        ║ Builder  ║ 2.41 s  ║ 187.9 M ║
║        ║ Nokogiri ║ 1.30 s  ║ 144.7 M ║
║        ║ StringC  ║ 0.69 s  ║ 117.7 M ║
║        ║ StringI  ║ 0.68 s  ║ 114.8 M ║
║        ║ Libxml   ║ 0.40 s  ║ 5.2 M   ║
║        ║ Ox       ║ 0.31 s  ║ 10.0 M  ║
╚════════╩══════════╩═════════╩═════════╝

In all scenarios, Libxml and Ox were the best alternatives. Their processing times were very close, but Libxml was significantly more efficient in memory allocation.

I decided to go with Libxml for the API I'm working on. My use cases are similar to posts 1 and 2, and its memory efficiency really surprised me.

The following highlights were revealed in this analysis:

  • Libxml allocates roughly 30x to 55x less memory than RABL, depending on the size of the post
  • Ox is roughly 9x to 20x faster than RABL, depending on the size of the post
  • Libxml and Ox are more efficient than manipulating strings in Ruby
  • Interpolation is more efficient than concatenation of strings in Ruby
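To make the last two points concrete, here is a sketch of the two string-based approaches rendering the same element. It assumes "concatenation" means joining fragments with `+`; the helper names and the escaping table are hypothetical and not taken from the benchmark code.

```ruby
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Escapes the characters that may not appear literally in XML text nodes.
def escape(text)
  text.gsub(/[&<>]/, XML_ESCAPES)
end

# Concatenation: each + creates a new intermediate String.
def comment_concat(author, text)
  "<comment><author>" + escape(author) + "</author><text>" +
    escape(text) + "</text></comment>"
end

# Interpolation: the same element built from a single string literal.
def comment_interp(author, text)
  "<comment><author>#{escape(author)}</author><text>#{escape(text)}</text></comment>"
end

puts comment_interp("ann", "a < b & c")
```

Both helpers return identical output. The difference lies in how many temporary strings Ruby allocates along the way, which is consistent with StringI allocating less memory than StringC in the table.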

Final Code

I’m still thinking about the organization of the final code. For now, my suggestion is:

Controller

# app/controllers/posts_controller.rb
respond_to do |format|
  format.json { ... }
  format.xml { render xml: Renderer::Post.xml(@post) }
end

Renderer
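A minimal sketch of a module exposing the `Renderer::Post.xml` interface that the controller calls. It assumes a post object responding to `title`, `body` and `comments` (hypothetical attribute names), and it uses plain string interpolation to stay dependency-free. A Libxml-based version, as chosen above, would build the same document with that library's node API instead.

```ruby
module Renderer
  module Post
    ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

    # Returns the XML document for `post` as a String.
    # Assumption: `post` responds to title, body and comments, and each
    # comment responds to text.
    def self.xml(post)
      comments = post.comments.map do |comment|
        "<comment><text>#{escape(comment.text)}</text></comment>"
      end

      [
        %(<?xml version="1.0" encoding="UTF-8"?>),
        "<post>",
        "<title>#{escape(post.title)}</title>",
        "<body>#{escape(post.body)}</body>",
        "<comments>#{comments.join}</comments>",
        "</post>",
      ].join("\n")
    end

    # Escapes the characters that may not appear literally in XML text nodes.
    def self.escape(text)
      text.to_s.gsub(/[&<>]/, ESCAPES)
    end
    private_class_method :escape
  end
end

# Stand-in objects for illustration; in the Rails app a model would be passed.
Comment = Struct.new(:text)
Article = Struct.new(:title, :body, :comments)

puts Renderer::Post.xml(Article.new("Hi", "1 < 2", [Comment.new("Nice & short")]))
```

Keeping the renderer in its own module means the controller does not depend on which library produces the XML, so the implementation can be swapped without touching the `respond_to` block.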

Environment used in this analysis

The execution time reported for each test case is the average over one thousand runs of that scenario.

The source code needed to reproduce the scenarios can be found in this gist.
The environment used in the tests is described below:

  • Computer: MacBook Pro 15" 2017
