Transforming Hashes — Which Way is Best?

Enumerable#map is a great way to transform a collection according to set rules. According to the docs, this method:

Returns a new array with the results of running block once for every element in enum.

But a lot of the time, we’ll want to transform a hash and return a new hash, not an array. #map feels natural for this sort of thing, since its point is precisely to transform collections. But because it returns an array, we need an extra step to get back to the hash we’re looking for, and using it for hash transformations ends up feeling very unnatural.

I’m a huge #each_with_object fanboy, and here I will argue that #each_with_object is the best choice for hash transformations. I’ll compare it with the most common alternative, #map followed by #to_h, from the point of view of ease of implementation, readability, and performance.

Let’s start by looking at using #map, with some modifications to turn the resulting array into a hash that’s been transformed from the original in the way we specify. It’s advocated for here and here.

1. #map and #to_h

hash = { a: 2, b: 3 }
hash.map { |k, v| [k, v + 3] }.to_h # => { a: 5, b: 6 }

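#map passes each key-value pair of the original hash into its block and collects the block’s return values in an array. Since our block returns a two-element array each time, the result is a nested array. So, before #to_h is called, we have:

[[:a, 5], [:b, 6]]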

When #to_h is called on a nested array, it turns the first value in each subarray into the key of the hash and the second into the value.
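To make that rule concrete, here is a minimal sketch of Array#to_h on its own. The variable names are just for illustration; the point is that every subarray has to be exactly a key and a value, otherwise Ruby complains.

```ruby
# Array#to_h expects every element to be a [key, value] pair.
pairs = [[:a, 5], [:b, 6]]
as_hash = pairs.to_h
puts as_hash == { a: 5, b: 6 } # prints "true"

# A subarray with more (or fewer) than two items raises ArgumentError.
to_h_error = nil
begin
  [[:a, 5, :extra]].to_h
rescue ArgumentError => e
  to_h_error = e
  puts "ArgumentError: #{e.message}"
end
```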

Now, let’s look at my preferred implementation:

2. #each_with_object

hash = { a: 2, b: 3 }
hash.each_with_object({}) do |(k, v), a|
  a[k] = v + 3
end # => { a: 5, b: 6 }

This method iterates over each key-value pair and hands the block the object passed into #each_with_object along with it. So, the block variable a (for accumulator) refers to the empty hash we passed in, which is then built up according to set rules and returned once the iteration is done.
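A small sketch of what that means in practice: the object we pass in is the very same object we get back, and the original hash is left alone. The names memo and result are mine, chosen only for this illustration.

```ruby
hash = { a: 2, b: 3 }
memo = {}

# The block's own return value is ignored; only the mutations to a matter.
result = hash.each_with_object(memo) do |(k, v), a|
  a[k] = v + 3
end

puts result.equal?(memo)      # prints "true": same object, not a copy
puts result == { a: 5, b: 6 } # prints "true"
puts hash == { a: 2, b: 3 }   # prints "true": the original is untouched
```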


Which is Easier to Implement?

Both methods require a little bit of thinking to implement.

For #map, we have to remember that #map returns an array and that #to_h works on nested arrays, where each subarray is two items long. That means #map’s block has to return a two-item array for each item in the collection.

For #each_with_object, we have to get used to the way #each_with_object works. For hashes, it transforms each key-value pair into a two-value array before passing it to its block, so we have to remember how to arrange the block parameters in the right way:

… |(k, v), a| …
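Here is a quick sketch of why the parentheses matter. Without them, the first block parameter is the whole two-item array; with them, Ruby destructures that array into k and v for us. The variable names are only for illustration.

```ruby
hash = { a: 2, b: 3 }

# Without parentheses, the first parameter is the whole [key, value] pair.
seen = []
hash.each_with_object({}) { |pair, _a| seen << pair }
puts seen == [[:a, 2], [:b, 3]] # prints "true"

# With parentheses, the pair is split into k and v.
with_parens = hash.each_with_object({}) { |(k, v), a| a[k] = v + 3 }

# The same thing, indexing into the pair by hand.
without_parens = hash.each_with_object({}) { |pair, a| a[pair[0]] = pair[1] + 3 }

puts with_parens == without_parens # prints "true"
```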

Which is Easier to Read?

While there are things to remember about both methods of hash transformation, I think #each_with_object is more transparent. It just looks like you’re making a new hash: a[k] = v + 3 does exactly what you’d expect.

#map just doesn’t look like you’re building a hash. It looks like you’re building a huge nested array. Because you are! It’s just that #to_h comes in at the end and transmogrifies the nested array into a hash. That’s less readable.

“Goes in a nested array, comes out a hash. Why not just build a hash in the first place?”

If there are no significant performance differences, then the edge has to go to #each_with_object. Let’s find out.


Which is Faster?

Let’s run some benchmarks.

First, let’s test a small hash transformed many times:

require 'benchmark'
include Benchmark
hash = { a: 2, b: 3 }
n = 1_000_000
# Approach 1: build a nested array, then convert it to a hash.
def transform_with_map(hash)
  hash.map { |k, v| [k, v + 3] }.to_h
end
# Approach 2: build the new hash directly.
def transform_with_eachwithobject(hash)
  hash.each_with_object({}) { |(k, v), a| a[k] = v + 3 }
end
Benchmark.bm do |x|
  x.report('map w/ to_h') { n.times { transform_with_map(hash) } }
  x.report('each_with_object') do
    n.times { transform_with_eachwithobject(hash) }
  end
end

The results:

                       user     system      total        real
map w/ to_h        2.230000   0.000000   2.230000 (  2.236392)
each_with_object   1.990000   0.000000   1.990000 (  1.991212)

#each_with_object is the clear winner here, being about 10% faster.

Now, let’s see if things are any different for a huge hash transformed one time. This is the more likely scenario. The benchmark code:

require 'benchmark'
n = 1_000_000
# Build a hash with a million entries: { 0 => 0, 1 => 1, ... }
hash = {}
n.times { |i| hash[i] = i }
def transform_with_map(hash)
  hash.map { |k, v| [k, v + 3] }.to_h
end
def transform_with_eachwithobject(hash)
  hash.each_with_object({}) { |(k, v), a| a[k] = v + 3 }
end
Benchmark.bm do |x|
  x.report('map w/ to_h') { transform_with_map(hash) }
  x.report('each_with_object') do
    transform_with_eachwithobject(hash)
  end
end

And the result:

                       user     system      total        real
map w/ to_h        1.200000   0.050000   1.250000 (  1.256310)
each_with_object   1.230000   0.030000   1.260000 (  1.259652)

Nearly identical performance here. While our benchmarking here is hardly scientific, it at least gives us some evidence to work with. We could dig into Ruby’s implementations of these methods to find out why we get these results, but I’m happy knowing that my preferred implementation is as fast as, if not faster than, the alternative, and moving on.


The Verdict

Since neither method is any easier to implement than the other, #each_with_object reads more transparently, and it sometimes (but not always) yields a performance benefit, I suggest that’s the way we always ought to do it.

I’ve been talking about hash transformation, but I think this is the way to go for transforming an array of objects into a hash as well. Here, Chris Mar advocates using #each_with_object for building up hashes from a large array of, say, Rails ActiveRecord objects. If we have 100,000 user objects and we want to build up a hash with each user’s id as the key and then perform some operations to get the value, #each_with_object will probably be just as quick and more transparent than going about it in other ways.
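As a rough sketch of that idea, here is an array of objects turned into a hash keyed by id. The User struct and the upcasing are stand-ins I made up for illustration; real ActiveRecord objects and whatever operations produce the value would take their place.

```ruby
# Hypothetical stand-in for an ActiveRecord model.
User = Struct.new(:id, :name)

users = [User.new(1, "Ada"), User.new(2, "Grace")]

# Key each entry by the user's id; the value is whatever we compute from the user.
names_by_id = users.each_with_object({}) do |user, a|
  a[user.id] = user.name.upcase
end

puts names_by_id == { 1 => "ADA", 2 => "GRACE" } # prints "true"
```

Since the elements here are single objects rather than key-value pairs, the block parameters need no parentheses.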

One implementation that I didn’t consider is using #reduce. Chris’s post does a good job explaining the virtues of #each_with_object over #reduce or #inject for transformations. Those methods function similarly to #each_with_object, except that you have to keep track of the block’s return value, which becomes the accumulator for the next iteration. That can be a big pain.
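To show the pain point, here is a small sketch of the same transformation with #reduce. Note that the block parameters come in the opposite order (accumulator first), and that the block has to end by returning the accumulator.

```ruby
hash = { a: 2, b: 3 }

# With reduce, the block's return value becomes the next accumulator,
# so we have to hand the hash back explicitly.
with_reduce = hash.reduce({}) do |a, (k, v)|
  a[k] = v + 3
  a
end
puts with_reduce == { a: 5, b: 6 } # prints "true"

# Forget that, and the block returns v + 3 (an Integer), which becomes
# the accumulator for the second pair and has no []= method.
reduce_error = nil
begin
  hash.reduce({}) { |a, (k, v)| a[k] = v + 3 }
rescue NoMethodError => e
  reduce_error = e
  puts "NoMethodError: #{e.message}"
end
```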

I’ll add that on a semantic level, we should expect a single object to be returned from #reduce, rather than a new collection, since its point is to reduce a collection down. That’s why I don’t like it for transformations.

The bottom line: #each_with_object is my favorite method and also probably better than your favorite method. I could be convinced otherwise.

I invite feedback. Is there another way to transform hashes that I’m missing? Is my way no good?