Different Methods for Merging Ruby Hashes

Paperless Post
Life at Paperless Post
Feb 19, 2015

Today, a co-worker was reviewing some code of mine similar to this:

foo({a: 1}.merge(b: 2))

He suggested that using merge! would be faster, as it would save instantiating a new hash. I was skeptical but decided to put it to the test using benchmark-ips. If you are unfamiliar with benchmark-ips, it is a really awesome gem that measures how many times something can be run in a given timeframe, as opposed to how long it takes to run something. This is a particularly useful measurement when looking at things that take a variable amount of time to execute or, in this case, things that are very quick.
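To make the "how many times in a given timeframe" idea concrete, here is a rough sketch of that style of measurement using only the standard library. The helper name iterations_per_second and the 0.1 second window are made up for illustration. benchmark-ips itself does considerably more (a warmup phase, error estimates), so treat this as a picture of the idea, not of how the gem is implemented.

```ruby
# Hypothetical helper, not part of benchmark-ips: run the block repeatedly
# until `seconds` have elapsed, then report iterations per second.
def iterations_per_second(seconds = 0.1)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  started = clock.call
  iterations = 0
  loop do
    yield
    iterations += 1
    break if clock.call - started >= seconds
  end
  iterations / (clock.call - started)
end

merge_rate = iterations_per_second { {a: 1}.merge(b: 2) }
bang_rate  = iterations_per_second { {a: 1}.merge!(b: 2) }
puts "merge:  #{merge_rate.round} i/s"
puts "merge!: #{bang_rate.round} i/s"
```

The numbers from a sketch like this will be noisier than what benchmark-ips reports, which is a good reason to reach for the gem in practice.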

I set up the script to compare these methods as follows:

require 'benchmark/ips'

def foo(hash = {}); end

Benchmark.ips do |x|
  x.report("merge") { foo({a: 1}.merge(b: 2)) }
  x.report("merge!") { foo({a: 1}.merge!(b: 2)) }
  x.compare!
end
This simply replaces merge with merge! and runs each repeatedly for 5 seconds (the default from benchmark-ips). I made foo do nothing just so that all the same objects would be instantiated, without adding any overhead to each run. The results were surprising!
Calculating -------------------------------------
merge 29.046k i/100ms
merge! 48.407k i/100ms
-------------------------------------------------
merge 416.087k (± 3.8%) i/s - 2.091M
merge! 819.903k (± 4.0%) i/s - 4.115M
Comparison:
merge!: 819903.3 i/s
merge: 416087.1 i/s - 1.97x slower
Using merge! is almost 2 times as fast! That’s really great. Out of curiosity, I wanted to check the number of objects that each makes as well. I know that the difference in the way merge and merge! work should mean that with merge! we have half as many objects created, but I wanted to measure it to be sure. For that, we can use ObjectSpace. If you are unfamiliar with ObjectSpace, or need a refresher, our very own Aaron Quint has covered it a few times. To count the number of hash objects we make in a given time period, I run a script like this:
original = ObjectSpace.count_objects[:T_HASH]
1000.times { foo({a: 1}.merge(b: 2)) }
new = ObjectSpace.count_objects[:T_HASH]
puts "Made #{new - original} hash objects"
original = ObjectSpace.count_objects[:T_HASH]
1000.times { foo({a: 1}.merge!(b: 2)) }
new = ObjectSpace.count_objects[:T_HASH]
puts "Made #{new - original} hash objects"
Using merge, we created 4039 hash objects. With merge!, we made only 2039, just as I expected.
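If you want to repeat this measurement, one thing that can skew it is the garbage collector running partway through the loop. Below is a sketch that wraps the counting in a helper and turns GC off while it runs. The helper name count_new_hashes is hypothetical, and the exact counts will vary by Ruby version, so it prints whatever it finds instead of promising particular numbers.

```ruby
def foo(hash = {}); end

# Hypothetical helper: how many more T_HASH slots are in use after the block.
def count_new_hashes
  GC.start    # finish any pending collection before measuring
  GC.disable  # keep the collector from reclaiming hashes mid-measurement
  before = ObjectSpace.count_objects[:T_HASH]
  yield
  ObjectSpace.count_objects[:T_HASH] - before
ensure
  GC.enable
end

merge_count = count_new_hashes { 1000.times { foo({a: 1}.merge(b: 2)) } }
bang_count  = count_new_hashes { 1000.times { foo({a: 1}.merge!(b: 2)) } }
puts "merge made #{merge_count} hash objects"
puts "merge! made #{bang_count} hash objects"
```

ObjectSpace.count_objects is specific to MRI, so this sketch assumes the standard C implementation of Ruby.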
It is important to note, however, that using merge! can have side effects. Because it modifies the receiver in place, the hash's original contents are gone once it returns. This is especially relevant when the receiver is a method argument, since the hash being changed belongs to the caller. For example, take the following code:
def baz(hash); end # stand-in that does nothing, like foo above

def bar(hash_arg)
  baz(hash_arg.merge!({ a: "blah" }))
end
hash = {a: 'hi'}
hash[:a] #=> 'hi'
bar(hash)
hash[:a] #=> 'blah'
This overwrites the value stored under :a in the caller's hash. If you want to retain the original state of hash, merge is the better choice here. You could also call dup on hash_arg and use merge! on the copy. This is particularly useful when doing a number of merges:
def qux!(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end
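The difference between the two methods, and one limit of the dup approach, can be checked directly. This is a small sketch with made-up variable names. It relies on merge! returning its receiver and on dup making a shallow copy.

```ruby
original = { a: "hi", nested: { b: 1 } }

# merge returns a new hash and leaves the receiver alone.
merged = original.merge(a: "blah")
original[:a]            # still "hi"
merged.equal?(original) # false: a different object

# merge! returns the receiver itself, so here only the copy changes.
copy = original.dup
result = copy.merge!(a: "blah")
result.equal?(copy)     # true
original[:a]            # still "hi"

# dup is shallow: nested hashes are still shared with the original.
copy[:nested][:b] = 2
original[:nested][:b]   # now 2 as well
```

So dup protects the top-level keys of the caller's hash, but anything nested inside it can still be changed through the copy.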
In case you’re curious, using merge! here is still faster than the equivalent with merge (which has to reassign the variable on every iteration, since merge returns a new hash instead of changing the old one):
def qux!(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

def qux(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg = hash_arg.merge({ "num_#{i}" => i }) }
end

Benchmark.ips do |x|
  x.report("merge") { qux({}) }
  x.report("merge!") { qux!({}) }
  x.compare!
end
Calculating -------------------------------------
merge 2.386k i/100ms
merge! 5.962k i/100ms
-------------------------------------------------
merge 24.337k (± 3.4%) i/s - 121.686k
merge! 63.059k (± 4.3%) i/s - 315.986k
Comparison:
merge!: 63058.8 i/s
merge: 24337.1 i/s - 2.59x slower
All in all, this was a pretty fun dive into some minor performance stuff. While it might not make a huge difference at a small scale, the time and allocations saved can add up as a method gets called more and more. It’s often worth it to grab a few tools and take a look.
UPDATE: Tieg posed the question below of whether Hash[] would be faster than using dup. I took a swing at it and it appears that he is correct, though only barely: the 1.02x difference is smaller than the ± 3.7% error reported for each run, so the two are close to a tie. Here are my findings:
def quux(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

def corge(hash_arg)
  hash_arg = Hash[hash_arg]
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

Benchmark.ips do |x|
  x.report("merge! with dup") { quux({}) }
  x.report("merge! with Hash[]") { corge({}) }
  x.compare!
end
Calculating -------------------------------------
merge! with dup 4.759k i/100ms
merge! with Hash[] 4.863k i/100ms
-------------------------------------------------
merge! with dup 52.455k (± 3.7%) i/s - 266.504k
merge! with Hash[] 53.576k (± 3.7%) i/s - 267.465k
Comparison:
merge! with Hash[]: 53575.8 i/s
merge! with dup: 52454.7 i/s - 1.02x slower
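Speed aside, the two ways of copying are not fully interchangeable. As far as I know, dup carries over a hash's default value while Hash[] only copies the entries. This sketch is not part of the benchmark above and uses made-up variable names. It matters only if the hashes you copy were built with Hash.new and a default.

```ruby
counts = Hash.new(0)   # missing keys default to 0
counts[:seen] += 1

via_dup      = counts.dup
via_brackets = Hash[counts]

via_dup == via_brackets  # true: same keys and values
via_dup[:missing]        # 0, dup kept the default
via_brackets[:missing]   # nil, Hash[] did not
```

For plain literal hashes like the ones in this post, neither copy has a default to lose, so the choice comes down to taste.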
Thanks to Chris Belsole, Mary Cutrali, Dan Condomitti, Aaron Quint, Ari Russo, and Ivan Tse for their help on this post.
