Being Lazy with ActiveRecord

John Andrews
Published in Kiavi Tech
Sep 6, 2016

In programming language terms, “laziness” refers to code whose evaluation is delayed until the last possible moment. This strategy can have several benefits. For example:

nums = (1..1_000_000).lazy
squares = nums.map { |x| x * x }
squares.take_while { |x| x < 100 }.to_a
#=> [1, 4, 9, 16, 25, 36, 49, 64, 81]

In this (contrived) example, since we're using a lazy enumerator, we stop computing squares as soon as one reaches 100, so only the first ten are ever calculated. take_while limits our iteration to just the numbers we care about, saving our CPU from about a million extra calculations.
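To check how much work the lazy version really skips, one can count how often the map block runs. This is a small sketch; the `evaluated` counter is added purely for illustration and is not part of the original example.

```ruby
evaluated = 0

squares = (1..1_000_000).lazy.map do |x|
  evaluated += 1 # count how many squares are actually computed
  x * x
end

result = squares.take_while { |x| x < 100 }.to_a

puts result.inspect # [1, 4, 9, 16, 25, 36, 49, 64, 81]
puts evaluated      # 10
```

The tenth square (100) still has to be computed so that take_while can see it fail the test. Nothing after it is touched.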

That's super cool, but I never had a good use case in Ruby outside the realm of example problems until the other day, when I was tasked with troubleshooting why one of our reports was failing to generate. The report looked something like this:

def payout_report(search_terms)
  payouts = Servicing::RemitQuery
    .where(search_terms)
    .flat_map(&:payouts)
    .compact
  generate_csv(payouts)
end

My hypothesis was that the data set was so large that the process was being killed for using too much memory. Looking at the search terms it was clear that the report was pulling a lot of data. Now Servicing::RemitQuery has a bunch of complicated logic that I didn't really want to dig into, but what I noticed is that we were pulling the whole query result set into memory at the same time with that flat_map.

Normally when dealing with a large ActiveRecord result set, I reach for the find_each method. This method executes your query and yields each result to the block just like a normal each, except behind the scenes it's fetching your records in batches (1,000 at a time by default) to keep the memory overhead low. Once you're done using a record in the block, Ruby's garbage collector can reclaim its memory. That is, as long as you're not keeping a reference to it outside of the block.
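The batching idea can be sketched in plain Ruby without a database. Everything here is a hypothetical stand-in: `ROWS` plays the role of a table, `fetch_batch` the role of a single query, and `each_in_batches` only imitates the general shape of find_each (paging by id), not its actual implementation.

```ruby
# Hypothetical in-memory "table" with ten rows.
ROWS = (1..10).map { |id| { id: id } }

# Stand-in for one query: up to `limit` rows with id greater than `after_id`.
def fetch_batch(after_id, limit)
  ROWS.select { |row| row[:id] > after_id }.first(limit)
end

# Yields rows one at a time while only holding one batch at once.
def each_in_batches(batch_size)
  last_id = 0
  loop do
    batch = fetch_batch(last_id, batch_size)
    break if batch.empty?

    batch.each { |row| yield row }
    break if batch.size < batch_size # a short batch means we reached the end

    last_id = batch.last[:id]
  end
end

seen = []
each_in_batches(3) { |row| seen << row[:id] }
puts seen.inspect # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Note that `seen` keeps a reference to every id, which is exactly the kind of accumulation that cancels out the memory benefit when the values are full records.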

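But that is exactly the problem with our method. It iterates over the query results and flat_maps them into one of their associations, so every payout ends up referenced by a single array. Plain find_each with a block wouldn't help at all in this situation, since we would still have to accumulate the payouts somewhere.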

Now one might argue that I should just query for payout objects instead of whatever is associated to them. One would not be wrong; however, remember that this is a complicated query which is used in other contexts, so refactoring it is not a small task. A compromise is required to get our report back up and running.

Luckily I remembered I have another trick up my sleeve. Laziness!

# Old query, for comparison:
# payouts = Servicing::RemitQuery
#   .where(search_terms)
#   .flat_map(&:payouts)
#   .compact
def payout_report(search_terms)
  payouts = Servicing::RemitQuery
    .where(search_terms)
    .find_each(batch_size: 100)
    .lazy
    .flat_map(&:payouts)
    .reject(&:nil?)
  generate_csv(payouts)
end

If you call find_each without a block it returns an enumerator, which can be converted to an Enumerator::Lazy with a simple call to the lazy method. Next, compact: at the time it was only available on arrays, so using it would force the entire sequence to come into existence just to remove the nils. Instead we switch to reject(&:nil?), which has the same effect and is lazy. (Ruby 3.1 and later also provide a lazy compact on Enumerator::Lazy.)

Because of laziness, no data is actually loaded until it is accessed inside the generate_csv method, which uses a normal each block to iterate over payouts. No changes to generate_csv were necessary.
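The on-demand behaviour can be observed with a plain-Ruby simulation. This is only a sketch under assumptions: `Remit`, `REMITS`, and the `batched` enumerator are hypothetical stand-ins for the real query and for a block-less find_each, and `batches_loaded` exists only to make the loading visible.

```ruby
# Hypothetical stand-ins: each remit owns a few payouts, some of them nil.
Remit = Struct.new(:id, :payouts)
REMITS = (1..1_000).map do |id|
  Remit.new(id, ["payout-#{id}a", nil, "payout-#{id}b"])
end

batches_loaded = 0

# Imitates a block-less find_each: an Enumerator that pulls one batch at a time.
batched = Enumerator.new do |yielder|
  REMITS.each_slice(100) do |batch|
    batches_loaded += 1
    batch.each { |remit| yielder << remit }
  end
end

# Same shape as the pipeline in the report method.
payouts = batched.lazy.flat_map(&:payouts).reject(&:nil?)

loaded_before = batches_loaded # building the pipeline loads nothing
first_five = payouts.first(5)  # consuming it pulls only what is needed

puts loaded_before      # 0
puts first_five.inspect # ["payout-1a", "payout-1b", "payout-2a", "payout-2b", "payout-3a"]
puts batches_loaded     # 1
```

A consumer that iterates with each, as generate_csv does, would walk through all ten batches, but only one batch at a time needs to be alive.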

Execution time improved vastly because now we’re not using swap space. Memory utilization went from “way too much” to “barely noticeable”. Such a huge win with so little code change!

Our engineering team is also growing! We’re hiring engineers in our San Francisco and Columbus, OH offices. See our careers page to learn more. We look forward to hearing from you!
