ActiveRecord::Relation::Union pitfall that could cause a disaster
What happened?
I was refactoring a piece of code that was iterating over a big chunk of data. The operation looked like:
contracts = scope1 + scope2
contracts.each do |contract|
...
end
This is a somewhat typical, inefficient way of querying and processing large amount of data. With the expression scope1 + scope2
, all the records are loaded into the memory at once.
So using active_record_union gem, I changed it to
contracts = scope1.union(scope2)
contracts.find_each do |contract|
...
end
This way, scope1.union(scope2)
doesn’t actually run the query, and contracts.find_each
queries records in batches with sensible size.
Now, scope2
is a method that returns either an instance of ActiveRecord::Relation
or an empty array.
def scope2
return [] if ... Contract.some_scope
end
This worked fine in the pre-refactor code — except for the inefficiency — because scope1 + scope2
would just be a concatenation of two arrays.
However, the post-refactor scope1.union(scope2)
can return quite an unexpected result.
Try the following:
Contract.none.union([]).to_sql # replace "Contract" with any ActiveRecord model you have
and you’d actually get
"contracts".* FROM ( (SELECT "contracts".* FROM "contracts" WHERE (1=0)) UNION (SELECT "contracts".* FROM "contracts") ) "contracts"
If you look at the part after UNION
, you’ll realise it’s just querying all contracts.
You can imagine how this could lead to a disastrous scenario, for instance, it could result in sending an invoice to ALL customers instead of those who’ve just purchased something; or hitting an external API for every existing contract instead of just a tiny subset of contracts and effectively DDoS-ing their system.
What happens under the hood?
When you do Contract.some_scope.union([])
, active_record_union converts the empty array into Contract.where([])
(*), so the statement is equivalent toContract.some_scope.union(Contract.where([]))
.
Now, active_record treats where
with a blank-ish argument as “no-op” (*) meaning Contract.where([])
would be equivalent toContract.all
.
And voilà, Contract.some_scope.union([])
is equivalent toContract.some_scope.union(Contract.all)
.
What should I do to avoid disaster?
Don’t use an empty array trying to emulate an ActiveRecord::Relation
that’s guaranteed to return an empty result. UseActiveRecord::QueryMethods#none
(e.g., Contract.none
) instead. The method scope2
should now look like this:
def scope2
return Contract.none if ... Contract.some_scope
end
Plus, writing tests that make sure things that aren’t supposed to happen aren’t happening, would help prevent inadvertently introducing a code that fetches and processes all records.