ActiveRecord #find_each considered harmful (sometimes)
26 February 2016
In the last couple of weeks we’ve been fighting the effects of a surge in users (TV campaign on national television, YEAH).
The web part of the app scaled pretty well. What we didn’t consider, though, is that innocuous queries that worked OK on tables of a couple of million rows didn’t do so well on tables of 10M rows and growing.
First of all, I learned about configuring a short statement_timeout in database.yml.
That stopped slow queries from killing the whole site and surfaced all the areas that needed changing.
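As an illustration (database name and timeout value are made up, and the `variables` key assumes a reasonably recent Rails with the postgresql adapter), the timeout can be set per-connection in database.yml:

```yaml
production:
  adapter: postgresql
  database: myapp_production
  variables:
    # Abort any statement that runs longer than 5 seconds (value in ms).
    statement_timeout: 5000
```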
We moved several scheduled tasks to a follower database using the octopus gem, added a few composite indices, etc.
We also tried to speed up some of the tasks that looked unnecessarily slow.
The problem
During the optimisation process we found something interesting.
On Postgres 9.4, the following query:
SELECT * FROM large_table WHERE created_at BETWEEN ? AND ? ORDER BY id LIMIT 10 OFFSET 0
is much slower than
SELECT * FROM large_table WHERE created_at BETWEEN ? AND ? ORDER BY created_at LIMIT 10 OFFSET 0
The first is what Rails runs when using #find_each or #find_in_batches.
The problem is particularly obvious when looping through a large number of rows. Our best guess at the cause: with ORDER BY id, Postgres is tempted to walk the primary-key index and filter each row on created_at, whereas with ORDER BY created_at it can read the matching range straight from an index on that column, already in order.
The solution
We don’t have a very clean solution yet. For our specific problem we didn’t have to rely on ID ordering, so we wrote our own find_in_batches replacement that simply applies LIMIT and OFFSET repeatedly, keeping the original order of the query.
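A minimal sketch of such a helper (the name `each_batch_with_offset` is ours, and the default batch size is arbitrary); it only assumes the scope responds to `limit`/`offset` like an ActiveRecord relation, and it preserves whatever ORDER BY the scope already carries:

```ruby
# Offset-based batching that keeps the scope's own ORDER BY intact,
# unlike find_in_batches, which re-orders by primary key.
# `scope` is anything relation-like that responds to #limit and #offset.
def each_batch_with_offset(scope, batch_size: 1000)
  offset = 0
  loop do
    batch = scope.limit(batch_size).offset(offset).to_a
    break if batch.empty?
    yield batch
    break if batch.size < batch_size # last, partial batch
    offset += batch_size
  end
end
```

With a relation like `Order.where(created_at: from..to).order(:created_at)` (model name hypothetical), this walks the table in created_at order rather than id order. Note that deep OFFSETs have their own cost in Postgres, so this is a stopgap rather than a general fix.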
We are really hoping it’s a problem that can be solved with some database indexing magic. Failing that, this behaviour should at least be documented in the Rails docs, and alternative find_in_batches strategies could be provided.
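If indexing does turn out to be the answer, one candidate we’d try (untested at our scale; table and column names as in the queries above) is an index matching both the filter and the ORDER BY of the fast variant:

```sql
-- Covers WHERE created_at BETWEEN ? AND ? ORDER BY created_at LIMIT n,
-- letting Postgres read the matching range already in order.
CREATE INDEX index_large_table_on_created_at_and_id
  ON large_table (created_at, id);
```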
Originally published at underthehood.carwow.co.uk on February 26, 2016.