Distinctly Uncountable
Originally posted on the Agworld Developers Blog, hosted on Tumblr, on May 1, 2012.
We use the wonderful Scout service to keep track of the performance of our servers. Alert notifications get pushed into Lighthouse for triage, as we need to make sure that all performance bottlenecks are dealt with, even if that means just changing Scout’s alert thresholds to minimise false positives.
Today I started investigating this one…
The memory usage of one of our Passenger processes on staging spiked badly. We use Oink to annotate our rails log with information about memory usage and object allocations, and that allowed me to quickly track down the culprit.
… staging rails[8744]: Started GET “/widgets …
A quick look at the code for WidgetsController::index
shed no light on the issue. After all, it should only have been rendering a paginated list of widgets. That’s never been expensive before.
Object Allocations
More digging revealed that an admin user had made the request. Admins have access to all widgets in the system. Even so, retrieving the first 10 ActiveRecord objects for rendering should be cheap. What gives?
Oink was once again helpful:
Instantiation Breakdown: Total: 250099 | Widgets: 250000…
Erm… so we instantiated 250K ActiveRecord objects for Widget alone? When we should only have asked for 10? Examination of the log showed one query to retrieve the 10 widgets, followed by another to retrieve them all.
Complex Queries
A bit of debugging soon revealed the culprit: Kaminari, the “clean, powerful, customizable and sophisticated paginator for Rails 3”. It turns out that in late 2011 bug #160 had been closed by including a patch that uses “length” instead of “count” to calculate the number of records in the list being paginated.
I breathed a sigh of relief when I read that the patch only applied to “complex queries”. I then inhaled a gasp of despair when I looked at the code:
uses_distinct_sql_statement = c.to_sql =~ /DISTINCT/i
if uses_distinct_sql_statement
c.length
else
c = c.count
c.respond_to?(:count) ? c.count : c
end
That’s right, dear reader. A “complex query”, according to Kaminari, is one that contains the word “distinct” somewhere in the SQL. Bah!
Count vs. Length
As you should already be aware, count
will ask the database to do the heavy lifting for us. It’s nice and fast and cheap. On the other hand, length
will cause the query to be run as normal, building up an array of ActiveRecord objects, and then calculating the length of this array.
A middle ground not explored by the Kaminari patch would be to run the full query _without_ instantiating ActiveRecord objects (which you really should avoid doing in lots of cases, because it’s ecksy, which means “expensive” in Australian).
The performance of all three approaches can be measured with this code (which requires the use of REE, for GC statistics):
def measure
GC.clear_stats
time = Benchmark.realtime { puts “ANSWER = #{yield}” }
usage = GC.allocated_size / ( 1024.0 ** 2 )
puts “(allocated %.2fMB and took %.2fs)” % [usage, time]
endGC.enable_stats# Replace this with whichever query you want to measure
query = Widget.where(:sprocket_id => [13, 42, 137, 65538])measure { query.count }
measure do
ActiveRecord::Base.connection.select_values(query.to_sql).length
end
measure { query.length }
Loading that into the Rails Console gives the following results for us:
> load ‘~/profile_length.rb’
ANSWER = 35958
(allocated 0.02MB and took 0.06s)
ANSWER = 35958
(allocated 20.40MB and took 1.58s)
ANSWER = 35958
(allocated 164.51MB and took 7.64s)
That is, using .count
takes hardly any time and memory, the halfway house of executing the raw SQL is painfully slow, while what Kaminari is doing is downright alarming. All this just to render the paginator. Eek.
Delayed Discovery
If this patch went in six months ago, why are we only just discovering it now Well, we’d previously been running our own fork of Kaminari, and that was made prior to the patch landing. We’ve only very recently cleaned out these stale forks from our Gemfile.
And, although it may seem strange that we’re the first ones to notice this performance problem, it ain’t the first time it’s happened to us, as you’ll find out in future posts to this blog.
Wormy Rabbit Hole
Unfortunately, we seem to have opened up a can of worms with this one. While going down a rabbit hole. The patch was submitted to Kaminari as a work-around to a bug in ActiveRecord; very possibly bug #5554. Or bug #1003.
We’ve done some investigation with a virginal Rails install, which you can clone on GitHub. When you do that, run the tests. These are the ones that fail:
test “we can count distinct posts” do
query = Post.select(‘DISTINCT ON(number) *’)
count = query.count
length = query.length
assert count == length
endtest “we can sum distinct posts” do
query = Post.select(‘DISTINCT ON(number) *’)
sum = query.sum(:number)
reduce = query.reduce(0) { |s, p| s + p.number }
assert sum == reduce
end
We create 1k posts for testing, and each post has a number
that is generated via rand(1..10).to_i
. The query therefore only selects 10 posts. The query generated to do the select looks like this
SELECT DISTINCT ON(number) * FROM “posts”
Unfortunately, the query generated to do the count looks like this:
SELECT COUNT(*) FROM “posts”
And the query generated to do the sum looks like this:
SELECT SUM(“posts”.”number”) AS sum_id FROM “posts”
Wonky pagination is one thing, but I shudder to think that there may be Rails users out there generating mission-critical reports using distinct queries and sum.
This article was originally posted on the Agworld Developers Blog, hosted on Tumblr, on May 1, 2012, and is almost certainly out of date by now.