Scopes vs. Solr in a Rails Project

I recently worked on a large Rails application that handled a good deal of data — about 200 million entries in the database — and which imported about 100 million new entries each night.

The application needed to search through all the entries, including attributes nested in multiple associations, in order to return information useful to whoever was viewing the site. The problem, of course, was that it takes a really long time to search through all that data.

In this post I’ll talk about the initial tricks I used to get the site up and running, and then I’ll talk about changing the site over to a more robust and dynamic system.

I started the initial iteration of the application by setting up a lot of scopes on the various models, and using limits in all my calls to the database.

The models were set up like this:

class ApplicationPrice < ActiveRecord::Base
  scope :from_last_import, lambda {
    where(export_date: last_import, storefront_id: cur_store)
  }
  scope :current_storefront, lambda {
    where(storefront_id: cur_store)
  }
  scope :free, lambda {
    where(retail_price: 0.0)
  }
  ...
end

So I could make calls like this:

ApplicationPrice.from_last_import.free.limit(25)

This worked well, at least for a relatively small limit like 25. However, to make the application really useful I needed to add something like pagination, or at least increase the limits I offered. Pagination didn’t work because simply running a `count` query could take too long, so I decided to just offer a larger limit via params if the user wanted one.
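As a rough illustration of offering a larger limit via params, here is a minimal sketch in plain Ruby. The helper name `sanitized_limit`, the `:limit` key, and the cap of 200 are assumptions made for the example, not details from the original app.

```ruby
# Hypothetical helper: turn a user-supplied limit param into a safe integer.
DEFAULT_LIMIT = 25   # the limit used in the example call above
MAX_LIMIT     = 200  # assumed upper bound, chosen only for illustration

def sanitized_limit(params)
  raw = params[:limit].to_s
  # Anything that is not a plain positive integer falls back to the default.
  return DEFAULT_LIMIT unless raw.match?(/\A\d+\z/)

  requested = raw.to_i
  return DEFAULT_LIMIT if requested < 1

  # Never let a request exceed the cap, however large the param is.
  [requested, MAX_LIMIT].min
end
```

In a controller this could be used as `ApplicationPrice.from_last_import.free.limit(sanitized_limit(params))`, so a missing or nonsensical param never turns into an unbounded query.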

Even that had a problem: out of 200 million entries, it was possible that only 1,000 matched a specific query, so fetching the first 200 matches could still force the database to look at upwards of 40 million rows, or more once associations were included.
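The 40 million figure follows from simple proportions, assuming the matching rows are spread evenly through the table and nothing narrows the scan. A back-of-the-envelope sketch (the method name is made up for illustration):

```ruby
# Rough estimate of how many rows are read before `wanted` matches turn up,
# assuming matches are evenly distributed and no index narrows the scan.
def estimated_rows_scanned(total_rows:, matching_rows:, wanted:)
  # Asking for at least as many rows as exist means reading the whole table.
  return total_rows if wanted >= matching_rows

  (total_rows * wanted) / matching_rows
end

rows = estimated_rows_scanned(
  total_rows: 200_000_000,
  matching_rows: 1_000,
  wanted: 200
)
puts rows # 40000000
```

Asking for 200 of 1,000 matches means reading roughly a fifth of the table, which is where the 40 million comes from.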

To fix that problem I added indices to all important columns. That definitely should have been my first consideration, and eventually it became so obvious I couldn’t ignore it. This solution worked well and is what is currently deployed. I talked to the end user about the limitations of the search and the system, and they were happy with the functionality and didn’t want to invest any more time into finding a solution.
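To make the effect of an index concrete, here is a toy in-memory comparison in plain Ruby. It is not how a database implements indices (those are typically B-trees); a Hash stands in for the index, and the rows and field names are made up for illustration.

```ruby
Row = Struct.new(:id, :retail_price)

# 100_000 fake rows; every 1_000th one is free.
ROWS = (1..100_000).map { |i| Row.new(i, (i % 1_000).zero? ? 0.0 : 0.99) }

# Without an index: walk the table until `limit` matches are found,
# counting how many rows had to be examined along the way.
def scan_free(rows, limit)
  examined = 0
  found = []
  rows.each do |row|
    examined += 1
    found << row if row.retail_price == 0.0
    break if found.size >= limit
  end
  [found, examined]
end

# With an "index": group rows by price once, then look matches up directly.
PRICE_INDEX = ROWS.group_by(&:retail_price)

def indexed_free(index, limit)
  index.fetch(0.0, []).first(limit)
end

found, examined = scan_free(ROWS, 25)
puts "scan examined #{examined} rows for #{found.size} matches"
puts "index returned #{indexed_free(PRICE_INDEX, 25).size} matches directly"
```

The scan has to step over every non-matching row in between, while the lookup goes straight to the matching group. The cost is the extra work of building and maintaining the index.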

Fast forward a while — I started looking at some other gigantic systems running a dynamic search and returning data in milliseconds, and I couldn’t help but wonder how they were doing it. Out of curiosity more than anything, I decided to dig back into this project and see if there was a better way to search the data.

One of the first things I ran into was the sunspot_rails gem. It basically gives you access to the super powerful Solr search engine within a Rails app. After poking around a little, I realized how amazing the Solr engine is. To test it, I created a searchable index on the title and description of applications, and it returned thousands of results in milliseconds.

I knew that I had to try plugging this into the existing app, and figured it probably wouldn’t be much more work than switching scoped queries to Solr syntax. But I was wrong.

The reason was that the Sunspot gem runs searches with criteria provided within a block, more or less similar to an ActiveRecord `where` search. In contrast, in the previous scoped approach, searches were run as chained class methods.

Here are examples of the two side by side:

Searches using scopes:

ApplicationPrice.from_last_import.free

And the new block-style searches using Sunspot:

ApplicationPrice.search do
  # Sunspot's `with` takes one indexed field and a value per call
  with(:export_date, last_import)
  with(:retail_price, 0.0)
end

The criteria being expressed are the same, though under the hood the two differ: scopes build SQL, while Sunspot sends the query to Solr rather than the database. Either way, in Rails world it required me to change chained class methods to these search blocks.
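To make the difference in style concrete, here is a toy sketch in plain Ruby of the two API shapes: chainable methods that each return a new query, and a block whose calls are collected by a builder. Neither class is ActiveRecord's or Sunspot's real implementation; `ChainQuery`, `BlockQuery`, and the date value are hypothetical stand-ins.

```ruby
# Chain style: every method returns a new query object, so calls compose.
class ChainQuery
  attr_reader :criteria

  def initialize(criteria = {})
    @criteria = criteria
  end

  def where(conditions)
    ChainQuery.new(criteria.merge(conditions))
  end

  def free
    where(retail_price: 0.0)
  end

  def from_import(date)
    where(export_date: date)
  end
end

# Block style: the block is evaluated against a builder that records
# each `with` call.
class BlockQuery
  attr_reader :criteria

  def initialize
    @criteria = {}
  end

  def with(field, value)
    @criteria[field] = value
  end

  def self.search(&block)
    query = new
    query.instance_eval(&block)
    query
  end
end

last_import = "2014-03-08" # made-up value for the example

chained = ChainQuery.new.from_import(last_import).free
blocked = BlockQuery.search do
  with(:export_date, last_import)
  with(:retail_price, 0.0)
end

puts chained.criteria == blocked.criteria
```

Both end up with the same criteria, but in this sketch the chained version can be assembled piece by piece from anywhere in the code, while the block version gathers everything in one place. That difference in shape is what makes a conversion touch so many call sites.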

It was a pain switching over, but in the end it helped me structure the application in a much more natural way, and allowed for a much more dynamic search. Because Solr indexes search attributes, I was no longer limited, for performance reasons, to a simple free-or-not scope; I could now easily search on any price without worrying about query time.

Similarly, I could rely on more complex, but ultimately more natural associations (think asking the Application for its price, instead of asking an ApplicationPrice for its Application) to return my data, because all my important search attributes were indexed.

In the end I learned, again, for the thousandth time, to always look for a gem or system to help do something that seems easy but turns out to be hard (ahem, looking at you too, pagination with search results), and to use that to help guide your system design.


Originally published by Tyler Olson at tyleronrails.tumblr.com on March 9th, 2014.