Rails Models With Materialized Views

Rails + Postgres FTW

If you ever worked on larger Rails codebases, you probably faced situations where it was needed to go beyond the abstractions provided by ActiveRecord and use more complex operations of your RDBMS.

Consider the following scenario: we need to implement a report in an already existing bike rental system. This report must show the bicycle models sorted by number of rentals, and also the total revenue generated by each model.

Disregarding the most complex logic, like availability and quantity of bikes, our database looks as following:

Generated with rails-erd gem

We want an endpoint which gives us back information in a particular format. The way in which this data is presented is unimportant for the purpose of this tutorial; it could be an HTML, or a CSV, for example. Let’s create some tests to guide us in order to achieve the desired format on any strategy we develop.

First try: Heavy lifting on Ruby

The most naive method of building the hashes array we want is to initialize all the objects and iterate over each one.

This is not necessarily a bad approach. It works, but does not perform quite well on a larger amount of data. The following benchmark, as well as the next ones, uses 100,000 records to see how fast the report is generated.

It takes 10 seconds to retrieve the data. And remember that this is just to get a formatted array. In a real life scenario, we would have to account for the time needed for the request to go through all the stack, from routing to rendering the views. 10 seconds is a lot of time, specially if you have lots of requests coming.

Of course, in a situation of listing the results on a page, you could get around this issue by using pagination. However, this wouldn’t work on a XLSX download, for instance.

Second try: ActiveRecord + SQL

In this approach we try to solve the problem using ActiveRecord abstractions, and, where we can’t, raw SQL.

This class produces the same results as the previous solution, thus passing the same tests. Here’s a benchmark comparing the two approaches:

The report ran in 0.13 seconds! 77 times faster than our first method. However, if in the context in which this system is developed, instant download is more important than the report being 100% up to date (if it only needs to be refreshed once a day, for instance), we can have a more elegant and faster solution.

Working with Materialized Views

A database view is basically a set of query results which can be treated as a table. Every time a view is called, the queries are executed and the results can be used as a real table for other operations. Views can be used, for example, as a security measure; if we abstract some tables which contain sensitive content, we can then grant privileges for some users to access only the views, but not the underlying tables.

A materialized view is a kind of view whose results are actually stored in the disk. Unlike a typical view, the results are not updated every time the view is called. And unlike a table, you can’t update single records; only refresh the entire view. Think of it as a table which is a snapshot of a query. To learn more about views, check this out.

Creating a view

To create a view, just generate a migration with the view SQL.

There are some issues with this method. One of them is that you will write a large amount of SQL without proper syntax highlighting, which can be prone to errors. Another problem is that if (when) you need to update this view, you’ll need to copy the up method of this version in the down method of the next version, in order to have a reversible migration. To solve this and some other issues, people at Thoughtbot created the scenic gem.

Using Scenic

After the initial setup adding the gem to the Gemfile, you’ll have some generators and new methods available at your migrations. To generate the migration along with the first version of the view, run the following command:

$ rails g scenic:view bikes_reports --materialized
create db/views
create db/views/bikes_reports_v01.sql
create db/migrate/20160801212120_create_bikes_reports.rb

We now have two new files: the migration, which uses scenic’s DSL to load up the SQL; and the .sql, where we will write the view definition. Notice that the sql file has the version in the file name. This is used by scenic to update or revert to some specific version. For more detailed explanation on how this gem works, take a look at the repository.

This is how our setup looks now:

Now that we have a bikes_reports materialized view, it’s just a matter of creating a BikesReport model, and everything should work as if we were using an actual table. This happens because ActiveRecord does does not know it is not a table, since Postgres itself treats matviews and tables the same way.

All looks well, but there’s a few catches in this approach:

  • If we call #save in one of the instances of this model, we’ll have a database level error. By defining readonly? on the model, we still get an error, but this time a more specific error, stating that the model is read only;
  • We need a convenient way to refresh our view (you probably want to do this on a worker or Rake task)

This is how our model will look like:

If you think this is a lot of boilerplate, don’t worry. Scenic provides generators for models with those methods defined.

And finally, let’s use our new approach.

Now we can compare this approach with the previous ones to see how much we improved our performance. Note that the view was refreshed before running the benchmark.

As expected, the materialized view approach is even faster, being 11 times faster than the AR + SQL approach, and 924 times (!!!) faster than the naive approach.

That’s it folks. I hope this exercise has been valuable and has provided you with the means to improve your app performance should you find yourself in a similar situation. The example used is available at https://github.com/jaimerson/rent_a_bike.

References