The Benefits of Materialized Views (and how to use them in your Ruby on Rails project)
When I first heard about materialized views, my mind was kind of blown. “So, I can write a SQL query and the results can be stored as its own table (and live on disk?)”. If that puppy lives in-memory, it’s GOT TO BE faster then having to dig into the deep, dark depths of the database. Right?
Yeah, it’s true. Grabbing data from disk, just sitting there, ready for you, is pretty fast. But when is it right to use a materialized view?
From my perspective, if you have a fairly large set of data that you often will need access to but doesn’t often change, it may be a good place for a mat view.
Here’s one of the more simple, real-life use cases where I implemented a materialized view to increase performance.
Our app serves up songs, all over the place. The songs we serve up to our users need to meet a certain criteria: they aren’t archived, they have a valid audio file, and a few others, as you will see below. Here’s what we used to do:
scope :ready, where("song_versions.state='active' AND song_versions.transcoding_meta -> 'state'='finished' AND artists.state='active'").includes(song: :artist)
All over the place in our code, you’d see
SongVersion.ready (often with other conditions chained on, etc).
This isn’t horrible, but being it gets called so much, and it is very important, this is a prime candidate for optimization.
So, let me show you how to begin by creating the materialized view. This will take some knowledge of writing good ol’ raw sql. Luckily, this one is a pretty simple
select query. We will create this in a
Right on. We now have a materialized view. This migration actually ran the SQL query, and because of this line,
CREATE MATERIALIZED VIEW, it now is stored on disk. As you see from our
select statement, this will return nothing but a table of
song_version_id ‘s, and, for us, over 50k of them.
Cool, but what if some data changes? What if, say, a song_version’s
state attribute changes and is no longer eligible be a “ready song version”? Oh me oh my I’m so glad you asked. For this, we can create a function. In our case, a PostgreSQL function, also known as “Stored Procedure”. Note, this lives in the same rails migration as the mat view creation.
The key here is the line
REFRESH MATERIALIZED VIEW ready_song_versions;. Whenever that function is called, it will re-run the SQL query that is stored in the mat view, with the new data results, and continue to live on disk as expected.
So now we have a stored procedure, but when to call it? Whenever any of the conditions change as seen in the
where statement in the mat view query.
So we create a
trigger. A trigger is a special type of stored procedure that automatically executes when an event occurs in the database server.
I created triggers for every column mentioned in the
where but I will show just one for brevity. They can all be modeled after this:
So this says, if any song_version has an update, delete, or truncate (which is also essentially a ‘delete’) in the database of its
state then run the mat view refresher we just created.
So this in your application would now trigger a refresh:
song_version = SongVersion.last
The mat view refreshes (and VERY quickly!), with the
idof this song_version no longer present in the table.
Ok, so how to really use this in your Rails project? Let’s give ActiveRecord the knowledge of this new mat view (and its trigger to refresh it).
You create a model that inherits from ActiveRecord. Setting the
table_name is important (I’ll show you in a minute). Set it to
readonly so it can’t be written to from ActiveRecord, and create a class method
refresh so you have the ability to call a refresh from Rails, if needed.
Here’s where we use the
table_name (in models/song_version.rb):
Now when we run
SongVersion.ready we get the same results as before, but much quicker, as we have mostly circumvented the hard work ActiveRecord would have to do to find the data that meets the conditions we have set for “ready” song versions.
Let’s look at the performance gains!
Here’s the output from the terminal:
We’ve gone from 7.8 seconds to 1.7 seconds. That’s a significant gain!
We (1) create a materialized view, (2) create a function to refresh it, (3) create triggers to call the refresher, (4) set up a model so ActiveRecord knows what to do with this new mat view, and (5) declare the relationship and create a scope in the related model (SongVersion).
A few things to note:
- The structure of materialized views cannot be modified once they have already been created. If you, say, have a new condition to add into the
whereclause, you’ll have to create a new migration to first drop the mat view, then recreate with your updated query.
- A gotchya could be how mat views effect your tests. That’s why there is that
refreshmethod we created on the
ReadySongVersionmodel. We explictly call
spec_helperfile, in a
- If you are dropping a mat view to recreate it in another migration, that will drop all its indexes and triggers as well. You’ll need to add back in this triggers (and indexes if you have any) in your new migration file.
- Before using a materialized view, weigh your other options. Ask yourself, “should this be cached? can this just be indexed?”
- Not every database type supports mat views. We use Postgres. Check first to see if it will work with your chosen DB.
- Always test your SQL queries that you wish to use in your fave SQL GUI tool (I use PSequel) before you commit to putting it in your migration. make sure it returns what you expect.
I’m happy to answer any questions about this stuff! HMU in the comments.