Postgresql: Caching with pg_cron and Materialized Views

We use Materialized views to cache the results of SQL queries. However, to refresh a materialized view, we need to run REFRESH MATERIALIZED VIEW {view_name}. Leveraging pg_cron, we can automate the refreshing of materialized views without the need for 3rd party tools.

All source code for this post is available in the Postgresql Cron Example repository.

Use Case and Requirements

We want to display the top 100 active users over the past 50 days who had the most opinions on content. The information can be out of date by up to up to 5 minutes.

Solution 1

We create a view as follows:

-- Find the top 100 most active users in the last 50 days.CREATE OR REPLACE VIEW example.opinion_activity_view AS
SELECT
uo.user_id,
uo.opinion_type_id,
st.opinion_type_display,
COUNT(uo.opinion_type_id) AS opinion_count
FROM example.user_opinion AS uo
INNER JOIN example.opinion_type AS st
ON st.opinion_type_id = uo.opinion_type_id
-- filter for only those users who had opinions in the last 50 days
WHERE uo.added_on >= NOW() - INTERVAL '50 days'
GROUP BY uo.user_id, uo.opinion_type_id, st.opinion_type_display
-- ordering by the opinion_count will give us the most active users
ORDER BY opinion_count DESC
-- Limit to the top 100 users
LIMIT 100;
-- Takes about 75ms to run with our test data
SELECT
oav.user_id,
oav.opinion_type_id,
oav.opinion_type_display,
oav.opinion_count
FROM example.opinion_activity_view AS oav;

75ms is not slow, but this solution is not future-proof. The execution time will increase because this query is affected by 1) the number of rows in our table and 2) how often users give an opinion on content.

Solution 2

We can place the query in a MATERIALIZED VIEW as follows:

-- Caching the results of our initial query.-- Takes about 75ms to create using our test data
CREATE MATERIALIZED VIEW IF NOT EXISTS example.opinion_activity AS
SELECT
oav.user_id,
oav.opinion_type_id,
oav.opinion_type_display,
oav.opinion_count
FROM example.opinion_activity_view AS oav;
-- Takes as low as 2ms to execute
SELECT
oa.user_id,
oa.opinion_type_id,
oa.opinion_type_display,
oa.opinion_count
FROM example.opinion_activity AS oa;

The initial creation of the MATERIALIZED VIEW takes about the same amount of time as it took to run the query. However, query time significantly reduces when we use the cached results by calling the materialized view 2ms. This solution is future-proof: our query time remains consistent no matter how many rows end up in our example.user_opinion table.

Refreshing a MATERIALIZED VIEW

Here is where things get a bit tricky. Any data changed in our table that is also in the MATERIALIZED VIEW requires a refresh. Refreshing is done by running REFRESH MATERIALIZED VIEW {view_name}. We can automate the refresh by using the pg_cron extension.

-- Schedule refreshing the opinion_activity materialized
-- view every 5 minutes
SELECT cron.schedule(
'opinion_activity',
'*/5 * * * *',
$CRON$ REFRESH MATERIALIZED VIEW example.opinion_activity; $CRON$
);

Note that we can easily unscheduled the pg_cron job:

-- Unschedule a pg_cron job
SELECT cron.unschedule('opinion_activity');

Logging

We can find out the status of our pg_cron jobs using the following queries:

-- See what jobs have been scheduled
SELECT *
FROM cron.job;
-- See details on execution of cron jobs
SELECT *
FROM cron.job_run_details;

and pg_cron also outputs the status of a job to the console:

# Example output of a successful pg_cron task
cron-example-db | 2063-04-05 19:28:00.005 UTC [71] LOG: cron job 1 starting:
REFRESH MATERIALIZED VIEW example.opinion_activity;
cron-example-db | 2063-04-05 19:28:00.138 UTC [71] LOG:
cron job 1 COMMAND completed: REFRESH MATERIALIZED VIEW
# Example output of an unsuccessful pg_cron task
cron-example-db | 2063-04-05 19:20:00.001 UTC [71]
LOG: cron job 1 starting: REFRESH MATERIALIZED VIEW example2.opinion_activity;
cron-example-db | 2063-04-05 19:20:00.009 UTC [9722]
ERROR: schema "example2" does not exist
cron-example-db | 2063-04-05 19:20:00.009 UTC [9722]
STATEMENT: REFRESH MATERIALIZED VIEW example2.opinion_activity;

Setting Up pg_cron

Note that setting up pg_cron takes some fiddling. Please check out the Postgresql Cron Example, which describes setting up pg_cron in a docker image and AWS RDS.

If we try using pg_cron without setting it up, we may see some of the following errors:

-- Have not installed extension
CREATE EXTENSION IF NOT EXISTS pg_cron;
-- ERROR: could not open extension control file "/usr/share/postgresql/14/extension/pg_cron.control": No such file or directory
-- Installed extension but have not setup a database name
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- ERROR: unrecognized configuration parameter "cron.database_name"
-- CONTEXT: PL/pgSQL function inline_code_block line 3 at IF
-- Incorrect database name
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- ERROR: can only create extension in database example2
-- DETAIL: Jobs must be scheduled from the database configured in cron.database_name, since the pg_cron background worker reads job descriptions from this database.
-- HINT: Add cron.database_name = 'example' in postgresql.conf to use the current database.
-- CONTEXT: PL/pgSQL function inline_code_block line 4 at RAISE
-- Everything setup correctly
CREATE EXTENSION IF NOT EXISTS pg_cron;
-- CREATE EXTENSION

Conclusion

Using pg_cron and Postgresql's MATERIALIZED VIEW, we can easily cache SQL queries without using any 3rd party utilities. Execution times of queries, even ones that are considered performant, are just a few milliseconds.

Not covered much in this post is when to apply this pattern: something we will touch on in another post.

[1] Photo by Scott Rodgerson on Unsplash.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Eric Hosick

Eric Hosick

74 Followers

Creator, entrepreneur, software architect, software engineer, lecturer, and technologist.