Using PostgreSQL row level security (RLS) to authorize READ queries for your application’s users.
Row Level Security was introduced in v9.5 of PostgreSQL. It filters query results based on user defined row policies. You can read more here . Some key points are that RLS is set up per table using 2 sets of rules — one for CREATE
/ UPDATE
/ DELETE
and another ruleset for READ
.
Cool. But how does RLS impact performance, if at all?
The scenario
This will be a comparison using RLS to filter the rows a user can read vs. plain ol’ Table JOIN
s with a simple privilege table. I hope to answer the question of whether RLS is suitable as the primary privilege system in an app with thousands of users.
Here are the tables we’ll be working with:
page_photo
has ~ 100,000 rows. (~100 photos * 1,000 pages)
page
has 1,000 rows. (~ 50% public)
page_user
has 10,000 rows where each user gets 10 pages. (10 pages * 1,000 users)
RLS does not apply to a sudo user by default so we need to create a ROLE
with some basic privileges:
Case 1: page WHERE public = TRUE
Without RLS:
With RLS:
RLS loses this round.
When relying on RLS row filtering the query is 1.7x slower and when including WHERE public = true
with and without RLS appear equivalent (~8.5ms).
CASE 2: page WHERE public = TRUE
with partial INDEX
Let’s redo, but with a partial INDEX on public = true
which is the USING
condition for the RLS policy.
Without RLS:
With RLS:
Using a partial INDEX on public IS true
speeds the RLS query up to match its non-RLS equivalent. Nice!
Case 3: JOIN vs RLS Privileges
Let’s do something more interesting and grab all the page_photo
s that a user has privileges to see.
i.e. we’ll query for all the page_photo
s that belong to a page
where the user has a corresponding entry in the page_user
table.
Without RLS:
Not great, but let’s try that with an index:
10x faster. cool! I wonder if we can do better…
22x faster than original! Now we’re cooking with gas.
With RLS:
Now this is where RLS gets interesting because our READ
rule needs to “simulate” the same access privileges that the table JOIN
did above ^.
There are a couple of ways I can think to implement RLS in this situation:
A) create a function with this API: can_access_page(page_id, user_id)
Applying this function to the USING
expression for an RLS policy on page_photo
table means you’re passing in page_photo.page_id
and probably some LOCAL
/ SESSION
variable, perhaps from a SET app.current_user = 1
.
Bear in mind that RLS filters ALLL Teh Thingz (rows) — so if USING
or WITH CHECK
expressions execute a function that triggers a db lookup, then lots of queries may be firing solely to dwindle the universe of rows down before even running the main query (please correct me if this is inaccurate).
or…
B) query once to get all page_ids
that the user has access to:
SELECT page_id from page_user WHERE user_id = 1;
and then, armed with this information, create a Function with an API like this: bigint_contains(page_ids, page_id)
which, in pseudo code may be page_ids.contains(page_photo.page_id)
Method A seems easier (but less performant). Let’s try that for now. The function:
And the RLS policy:
Keep in mind that if app.user_id
does not exist, the function will heroically fail and throw an error.
And here’s how one might implement those pieces using a transaction:
Wow. Not impressive you might say, but let’s consider it like this:
It takes about 0.7ms to execute can_access_page
and there are 100,000 page_photo
rows. So if we were in 3rd grade we might conclude that filtering should take 100,000*0.7ms = 70,000ms or 70 seconds. But it does not take 70 seconds — it only takes 1.6 seconds. So we’re clearly experiencing optimizations here. Even though the total elapsed time is unimpressive I would not be quick to call this an RLS fail though; more likely it’s just not an ideal use-case for RLS — especially considering the non-RLS alternative is significantly faster at < 2ms.
Let’s try method B which does not involve doing a db lookup while enforcing the RLS Policy.
And to get an Array of page_ids the user can access:
There is one problem with this approach… I was unsuccessful in applying the results of this query to app.user_page_ids
.
Apparently SET app.user_page_ids = ”some query”
does not work. I even tried stuffing the query into a function and setting it that way — no luck. If you have the solution speak up! For now we’ll just set it manually:
And the RLS policy:
All together:
That’s 2x faster than method A but still a longshot from the non-RLS alternative.
Conclusion
update (June 2019): Take this conclusion 👇 with a grain of salt and check out Ethan Resnick’s comment for ideas on getting RLS performing as good as the non-RLS equivalents.
— -
I would not use RLS as the main privilege system for db lookups when the number of rows that are likely to be filtered out, for any given query, is a substantial amount of the total rows in that table.
This opinion does not extend to using a RLS Policy’s WITH CHECK
expression, which only operates on the row being inserted/deleted/updated. In which case I expect much better performance since we’re dealing with 1 row at a time. I’ll have to dig into that use-case specifically though.
I hope it does not turn you off to RLS and that you explore its implementation for yourself — and to share what you’ve learned. There isn’t a large amount of information (yet) when it comes to RLS best practices or good use-cases — but please leave links in the comments if you come across a good resource!
For those interested. I’ve included the EXPLAIN ANALYZE
for some of the queries: