HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Postgres Row Level Security policy optimizes poorly compared to inline version

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
policypostgreslevelcomparedversionpoorlysecurityoptimizesinlinerow

Problem

I have a query which looks like this:

SELECT post.id, post.author_id, post.published_at, post.content
    FROM post
   WHERE post.group_id = 1
ORDER BY post.published_at DESC, post.id
   LIMIT 5;


This query has an index on (group_id, published_at DESC, id) which gives it this query plan when no Row Level Security (RLS) policies are used.

Limit  (cost=0.14..1.12 rows=5 width=143)
   ->  Index Scan using post_published_at on post  (cost=0.14..15.86 rows=80 width=143)
         Index Cond: (group_id = 1)


Then I add this policy:

CREATE POLICY select_member_of ON post FOR SELECT USING
  (EXISTS (SELECT 1
             FROM group_member
            WHERE group_member.account_id = current_setting('current_account_id', false)::INT AND
                  group_member.group_id = post.group_id));


There is a compound primary key on group_member.account_id and group_member.group_id on the group_member table.

I expect Postgres to plan this query as an index-only scan of group_member since both group_member.account_id and group_member.group_id will be set to constant values. group_member.group_id should be constant because of the WHERE post.group_id = 1 condition in the SELECT query above.

Indeed it looks like this is happening when I inline my RLS policy into the query like this:

SELECT id, author_id, published_at, content
    FROM post
   WHERE group_id = 1 AND
         (EXISTS (SELECT 1
                    FROM group_member
                   WHERE group_member.account_id = current_setting('current_account_id', false)::INT AND
                         group_member.group_id = post.group_id))
ORDER BY published_at DESC, id
   LIMIT 5;


I get the query plan:

```
Limit (cost=0.30..1.85 rows=5 width=143)
-> Nested Loop Semi Join (cost=0.30..25.04 rows=80 width=143)
-> Index Scan using post_published_at on post (cost=0.14..15.86 rows=80 width=147)
Index Cond: (group_id = 1)
-> Materia

Solution

I've been using RLS in production for a few years and my experience with it is that, despite being very useful, it's a bit of a leaky abstraction.

I believe this question as well as the most voted answer will help you understand why it's hard to make correct assumptions about query plans that involve tables with RLS, especially when JOINs are involved.

In short, RLS poses a problem for the query optimizer because certain optimizations could lead to data leakage, particularly when custom functions are used. So the optimizer takes a more defensive approach which sometimes results in worse query plans and poorer performance.

I haven't found a silver bullet solution for this problem, other than considering what's happening in a case-by-case situation and attempting to circumvent the problem.

I realize this is very generic comment and not the solution you're looking for, but I hope it still helps.

Context

StackExchange Database Administrators Q#232789, answer score: 2

Revisions (0)

No revisions yet.