patternsqlMinor
Optimize Row Level Security expression in Postgres
Viewed 0 times
expressionpostgreslevelsecurityoptimizerow
Problem
I have a table which contains millions of rows, categorized into a few categories (lets say numbers 1-5). An application that accesses the database uses one single database account. However, the application has its own user accounts and each app user is allowed to access only certain categories. Therefore the list of allowed categories is passed to the database by a session variable:
I use RLS to filter the rows according to the session variable:
The problem is that with this approach the RLS row filtering takes some significant amount of time.
On the other hand, when I experimented by doing:
the RLS filtering was almost 10 times faster. Of course, such hardcoding is a no go, since I want to change the list of allowed categories dynamically within the app.
The
So my question is - is there a way to optimize the RLS expression so it gets at least closer to the hardcoded approach? Would you suggest a different solution to the problem? The application has a lot of users, so I don't want to create a database account for every single one.
SET SESSION mydb.allowed_categories = '1,3,5';I use RLS to filter the rows according to the session variable:
CREATE POLICY table_select_policy ON big_table
FOR SELECT
USING (ARRAY[category] && string_to_array(current_setting('mydb.allowed_categories'),',')::int[]));The problem is that with this approach the RLS row filtering takes some significant amount of time.
On the other hand, when I experimented by doing:
CREATE POLICY table_select_policy ON big_table
FOR SELECT
USING (category = 1 OR category = 3 OR category = 5);the RLS filtering was almost 10 times faster. Of course, such hardcoding is a no go, since I want to change the list of allowed categories dynamically within the app.
The
category column has a btree index, however since the number of categories is rather small, the query planner always favors the sequential scan for the RLS filter.So my question is - is there a way to optimize the RLS expression so it gets at least closer to the hardcoded approach? Would you suggest a different solution to the problem? The application has a lot of users, so I don't want to create a database account for every single one.
Solution
At first glance, it might appear that since
However, the real problem has nothing to do with the form of the condition. If you take a look at the PostgreSQL system catalog and find those two functions that were used in the condition, you may notice that both functions have their
In order to illustrate this, consider the
Querying the table with the the query below results in the following plan (look at the number of rows removed by Index Recheck).
To avoid that, you have two options. The first is to increase the
The second option is to wrap the query condition in a "costly" function.
This option will also make the optimizer end up performing Index Scan instead of Bitmap Scan which may turn out to be a bit slower.
current_setting and string_to_array functions are stable and immutable respectively and there is an index on the category column, the following condition could do the trick.CREATE POLICY table_select_policy ON big_table
FOR SELECT
USING (category = ANY(string_to_array(current_setting('mydb.allowed_categories'),',')::int[])));However, the real problem has nothing to do with the form of the condition. If you take a look at the PostgreSQL system catalog and find those two functions that were used in the condition, you may notice that both functions have their
cost parameter set to 1. This makes the optimizer assume that calling those functions to recheck the condition while doing Bitmap Heap Scan is cheap.In order to illustrate this, consider the
big_table table shown below.CREATE TABLE big_table AS
SELECT c AS category
FROM generate_series(1, 1000000) g,
generate_series(1, 10) c;
CREATE INDEX big_table_category_idx ON big_table (category);
ANALYZE big_table;Querying the table with the the query below results in the following plan (look at the number of rows removed by Index Recheck).
EXPLAIN ANALYZE SELECT * FROM big_table WHERE category = ANY(string_to_array(current_setting('mydb.allowed_categories'), ',')::int[]);
"Bitmap Heap Scan on big_table (cost=52478.56..195418.17 rows=3036665 width=4) (actual time=166.613..9273.010 rows=3000000 loops=1)"
" Recheck Cond: (category = ANY ((string_to_array(current_setting('mydb.allowed_categories'::text), ','::text))::integer[]))"
" Rows Removed by Index Recheck: 5209365"
" -> Bitmap Index Scan on big_table_category_idx (cost=0.00..51719.39 rows=3036665 width=0) (actual time=164.782..164.782 rows=3000000 loops=1)"
" Index Cond: (category = ANY ((string_to_array(current_setting('mydb.allowed_categories'::text), ','::text))::integer[]))"
"Total runtime: 9341.568 ms"To avoid that, you have two options. The first is to increase the
work_mem parameter in your server config. This will allow the server to store the complete bitmap in memory and save a great deal of time while rechecking the condition. The plan below was obtained when the work_mem parameter was set to 1000M.EXPLAIN ANALYZE SELECT * FROM big_table WHERE category = ANY(string_to_array(current_setting('mydb.allowed_categories'), ',')::int[]);
"Bitmap Heap Scan on big_table (cost=52478.56..195418.17 rows=3036665 width=4) (actual time=193.613..449.385 rows=3000000 loops=1)"
" Recheck Cond: (category = ANY ((string_to_array(current_setting('mydb.allowed_categories'::text), ','::text))::integer[]))"
" -> Bitmap Index Scan on big_table_category_idx (cost=0.00..51719.39 rows=3036665 width=0) (actual time=184.858..184.858 rows=3000000 loops=1)"
" Index Cond: (category = ANY ((string_to_array(current_setting('mydb.allowed_categories'::text), ','::text))::integer[]))"
"Total runtime: 513.528 ms"The second option is to wrap the query condition in a "costly" function.
CREATE OR REPLACE FUNCTION my_categories()
RETURNS int[] AS $
BEGIN
RETURN string_to_array(current_setting('mydb.allowed_categories'), ',')::int[];
END;
$ LANGUAGE plpgsql STABLE COST 100000;
EXPLAIN ANALYZE SELECT * FROM big_table WHERE category = ANY(my_categories());
"Index Only Scan using big_table_category_idx on big_table (cost=250.44..9363945.19 rows=3036665 width=4) (actual time=0.035..577.094 rows=3000000 loops=1)"
" Index Cond: (category = ANY (my_categories()))"
" Heap Fetches: 3000000"
"Total runtime: 642.522 ms"This option will also make the optimizer end up performing Index Scan instead of Bitmap Scan which may turn out to be a bit slower.
Code Snippets
CREATE POLICY table_select_policy ON big_table
FOR SELECT
USING (category = ANY(string_to_array(current_setting('mydb.allowed_categories'),',')::int[])));CREATE TABLE big_table AS
SELECT c AS category
FROM generate_series(1, 1000000) g,
generate_series(1, 10) c;
CREATE INDEX big_table_category_idx ON big_table (category);
ANALYZE big_table;EXPLAIN ANALYZE SELECT * FROM big_table WHERE category = ANY(string_to_array(current_setting('mydb.allowed_categories'), ',')::int[]);
"Bitmap Heap Scan on big_table (cost=52478.56..195418.17 rows=3036665 width=4) (actual time=166.613..9273.010 rows=3000000 loops=1)"
" Recheck Cond: (category = ANY ((string_to_array(current_setting('mydb.allowed_categories'::text), ','::text))::integer[]))"
" Rows Removed by Index Recheck: 5209365"
" -> Bitmap Index Scan on big_table_category_idx (cost=0.00..51719.39 rows=3036665 width=0) (actual time=164.782..164.782 rows=3000000 loops=1)"
" Index Cond: (category = ANY ((string_to_array(current_setting('mydb.allowed_categories'::text), ','::text))::integer[]))"
"Total runtime: 9341.568 ms"EXPLAIN ANALYZE SELECT * FROM big_table WHERE category = ANY(string_to_array(current_setting('mydb.allowed_categories'), ',')::int[]);
"Bitmap Heap Scan on big_table (cost=52478.56..195418.17 rows=3036665 width=4) (actual time=193.613..449.385 rows=3000000 loops=1)"
" Recheck Cond: (category = ANY ((string_to_array(current_setting('mydb.allowed_categories'::text), ','::text))::integer[]))"
" -> Bitmap Index Scan on big_table_category_idx (cost=0.00..51719.39 rows=3036665 width=0) (actual time=184.858..184.858 rows=3000000 loops=1)"
" Index Cond: (category = ANY ((string_to_array(current_setting('mydb.allowed_categories'::text), ','::text))::integer[]))"
"Total runtime: 513.528 ms"CREATE OR REPLACE FUNCTION my_categories()
RETURNS int[] AS $$
BEGIN
RETURN string_to_array(current_setting('mydb.allowed_categories'), ',')::int[];
END;
$$ LANGUAGE plpgsql STABLE COST 100000;
EXPLAIN ANALYZE SELECT * FROM big_table WHERE category = ANY(my_categories());
"Index Only Scan using big_table_category_idx on big_table (cost=250.44..9363945.19 rows=3036665 width=4) (actual time=0.035..577.094 rows=3000000 loops=1)"
" Index Cond: (category = ANY (my_categories()))"
" Heap Fetches: 3000000"
"Total runtime: 642.522 ms"Context
StackExchange Database Administrators Q#129869, answer score: 7
Revisions (0)
No revisions yet.