snippetsqlMinor
How to Model / Quickly test for set membership in SQL?
Viewed 0 times
quicklymembershipsqlfortesthowmodelset
Problem
Each row in a table needs to be protected a list of application specific roles lets say something that looks like this example below
In this example the row with primary key 1 should only be visible to any user who has roles 1 or 2 or 5, row 2 should be visible to any user who has role 1 or 4 or 3 ... etc.
When a user logs into the application the application computes all roles that the user has for example I know that User 1 has has roles {1,11,7} and User 2 has roles {1,3,5}
My current implementation uses two tables like so.
A join is required to make a query on data that only returns the data for a specific user.
Is there a way to represent the roles in some way that I can store the list of roles that protects a row in a single colum such that I can do a select statement against that table that returns only the rows that the user can see. I want this to be as fast as possible.
I have already tried using bit strings but backed off because a bit string operation in the left hand side of the where clause is going to always cause a full table scan because an expression such as
Is there way to do what i want above with out using two tables that need a join or a bit operations that require a full table scan. I have control over the schema and the representation of the roles, I am open to clever encoding schemes if those help. I am using Postgres 9.2 and if wri
Desired Table Structure
pkey | data | roles
---------------------
1 | ... | {1,2,5}
2 | ... | {1,4,3}
3 | ... | {7,11,67,101}In this example the row with primary key 1 should only be visible to any user who has roles 1 or 2 or 5, row 2 should be visible to any user who has role 1 or 4 or 3 ... etc.
When a user logs into the application the application computes all roles that the user has for example I know that User 1 has has roles {1,11,7} and User 2 has roles {1,3,5}
My current implementation uses two tables like so.
Actual Table Structure
foo table
pkey | data
------------
1 | ...
2 | ...
3 | ...
roles table
foo_fk | role_id
------------------
1 | 1
1 | 2
1 | 5
2 | 1
2 | 4
2 | 3
3 | 7
3 | 11
3 | 67
3 | 101A join is required to make a query on data that only returns the data for a specific user.
Is there a way to represent the roles in some way that I can store the list of roles that protects a row in a single colum such that I can do a select statement against that table that returns only the rows that the user can see. I want this to be as fast as possible.
I have already tried using bit strings but backed off because a bit string operation in the left hand side of the where clause is going to always cause a full table scan because an expression such as
(column & seach_bit_mask) == some_value needs to evaluate the bit operation on every row of the table to return the result, thus requiring a full table scan. Is there way to do what i want above with out using two tables that need a join or a bit operations that require a full table scan. I have control over the schema and the representation of the roles, I am open to clever encoding schemes if those help. I am using Postgres 9.2 and if wri
Solution
The first thing I will say here is that you really need to encapsulate this properly to make it work. Secondly this is a good case where breaking 1NF is likely to be a good thing performance-wise.
I would break this into three pieces, totally mangling Codd's rules in the process, but I think those can be broken in certain cases and here breaking one piece demands breaking more pieces in order to correct the damage.
The first piece is the administration portion. In this portion, permissions are stored in the form you are currently using. This is used for updating and managing the permissions in a relational way. These would then be compiled for a specific row using a stored procedure.
The second piece is the storage piece. Here the rows are stored with the roles in an array as you suggest in your first answer. The arrays here are indexed using GIN which makes lookups a lot faster.
Third, you need encapsulation logic, such as views to only retrieve the rows a user has access to, or a stored procedure interface which checks the roles adding
To make this work you will need to compile the permissions when they are changed. However this structure allows you to say "all rows that Bob has access to can now be accessed by Joe."
Then you'd update the permissions by altering the data in foo_roles and then calling
Now one thing about this structure is that it makes lookups fast, but updates to the permissions pretty slow.
Update: A note about GIN indexes (per request).
Traditional database indexes are b-trees, which means you can search by comparing an expected value to branches of the index, and then traversing to leaves. This works well if searching either on the whole value (WHERE foo = 'my_fun_value') but it doesn't work well with more complex searches. GIN indexes (Generalized Index Method) allow for indexes to be used in a wider variety of searches, but they are slightly slower than b-tree indexes. GIN indexes can be used to ask questions like:
-
Is value x in array y? (our use case here)
-
Does circle A overlap with circle B?
-
Does line segment C cross line segment D?
-
Does appointment A overlap time-wise with appointment B?
In newer versions of PostgreSQL you can use these indexes to enforce constraints like "No appointments in any given room may overlap time-wise" but the main use case here is that it makes searching arrays to be fast.
In general a GIN index will give you much better performance than a b-tree or GIST index on a join. In this case your query against
I would break this into three pieces, totally mangling Codd's rules in the process, but I think those can be broken in certain cases and here breaking one piece demands breaking more pieces in order to correct the damage.
The first piece is the administration portion. In this portion, permissions are stored in the form you are currently using. This is used for updating and managing the permissions in a relational way. These would then be compiled for a specific row using a stored procedure.
The second piece is the storage piece. Here the rows are stored with the roles in an array as you suggest in your first answer. The arrays here are indexed using GIN which makes lookups a lot faster.
Third, you need encapsulation logic, such as views to only retrieve the rows a user has access to, or a stored procedure interface which checks the roles adding
WHERE my_role_id = ANY(roles) to the end of the query.To make this work you will need to compile the permissions when they are changed. However this structure allows you to say "all rows that Bob has access to can now be accessed by Joe."
CREATE TABLE foo (
id serial not null unique,
....
roles int[]
);
CREATE TABLE foo_roles (
foo_id int not null references foo(id),
role_id int not null references roles(id)
);
CREATE OR REPLACE FUNCTION rebuild_foo_roles(in_foo_id int) RETURNS int[]
LANGUAGE SQL AS $
UPDATE foo SET roles = (select array_agg(role_id) from foo_roles
where foo_id = $1)
WHERE role_id = $1
RETURNING roles;
$;Then you'd update the permissions by altering the data in foo_roles and then calling
rebuild_foo_roles with the ids of the rows you want to modify.Now one thing about this structure is that it makes lookups fast, but updates to the permissions pretty slow.
Update: A note about GIN indexes (per request).
Traditional database indexes are b-trees, which means you can search by comparing an expected value to branches of the index, and then traversing to leaves. This works well if searching either on the whole value (WHERE foo = 'my_fun_value') but it doesn't work well with more complex searches. GIN indexes (Generalized Index Method) allow for indexes to be used in a wider variety of searches, but they are slightly slower than b-tree indexes. GIN indexes can be used to ask questions like:
-
Is value x in array y? (our use case here)
-
Does circle A overlap with circle B?
-
Does line segment C cross line segment D?
-
Does appointment A overlap time-wise with appointment B?
In newer versions of PostgreSQL you can use these indexes to enforce constraints like "No appointments in any given room may overlap time-wise" but the main use case here is that it makes searching arrays to be fast.
In general a GIN index will give you much better performance than a b-tree or GIST index on a join. In this case your query against
foo can be an index scan against the relation with no joins needed.Code Snippets
CREATE TABLE foo (
id serial not null unique,
....
roles int[]
);
CREATE TABLE foo_roles (
foo_id int not null references foo(id),
role_id int not null references roles(id)
);
CREATE OR REPLACE FUNCTION rebuild_foo_roles(in_foo_id int) RETURNS int[]
LANGUAGE SQL AS $$
UPDATE foo SET roles = (select array_agg(role_id) from foo_roles
where foo_id = $1)
WHERE role_id = $1
RETURNING roles;
$$;Context
StackExchange Database Administrators Q#37109, answer score: 5
Revisions (0)
No revisions yet.