patternsqlMinor
SELECT rows based on indefinite number of filters
Viewed 0 times
rowsnumberfiltersbasedindefiniteselect
Problem
How do I create a function which takes indefinite parameters? And then finds all
Example
Table:
Sample data:
Example request: Let's say I want all
A hardcoded solution for that exact scenario only might look like this:
Except I want an indefinite number of tag matches. Can I create a function that takes an arbitrary number of tag names/values and returns a set of
game_ids in a table where each parameter matches a different row (but same game_id)?Example
Table:
create table tags (
tag_id serial primary key,
game_id int, -- references games(game_id),
tag_name text,
tag_value text
)Sample data:
tag_id | game_id | tag_name | tag_value
--------+---------+-----------+----------------------
55 | 6 | Event | EUR-ASIA Rapid Match
58 | 6 | Round | 5
400 | 38 | Event | EUR-ASIA Rapid Match
403 | 38 | Round | 4Example request: Let's say I want all
game_ids whereEvent (a tag_name) = 'EUR-ASIA Rapid Match' (a tag_value)
AND
Round (a tag_name) = '5' (a tag_value)A hardcoded solution for that exact scenario only might look like this:
with m1 as (
select game_id from tags
where tag_name = 'Event'
and tag_value = 'EUR-ASIA Rapid Match'
), m2 as (
select game_id from tags
where tag_name = 'Round'
and tag_value = '5'
) select * from m1 intersect select * from m2;Except I want an indefinite number of tag matches. Can I create a function that takes an arbitrary number of tag names/values and returns a set of
game_id matching all? The call might look this (pseudo-code):select * from get_games_by_tags('{Event,EUR-ASIA Rapid Match}', ...)Solution
This is a special case of relational-division. Here is an arsenal of query techniques:
The special difficulty of your case is to filter on the combination of two attributes, but the principle is the same.
You can make this fully dynamic with plain SQL, without string concatenation and dynamic SQL:
But performance won't come close to the following solution with dynamic SQL.
For best performance, have this (
Maybe your
The basic query technique I chose uses the pattern:
This query is already optimized for performance.
Function
Using a user-defined row type as input like you have in your answer (optional, but convenient for the function design). I chose the name
Note the subtle differences in syntax for these two row values:
The first one is a string literal for the registered row type
Either works for our purpose and gets index support. Just don't confuse the different syntax requirements. Related:
The shortcuts for just 1 or 2 parameters are optional but should further improve performance.
db<>fiddle here
Should be faster by orders of magnitude than what you have in your answer.
Call:
You can also pass an actual array to a
- How to filter SQL results in a has-many-through relation
The special difficulty of your case is to filter on the combination of two attributes, but the principle is the same.
You can make this fully dynamic with plain SQL, without string concatenation and dynamic SQL:
- Using same column multiple times in WHERE clause
But performance won't come close to the following solution with dynamic SQL.
For best performance, have this (
UNIQUE) multicolumn index:CREATE [UNIQUE] INDEX ON tags (tag_name, tag_value, game_id);Maybe your
PRIMARY KEY on tags already spans these columns. For best performance you need index columns in the demonstrated order. Create an additional index if the PK does not match or change the column order of the PK unless you need columns in a different order (too). Related:- Is a composite index also good for queries on the first field?
- PostgreSQL composite primary key
The basic query technique I chose uses the pattern:
SELECT game_id
FROM tags t
WHERE (tag_name, tag_value) = ('Event', 'EUR-ASIA Rapid Match')
AND EXISTS (SELECT FROM tags WHERE game_id = t.game_id AND (tag_name, tag_value) = ('Round', '5'))
AND EXISTS (SELECT FROM tags WHERE game_id = t.game_id AND (tag_name, tag_value) = ('some_tag', 'some value'))
AND ...This query is already optimized for performance.
Function
Using a user-defined row type as input like you have in your answer (optional, but convenient for the function design). I chose the name
game_tag because tag felt too generic:CREATE TYPE game_tag AS(
tag_name text
, tag_value text
);Note the subtle differences in syntax for these two row values:
'(Event,"EUR-ASIA Rapid Match")'::game_tag
('Event', 'EUR-ASIA Rapid Match')The first one is a string literal for the registered row type
game_tag, the second is a ROW constructor on two string literals building an anonymous row, short for:ROW('Event', 'EUR-ASIA Rapid Match')Either works for our purpose and gets index support. Just don't confuse the different syntax requirements. Related:
- Invalid input syntax for type numeric: “(0.0000000000000000,8)”
The shortcuts for just 1 or 2 parameters are optional but should further improve performance.
CREATE OR REPLACE FUNCTION f_games_by_tags(VARIADIC _filters game_tag[])
RETURNS table (game_id int) AS
$func$
BEGIN
CASE cardinality(_filters)
-- WHEN 0 THEN -- impossible
WHEN 1 THEN
RETURN QUERY
SELECT t.game_id
FROM tags t
WHERE (tag_name, tag_value) = _filters[1];
WHEN 2 THEN
RETURN QUERY
SELECT t.game_id
FROM tags t
WHERE (tag_name, tag_value) = _filters[1]
AND EXISTS (
SELECT FROM tags t1
WHERE t1.game_id = t.game_id
AND (tag_name, tag_value) = _filters[2]
);
ELSE
RETURN QUERY EXECUTE
(SELECT 'SELECT game_id FROM tags t WHERE (tag_name, tag_value) = $1[1] AND '
|| string_agg('EXISTS (SELECT FROM tags WHERE game_id = t.game_id AND (tag_name, tag_value) = $1[' || g || '])', ' AND ')
FROM generate_series (2, cardinality(_filters)) g)
USING _filters;
END CASE;
END
$func$ LANGUAGE plpgsql;db<>fiddle here
Should be faster by orders of magnitude than what you have in your answer.
Call:
SELECT * FROM f_games_by_tags('(Event,"EUR-ASIA Rapid Match")');
SELECT * FROM f_games_by_tags('(Round,5)', '(Event,"EUR-ASIA Rapid Match")', '(some_tag,"some value")');You can also pass an actual array to a
VARIADIC function. Related:- How to use an array as argument to a VARIADIC function in PostgreSQL?
- Use array of composite type as function parameter and access it
Code Snippets
CREATE [UNIQUE] INDEX ON tags (tag_name, tag_value, game_id);SELECT game_id
FROM tags t
WHERE (tag_name, tag_value) = ('Event', 'EUR-ASIA Rapid Match')
AND EXISTS (SELECT FROM tags WHERE game_id = t.game_id AND (tag_name, tag_value) = ('Round', '5'))
AND EXISTS (SELECT FROM tags WHERE game_id = t.game_id AND (tag_name, tag_value) = ('some_tag', 'some value'))
AND ...CREATE TYPE game_tag AS(
tag_name text
, tag_value text
);'(Event,"EUR-ASIA Rapid Match")'::game_tag
('Event', 'EUR-ASIA Rapid Match')ROW('Event', 'EUR-ASIA Rapid Match')Context
StackExchange Database Administrators Q#205945, answer score: 4
Revisions (0)
No revisions yet.