patternMinor
Deduplicate SELECT statements in relational division
Viewed 0 times
divisiondeduplicatestatementsrelationalselect
Problem
I have a query that does a lot of duplicate work:
As you can see, it performs different filtering on the params table 4 times. I'm using Redshift and although it scans this table very quickly, I have quite a few of these statements
The example uses
SELECT visitor_id, '1'::text AS filter
FROM events
WHERE id IN (SELECT event_id FROM params
WHERE key = 'utm_campaign' AND value = 'campaign_one')
AND id IN (SELECT event_id FROM params
WHERE key = 'utm_source' AND value = 'facebook')
GROUP BY visitor_id
UNION ALL
SELECT visitor_id, '2'::text AS filter
FROM events
WHERE id IN (SELECT event_id FROM params
WHERE key = 'utm_campaign' AND value = 'campaign_two')
AND id IN (SELECT event_id FROM params
WHERE key = 'utm_source' AND value = 'facebook')
GROUP BY visitor_idAs you can see, it performs different filtering on the params table 4 times. I'm using Redshift and although it scans this table very quickly, I have quite a few of these statements
UNIONed together. Is there a way to rewrite the SQL using CASE/IF statements?The example uses
key = 'utm_source' AND value = 'facebook' in both, but this is not necessarily true for all selects.Solution
I see a possible minor simplification with a CTE (available in Redshift) for the repeated subquery with identical predicates:
Plain joins may also be faster than multiple
This multicolumn index should allow index-only scans on
Add another index on just
Arsenal of available query techniques for relational division under this related question on SO:
Avoid multiple SELECTs
As commented by @Andriy, we can squeeze out some more:
In Postgres we could use a shorter, faster
For just two
2nd query without CTE
The CTE is not necessary for the second query. You can simplify to:
About CTEs
Resources for Common Table Expressions (as requested in the comment):
Data-modifying CTEs are particularly useful. Example:
Basics with an added advanced example on this related answer on SO:
WITH p2 AS (
SELECT event_id
FROM params
WHERE key = 'utm_source' AND value = 'facebook'
)
SELECT e.visitor_id, '1'::text AS filter
FROM p2
JOIN params p1 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' AND p1.value = 'campaign_one'
GROUP BY e.visitor_id
UNION ALL
SELECT e.visitor_id, '2'::text AS filter
FROM p2
JOIN params p1 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' AND p1.value = 'campaign_two'
GROUP BY e.visitor_id;Plain joins may also be faster than multiple
IN semi-joins.This multicolumn index should allow index-only scans on
params:CREATE INDEX foo_idx ON params (key, value, event_id)Add another index on just
(event_id) if you don't have one, yet.Arsenal of available query techniques for relational division under this related question on SO:
- How to filter SQL results in a has-many-through relation
Avoid multiple SELECTs
As commented by @Andriy, we can squeeze out some more:
WITH p2 AS ( -- repeated, immutable filter
SELECT event_id
FROM params
WHERE key = 'utm_source' AND value = 'facebook'
)
, p3 (value, filter) AS ( -- values for variable filter
SELECT text 'campaign_one', text '1'
UNION ALL SELECT 'campaign_two', '2'
)
SELECT e.visitor_id, p3.filter
FROM p3
JOIN params p1 USING (value)
JOIN p2 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' -- repeated for p1
GROUP BY 1, 2;In Postgres we could use a shorter, faster
VALUES expression, but that feature is not currently supported in Redshift:...
, p3 (value, filter) AS (
VALUES
(text 'campaign_one', text '1')
, ( 'campaign_two', '2')
)
...For just two
SELECT in the UNION, this won't buy much. But it should be a substantial improvement for more - like you mentioned.2nd query without CTE
The CTE is not necessary for the second query. You can simplify to:
SELECT e.visitor_id, p3.filter
FROM (
SELECT text 'campaign_one' AS value, text '1' AS filter
UNION ALL SELECT 'campaign_two', '2'
) p3 -- values for variable filter
JOIN params p1 USING (value)
JOIN params p2 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' -- repeated, immutable filters
AND p2.key = 'utm_source'
AND p2.value = 'facebook'
GROUP BY 1, 2;About CTEs
Resources for Common Table Expressions (as requested in the comment):
- The manual.
- Tag info for CTE.
Data-modifying CTEs are particularly useful. Example:
- How do I insert a row which contains a foreign key?
Basics with an added advanced example on this related answer on SO:
- Are SELECT type queries the only type that can be nested?
Code Snippets
WITH p2 AS (
SELECT event_id
FROM params
WHERE key = 'utm_source' AND value = 'facebook'
)
SELECT e.visitor_id, '1'::text AS filter
FROM p2
JOIN params p1 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' AND p1.value = 'campaign_one'
GROUP BY e.visitor_id
UNION ALL
SELECT e.visitor_id, '2'::text AS filter
FROM p2
JOIN params p1 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' AND p1.value = 'campaign_two'
GROUP BY e.visitor_id;CREATE INDEX foo_idx ON params (key, value, event_id)WITH p2 AS ( -- repeated, immutable filter
SELECT event_id
FROM params
WHERE key = 'utm_source' AND value = 'facebook'
)
, p3 (value, filter) AS ( -- values for variable filter
SELECT text 'campaign_one', text '1'
UNION ALL SELECT 'campaign_two', '2'
)
SELECT e.visitor_id, p3.filter
FROM p3
JOIN params p1 USING (value)
JOIN p2 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' -- repeated for p1
GROUP BY 1, 2;...
, p3 (value, filter) AS (
VALUES
(text 'campaign_one', text '1')
, ( 'campaign_two', '2')
)
...SELECT e.visitor_id, p3.filter
FROM (
SELECT text 'campaign_one' AS value, text '1' AS filter
UNION ALL SELECT 'campaign_two', '2'
) p3 -- values for variable filter
JOIN params p1 USING (value)
JOIN params p2 USING (event_id)
JOIN events e ON e.id = p2.event_id
WHERE p1.key = 'utm_campaign' -- repeated, immutable filters
AND p2.key = 'utm_source'
AND p2.value = 'facebook'
GROUP BY 1, 2;Context
StackExchange Database Administrators Q#111345, answer score: 3
Revisions (0)
No revisions yet.