patternsqlMinor
Is there a faster way to get UNION ALL behavior in Postgres?
Viewed 0 times
postgresallunionbehaviorwayfastergetthere
Problem
I have a table (t0), in my Postgres DB, with data that looks something like this:
And I have a query to return my desired results of:
My query looks something like this:
Is there a faster way to do an operation like this? It's not too bad, but I'm doing it in several places (with joins) and all these repeated queries seems to slow stuff down a bit. Seems like there must be a better way.
Here's the query in fiddle form: http://sqlfiddle.com/#!15/d41d8/3603
t1_id t2_id
1 1
2 1
2 1
4 null
4 null
5 nullAnd I have a query to return my desired results of:
t1_id t2_id
1 1
4 null
5 nullMy query looks something like this:
(
SELECT DISTINCT ON (t2_id) t1_id, t2_id
FROM t0
WHERE t2_id IS NOT NULL
)
UNION ALL
(
SELECT DISTINCT ON (t1_id) t1_id, t2_id
FROM t0
WHERE t2_id IS NULL
)Is there a faster way to do an operation like this? It's not too bad, but I'm doing it in several places (with joins) and all these repeated queries seems to slow stuff down a bit. Seems like there must be a better way.
Here's the query in fiddle form: http://sqlfiddle.com/#!15/d41d8/3603
Solution
For the simple case, I can only think of minor improvements to the query:
-
As ypercube mentioned You need to add
-
You can use the constant
-
The key to performance is indexing. Try two partial indexes matching the two parts of the query:
You may or may not want to include additional columns to allow index-only scans.
Depending on table size and data distribution, there may be faster alternatives:
Single
If you want to condense it into a single
Equivalent, except for sort order. If you want this to be fast, try a functional index:
(
SELECT DISTINCT ON (t2_id)
t1_id, t2_id
FROM t0
WHERE t2_id IS NOT NULL
ORDER BY t2_id, t1_id -- to get consistent results
)
UNION ALL
(
SELECT DISTINCT ON (t1_id)
t1_id, NULL -- cheaper
FROM t0
WHERE t2_id IS NULL
-- if you retrieve more columns, add ORDER BY, too
)-
As ypercube mentioned You need to add
ORDER BY with an unambiguous list of expressions to get deterministic results.-
You can use the constant
NULL instead of t2_id in the second leg of the query. Also relevant for below index support.-
The key to performance is indexing. Try two partial indexes matching the two parts of the query:
CREATE INDEX t0_part1_idx ON t0 (t2_id, t1_id) WHERE t2_id IS NOT NULL;
CREATE INDEX t0_part2_idx ON t0 (t1_id) WHERE t2_id IS NULL;You may or may not want to include additional columns to allow index-only scans.
Depending on table size and data distribution, there may be faster alternatives:
- Select first row in each GROUP BY group? (last chapter)
- Optimize GROUP BY query to retrieve latest record per user
Single
SELECTIf you want to condense it into a single
SELECT:SELECT DISTINCT ON (coalesce(t2_id, t1_id), t2_id)
t1_id, t2_id
FROM t0
ORDER BY coalesce(t2_id, t1_id), t2_id, t1_id;Equivalent, except for sort order. If you want this to be fast, try a functional index:
CREATE INDEX t0_func_idx ON t0 (coalesce(t2_id, t1_id), t2_id, t1_id);Code Snippets
(
SELECT DISTINCT ON (t2_id)
t1_id, t2_id
FROM t0
WHERE t2_id IS NOT NULL
ORDER BY t2_id, t1_id -- to get consistent results
)
UNION ALL
(
SELECT DISTINCT ON (t1_id)
t1_id, NULL -- cheaper
FROM t0
WHERE t2_id IS NULL
-- if you retrieve more columns, add ORDER BY, too
)CREATE INDEX t0_part1_idx ON t0 (t2_id, t1_id) WHERE t2_id IS NOT NULL;
CREATE INDEX t0_part2_idx ON t0 (t1_id) WHERE t2_id IS NULL;SELECT DISTINCT ON (coalesce(t2_id, t1_id), t2_id)
t1_id, t2_id
FROM t0
ORDER BY coalesce(t2_id, t1_id), t2_id, t1_id;CREATE INDEX t0_func_idx ON t0 (coalesce(t2_id, t1_id), t2_id, t1_id);Context
StackExchange Database Administrators Q#79980, answer score: 5
Revisions (0)
No revisions yet.