Why is this SQL query with UNION significantly faster than the same query without?
Problem
I'm trying to optimize a query that never completes on Postgres 12.7. It runs for hours, even days, drives the CPU to 100%, and never returns:
```
SELECT "id", "counter", "item_id", "item_name", "type", "updated_time"
FROM "changes"
WHERE (type = 1 OR type = 3) AND user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'
   OR type = 2 AND item_id IN (SELECT item_id FROM user_items WHERE user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW')
ORDER BY "counter" ASC LIMIT 100;
```
I randomly tried to rewrite it using UNION instead, and I believe it's equivalent. Basically there are two parts in the query: one for type = 1 or 3, and one for type = 2.
```
(
SELECT "id", "counter", "item_id", "item_name", "type", "updated_time"
FROM "changes"
WHERE (type = 1 OR type = 3) AND user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'
) UNION (
SELECT "id", "counter", "item_id", "item_name", "type", "updated_time"
FROM "changes"
WHERE type = 2 AND item_id IN (SELECT item_id FROM user_items WHERE user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW')
) ORDER BY "counter" ASC LIMIT 100;
```
This query returns within 10 seconds, as opposed to never returning after several days for the other one. Any idea what's causing this huge difference?
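The two forms are indeed equivalent here: the type predicates make the two branches disjoint, and the base query cannot produce duplicate rows, so UNION's deduplication changes nothing. A small sanity check with Python's sqlite3 (toy schema and data, not the real tables or Postgres):

```python
import sqlite3

# Toy schema and made-up data to check that the OR form and the
# UNION rewrite return the same rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE changes (id TEXT, counter INTEGER, item_id TEXT, type INTEGER, user_id TEXT);
CREATE TABLE user_items (user_id TEXT, item_id TEXT);
INSERT INTO changes VALUES
  ('a', 1, 'i1', 1, 'u1'),   -- matches: type 1, right user
  ('b', 2, 'i1', 2, 'zz'),   -- matches: type 2, item owned by u1
  ('c', 3, 'i2', 2, 'zz'),   -- no match: item not owned by u1
  ('d', 4, 'i1', 3, 'u1'),   -- matches: type 3, right user
  ('e', 5, 'i3', 1, 'u2');   -- no match: wrong user
INSERT INTO user_items VALUES ('u1', 'i1');
""")

or_q = """
SELECT id, counter FROM changes
WHERE (type = 1 OR type = 3) AND user_id = 'u1'
   OR type = 2 AND item_id IN (SELECT item_id FROM user_items WHERE user_id = 'u1')
ORDER BY counter LIMIT 100
"""
union_q = """
SELECT id, counter FROM changes
WHERE (type = 1 OR type = 3) AND user_id = 'u1'
UNION
SELECT id, counter FROM changes
WHERE type = 2 AND item_id IN (SELECT item_id FROM user_items WHERE user_id = 'u1')
ORDER BY counter LIMIT 100
"""
print(con.execute(or_q).fetchall() == con.execute(union_q).fetchall())  # → True
```

This only checks result equivalence on a toy scale, of course; it says nothing about which plan the Postgres optimizer picks.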
Query plans
For the original query:
```
                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1001.01..1697110.80 rows=100 width=119)
   ->  Gather Merge  (cost=1001.01..8625312957.40 rows=508535 width=119)
         Workers Planned: 2
         ->  Parallel Index Scan using changes_pkey on changes  (cost=0.98..8625253259.82 rows=211890 width=119)
               Filter: ((((type = 1) OR (type = 3)) AND ((user_id)::text = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'::text)) OR ((type = 2) AND (SubPlan 1)))
               SubPlan 1
                 ->  Materialize  (cost=0.55..18641.22 rows=143863 width=33)
                       ->  Index Only Scan using user_ite
```
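A side note on reading those numbers: the Limit node's cost is the child's total cost scaled by the fraction of rows it expects to fetch (100 of 508535). That is why the planner favored this plan: it assumes matching rows are spread evenly along changes_pkey, so it expects to stop early. Checking the arithmetic with the figures from the Gather Merge node above:

```python
# Figures from the Gather Merge node in the plan above.
startup = 1001.01
total = 8625312957.40
est_rows = 508535
fetched = 100

# A LIMIT node is charged for the fraction of the child's rows it pulls.
limit_cost = startup + (fetched / est_rows) * (total - startup)
print(round(limit_cost, 2))  # ≈ 1697110.8, the Limit cost shown in the plan
```

When the assumption of evenly spread matches is wrong (e.g. all matching rows have high counter values), the scan has to grind through nearly the whole index, which fits the observed never-returning behavior.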
Solution
It's typically a good idea to split up that ugly OR into a UNION query. See:
- Why is an OR statement slower than UNION?
The first SELECT of the UNION query should melt down to milliseconds with this partial multicolumn index:

```
CREATE INDEX ON changes (user_id, counter) WHERE type IN (1, 3);
```

And after adding ORDER BY counter LIMIT 100: since the outer query has the same, we never need more than 100 rows from this part:

```
(  -- now parentheses are required
SELECT id, counter, item_id, item_name, type, updated_time
FROM   changes
WHERE  type IN (1, 3)
AND    user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'
ORDER  BY counter
LIMIT  100
)
```

You didn't provide actual numbers, so judging from the high number of items per user (rows=146991 in the query plan) try this one as 2nd SELECT:

```
(
SELECT id, counter, item_id, item_name, type, updated_time
FROM   changes c
WHERE  type = 2
AND    EXISTS (
   SELECT FROM user_items u
   WHERE  u.user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'
   AND    c.item_id = u.item_id
   )
ORDER  BY counter
LIMIT  100
)
```

In combination with this index:

```
CREATE INDEX ON changes (counter, item_id) WHERE type = 2;
```

For substantially different cardinalities a different SELECT may be (much) better. In particular, this will backfire for users with few or no items.

The complete query then:

```
(<query 1>)
UNION
(<query 2>)
ORDER BY counter
LIMIT 100;
```

Yes, that's 3x
ORDER BY counter LIMIT 100 altogether.

Asides
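On why the per-branch ORDER BY counter LIMIT 100 is safe: the outer LIMIT 100 can never need more than the first 100 rows (by counter) from either branch, so pre-limiting each branch cannot change the result. A small Python sketch of that argument, using synthetic counter values (not real data):

```python
import random

random.seed(42)
# Synthetic "counter" values for the two UNION branches.
branch1 = sorted(random.sample(range(100_000), 5_000))
branch2 = sorted(random.sample(range(100_000), 5_000))

# Full union, then take the first 100 by counter.
full = sorted(set(branch1) | set(branch2))[:100]

# Pre-limit each branch to its first 100 rows, then combine and limit again.
prelimited = sorted(set(branch1[:100]) | set(branch2[:100]))[:100]

print(full == prelimited)  # → True: each branch's first 100 is always enough
```

The equality holds for any inputs: if a value is among the overall first 100, then fewer than 100 values precede it in its own branch, so the per-branch limit keeps it.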
The query plan shows (never executed) for SubPlan 1, which seems to imply that no row with type = 2 was found. Which is odd. (See jjanes' added answer for a possible explanation.)

You are operating with large varchar(32) IDs. If you really need globally unique identifiers, consider uuid instead. Much smaller and faster. Else, a plain bigint (or even integer) can easily cover your 2 million rows. Makes tables and indexes smaller and faster. Faster UNION, too. See:
- UUID or BIGSERIAL Foreign Keys Postgres

Failing that, you could at least add COLLATE "C" to your varchar(32) columns to improve UNION performance (and all sorts and related operations). Unless you run the DB with COLLATE "C" anyway, which seems unlikely. See:
- https://www.postgresql.org/docs/current/collation.html
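COLLATE "C" compares strings by plain byte value rather than linguistic rules, which is what makes it cheap, but it also orders differently. Python's default string comparison is codepoint order, which matches "C" ordering for ASCII, so it can illustrate the difference:

```python
words = ["apple", "Banana"]
# Byte/codepoint order: 'B' (0x42) sorts before 'a' (0x61), so
# uppercase words come first -- unlike a linguistic collation,
# which would typically give ['apple', 'Banana'].
print(sorted(words))  # → ['Banana', 'apple']
```

For opaque random IDs like the ones here, the ordering is meaningless anyway, so "C" costs nothing semantically.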
Your current table design is wasteful. Consider rewriting like this:
```
                            Table "public.changes"
    Column     |         Type          | Collation | Nullable |                 Default
---------------+-----------------------+-----------+----------+------------------------------------------
 counter       | integer               |           | not null | nextval('changes_counter_seq'::regclass)
 type          | integer               |           | not null |
 item_type     | integer               |           | not null |
 item_id       | character varying(32) |           | not null |
 item_name     | text                  |           | not null | ''::text
 id            | character varying(32) |           | not null |
 previous_item | text                  |           | not null | ''::text
 user_id       | character varying(32) |           | not null | ''::character varying
 updated_time  | bigint                |           | not null |
 created_time  | bigint                |           | not null |
```
Should make the table ~ 15 MB smaller (comparing pristine tables without bloat) and everything slightly faster. See:
- Calculating and saving space in PostgreSQL
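For scale, the ~15 MB figure is consistent with eliminating one 8-byte alignment hole per row across the roughly 2 million rows mentioned above. The 8-byte-per-row saving is an assumption for illustration, not stated in the source:

```python
rows = 2_000_000            # row count mentioned earlier in the answer
padding_saved_per_row = 8   # assumed: one 8-byte alignment hole removed by reordering
saved_mib = rows * padding_saved_per_row / (1024 ** 2)
print(round(saved_mib, 1))  # → 15.3
```

The general rule: grouping fixed-width columns by descending alignment requirement (bigint before integer before varlena types) minimizes the padding Postgres inserts between columns.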
Context
StackExchange Database Administrators Q#299737, answer score: 6