HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Why is this SQL query with UNION significantly faster than the same query without?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thiswhythesamewithoutwithsqlqueryunionthan

Problem

I'm trying to optimize a query, which never completes on Postgres 12.7. It takes hours, even days, make the CPU goes 100%, and never returns:
SELECT "id", "counter", "item_id", "item_name", "type", "updated_time"
FROM "changes"
WHERE (type = 1 OR type = 3) AND user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'
OR type = 2 AND item_id IN (SELECT item_id FROM user_items WHERE user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW')
ORDER BY "counter" ASC LIMIT 100;


I randomly tried to rewrite it using UNION instead, and I believe it's equivalent. Basically there's two parts in the query, one for type = 1 or 3, and one for type = 2.
(
SELECT "id", "counter", "item_id", "item_name", "type", "updated_time"
FROM "changes"
WHERE (type = 1 OR type = 3) AND user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'
) UNION (
SELECT "id", "counter", "item_id", "item_name", "type", "updated_time"
FROM "changes"
WHERE type = 2 AND item_id IN (SELECT item_id FROM user_items WHERE user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW')
) ORDER BY "counter" ASC LIMIT 100;


This query returns within 10 seconds, as opposed to never returning after several days for the other one. Any idea what's causing this huge difference?
Query plans

For the original query:

```
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1001.01..1697110.80 rows=100 width=119)
-> Gather Merge (cost=1001.01..8625312957.40 rows=508535 width=119)
Workers Planned: 2
-> Parallel Index Scan using changes_pkey on changes (cost=0.98..8625253259.82 rows=211890 width=119)
Filter: ((((type = 1) OR (type = 3)) AND ((user_id)::text = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'::text)) OR ((type = 2) AND (SubPlan 1)))
SubPlan 1
-> Materialize (cost=0.55..18641.22 rows=143863 width=33)
-> Index Only Scan using user_ite

Solution

It's typically a good idea to split up that ugly OR in to a UNION query. See:

  • Why is an OR statement slower than UNION?



The first SELECT of the UNION query should melt down to milliseconds with this partial multicolumn index:

CREATE INDEX ON changes (user_id, counter)
WHERE  type IN (1, 3);


And after adding ORDER BY counter LIMIT 100. Since the outer query has the same, we never need more than 100 rows from this part:

(  -- now parentheses are required
SELECT id, counter, item_id, item_name, type, updated_time
FROM   changes
WHERE  type IN (1, 3)
AND    user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'
ORDER  BY counter
LIMIT  100
)


You didn't provide actual numbers, so judging from the high number of items per user (rows=146991 in the query plan) try this one as 2nd SELECT:

(
SELECT id, counter, item_id, item_name, type, updated_time
FROM   changes c
WHERE  type = 2
AND    EXISTS (
   SELECT FROM user_items u
   WHERE  u.user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'   
   AND    c.item_id = u.item_id
   )
ORDER  BY counter
LIMIT  100
);


In combination with this index:

CREATE INDEX ON changes (counter, item_id) WHERE  type = 2;


For substantially different cardinalities a different SELECT may be (much) better. In particular, this will backfire for users with few or no items.

The complete query then:

()
UNION
()
ORDER  BY counter
LIMIT  100;


Yes, that's 3x ORDER BY counter LIMIT 100 altogether.
Asides

The query plan shows (never executed) for SubPlan 1, which seems to imply that no row with type = 2 was found. Which is odd. (See jjanes' added answer for a possible explanation.)

You are operating with large varchar(32) IDs. If you really need globally unique identifiers, consider uuid instead. Much smaller and faster. Else, a plain bigint (or even integer) can easily cover your 2 million rows. Makes tables and indexes smaller and faster. Faster UNION, too. See:

  • UUID or BIGSERIAL Foreign Keys Postgres



Failing that, you could at least add COLLATE "C" to your varchar(32) columns to improve UNION performance (and all sorts and related operations). Unless you run the DB with COLLATE "C" anyway, which seems unlikely. See:

  • https://www.postgresql.org/docs/current/collation.html



Your current table design is wasteful. Consider rewriting like this:
Table "public.changes"
Column | Type | Collation | Nullable | Default
---------------+-----------------------+-----------+----------+------------------------------------------
counter | integer | | not null | nextval('changes_counter_seq'::regclass)
type | integer | | not null |
item_type | integer | | not null |
item_id | character varying(32) | | not null |
item_name | text | | not null | ''::text
id | character varying(32) | | not null |
previous_item | text | | not null | ''::text
user_id | character varying(32) | | not null | ''::character varying
updated_time | bigint | | not null |
created_time | bigint | | not null |


Should make the table ~ 15 MB smaller (comparing pristine tables without bloat) and everything slightly faster. See:

  • Calculating and saving space in PostgreSQL

Code Snippets

CREATE INDEX ON changes (user_id, counter)
WHERE  type IN (1, 3);
(  -- now parentheses are required
SELECT id, counter, item_id, item_name, type, updated_time
FROM   changes
WHERE  type IN (1, 3)
AND    user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'
ORDER  BY counter
LIMIT  100
)
(
SELECT id, counter, item_id, item_name, type, updated_time
FROM   changes c
WHERE  type = 2
AND    EXISTS (
   SELECT FROM user_items u
   WHERE  u.user_id = 'kJ6GYJNPM4wdDY5dUV1b8PqDRJj6RRgW'   
   AND    c.item_id = u.item_id
   )
ORDER  BY counter
LIMIT  100
);
CREATE INDEX ON changes (counter, item_id) WHERE  type = 2;
(<query 1>)
UNION
(<query 2>)
ORDER  BY counter
LIMIT  100;

Context

StackExchange Database Administrators Q#299737, answer score: 6

Revisions (0)

No revisions yet.