HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Mysql query works in 200ms with 33594 parameters in SELECT IN (xx,xxx,xx) but runs hours with 33595 or more parameters, why?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
why33594withhoursquerybutmorexxx33595200ms

Problem

We had a small malfunction which generated unexpected amount of data for a device. Our code ends up creating a select query along the lines of
SELEXT * from XXX WHERE yyy IN (xxx,xxx,xxx....,xxxx)
That should fetch data of last 5 days which normally is not a problem at all as the WHERE field is indexed.

We suddenly started having large amount of slow queries and bogging the server down. The queries were running at around 5000 seconds before we aborted them. On figuring out what was happening it turns out that if we have a query with maximum of 33594 parameters within the IN () clause, everything works fine and query finishes in 200ms. If we add one parameter the select seems to be running forever. We haven't let one to run indefinetly until it completes so I'm not sure if it will complete or not but they seem to last at least hours...

The data within the select is about 361.04KB. We're running on Google cloud SQL with MYSQL MYSQL_8_0_31

Any ideas on what might be causing this behaviour?

Solution

I'm more curious about why this happens to understand what's the limit...

The why is because it's an overly complicated predicate. The IN clause is syntactical sugar for a bunch of OR clauses. Too many OR clauses or too many values in an IN clause results in an overly complicated predicate that the database engine is unable to parse and generate an efficient query plan for. Instead, the query plan being generated is now likely scanning the entire table in an inefficient manner.

There is no fixed hard-coded limit when this happens, and it will vary from query to query.

...to better understand the system and thus better be able to understand what workaround/fix is the best.

~33,000 values in an IN clause is an unreasonable amount. Heck, even 100 values in my mind is unreasonable, and in my opinion such a query design is an anti-pattern.

There's a few main ways to re-write the query that can fix the issue:

-
Break the query up into multiple copies of the query with much smaller IN clauses and UNION them back together. This is probably also unrealistic to do with so many values in your IN clause though.

-
Put the values in some sort of table, such as a temporary table, and use a JOIN to it from the table you're trying to filter down. This is typically more relationally efficient.

Context

StackExchange Database Administrators Q#326381, answer score: 5

Revisions (0)

No revisions yet.