patternsqlMinor
Optimizing large MySQL SELECT WHERE IN clauses
Viewed 0 times
wheremysqllargeoptimizingselectclauses
Problem
How can the following SELECT WHERE IN case be optimized?
I have a table with over 100 million rows with only 3 columns. The primary key (col1) is a 127 varchar. I am performing a SELECT col1 WHERE col1 IN (...) where the IN clause is 5,000 strings. I am just trying to see which of the 5,000 strings are in the DB as the primary key.
With a dedicated server and InnoDB table, the query takes anywhere from 3 to 10 seconds which is unacceptable. I don't believe 100 million+ rows should be too difficult for MySQL to do a SELECT from, even when selecting 5k rows, but perhaps I am wrong?
What could be done to optimize this? I have read a little about FULLTEXT keys - would those be better since the key is a 127 varchar? Or would some type of JOIN or UNION speed up this query over a large IN clause?
Any help would be appreciated! Thanks!
----EDIT----
```
SHOW ENGINE INNODB STATUS;
| InnoDB | |
=====================================
2014-07-14 10:59:19 2bf5cf25700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 5 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 6664 srv_active, 0 srv_shutdown, 142740 srv_idle
srv_master_thread log flush and writes: 149372
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 417120
OS WAIT ARRAY INFO: signal count 449454
Mutex spin waits 323558, rounds 2089912, OS waits 48403
RW-shared spins 49101, rounds 462555, OS waits 12976
RW-excl spins 406820, rounds 11261153, OS waits 350839
Spin rounds per wait: 6.46 mutex, 9.42 RW-shared, 27.68 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 21503
Purge done for trx's n:o < 21472 undo n:o < 0 state: running but idle
History list length 641
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 15895, OS thread handle 0x2bf5cf25700, query id 399305 localhost root init
SHOW ENGINE INNODB STATUS
--------
FILE I/O
--------
I/O thread 0 st
I have a table with over 100 million rows with only 3 columns. The primary key (col1) is a 127 varchar. I am performing a SELECT col1 WHERE col1 IN (...) where the IN clause is 5,000 strings. I am just trying to see which of the 5,000 strings are in the DB as the primary key.
With a dedicated server and InnoDB table, the query takes anywhere from 3 to 10 seconds which is unacceptable. I don't believe 100 million+ rows should be too difficult for MySQL to do a SELECT from, even when selecting 5k rows, but perhaps I am wrong?
What could be done to optimize this? I have read a little about FULLTEXT keys - would those be better since the key is a 127 varchar? Or would some type of JOIN or UNION speed up this query over a large IN clause?
Any help would be appreciated! Thanks!
----EDIT----
```
SHOW ENGINE INNODB STATUS;
| InnoDB | |
=====================================
2014-07-14 10:59:19 2bf5cf25700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 5 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 6664 srv_active, 0 srv_shutdown, 142740 srv_idle
srv_master_thread log flush and writes: 149372
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 417120
OS WAIT ARRAY INFO: signal count 449454
Mutex spin waits 323558, rounds 2089912, OS waits 48403
RW-shared spins 49101, rounds 462555, OS waits 12976
RW-excl spins 406820, rounds 11261153, OS waits 350839
Spin rounds per wait: 6.46 mutex, 9.42 RW-shared, 27.68 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 21503
Purge done for trx's n:o < 21472 undo n:o < 0 state: running but idle
History list length 641
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 15895, OS thread handle 0x2bf5cf25700, query id 399305 localhost root init
SHOW ENGINE INNODB STATUS
--------
FILE I/O
--------
I/O thread 0 st
Solution
I have had success in general replacing lengthy IN operations with a join against a temporary table. This makes sense because RDBMS are optimized to perform JOIN's as efficiently as possible, and the handling of lengthy IN lists will most likely be done by repeating the query for every value in the IN list. The performance issue you see could easily be explained by the simple fact of the query being run 5,000 times.
Context
StackExchange Database Administrators Q#71349, answer score: 2
Revisions (0)
No revisions yet.