HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

MySQL -Query with Union doesn't use index

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
withqueryunionmysqldoesnindexuse

Problem

The following query runs instantly:

SELECT COUNT(*) FROM indicators.fileso
WHERE sha256 IN
(
    SELECT a.sha256
    FROM temp.ananifilesinfo a
    JOIN indicators.filesi fi ON fi.sha256 = a.sha256
    WHERE ((a.VTAmount > 2 AND fi.VTAVAmount <2)
        OR (a.state2 = 2 AND fi.state2 !=2))
)


But this query, takes forever:

SELECT COUNT(*) FROM indicators.fileso
WHERE sha256 IN(
      SELECT a.sha256
      FROM temp.ananifilesinfo a
      JOIN indicators.filesi fi ON fi.sha256 = a.sha256
      WHERE (a.VTAmount > 2 AND fi.VTAVAmount <2)
          UNION
      SELECT a.sha256
      FROM temp.ananifilesinfo a
      JOIN indicators.filesi fi ON fi.sha256 = a.sha256
      WHERE (a.state2 = 2 AND fi.state2 !=2)
 )


  • If you look at the 2 queries, they are performing the exact same logic.



  • In both queries the IN sub-query runs instantly!



Here is the explain of Query 1:

Here is the explain of Query 2:

Tables creation:

``
CREATE TABLE
filesi (
id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
sha256 BINARY(32) NOT NULL,
size BIGINT(20) NOT NULL DEFAULT '-1' COMMENT 'meire',
productId INT(11) NOT NULL DEFAULT '-1' COMMENT 'onIdle - done (sock product)',
compsInOrg INT(11) NOT NULL DEFAULT '-1' COMMENT 'onLongIdle - done',
cloudUniqueness TINYINT(4) NOT NULL DEFAULT '-1' COMMENT 'Cloud',
certificateStatusEnum INT(11) NOT NULL DEFAULT '-1' COMMENT 'static',
certificateRootEnum INT(11) NOT NULL DEFAULT '-1' COMMENT 'static',
certificateRootThumbPrint BINARY(20) NULL DEFAULT NULL COMMENT 'todo: remove nullable',
certificateThumbPrint BINARY(20) NULL DEFAULT NULL,
datein TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
lastSeen TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
metaCompanyEnum INT(11) NOT NULL DEFAULT '-1' COMMENT 'onIdle - done',
metaProductEnum INT(11) NOT NULL DEFAULT '-1' COMMENT 'onIdle -done',
lastRA` TIMESTAMP NOT NULL DEFAULT

Solution

Since MySQL 5.6, MySQL may automatically convert IN-subqueries into a JOIN query. This is called semi-join transformation. By converting the subquery to a join, the MySQL optimizer may be able to process the tables in a different order than for traditional subquery execution. For your query, the amount of data that need to be accessed, will be much less if the tables of the subquery is processed first.

However, as described in the manual, semi-join transformation will not be done if subquery contains UNION.

In your case, it seems straight-forward to avoid the union so that semi-join transformation can be done. AFAICT, this query should be equivalent to yours:

SELECT COUNT(*) FROM indicators.fileso
WHERE sha256 IN(
      SELECT a.sha256
      FROM temp.ananifilesinfo a
      JOIN indicators.filesi fi ON fi.sha256 = a.sha256
      WHERE (a.VTAmount > 2 AND fi.VTAVAmount <2)
         OR (a.state2 = 2 AND fi.state2 !=2)
)

Code Snippets

SELECT COUNT(*) FROM indicators.fileso
WHERE sha256 IN(
      SELECT a.sha256
      FROM temp.ananifilesinfo a
      JOIN indicators.filesi fi ON fi.sha256 = a.sha256
      WHERE (a.VTAmount > 2 AND fi.VTAVAmount <2)
         OR (a.state2 = 2 AND fi.state2 !=2)
)

Context

StackExchange Database Administrators Q#133972, answer score: 3

Revisions (0)

No revisions yet.