HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Why is it examining 2.2 billions rows for this delete query with a subquery?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thisrowswhydeleteexaminingwithquerysubquerybillionsfor

Problem

We use the community software Invision Power Suite 4.1.15. The community has 15+ million posts and we've set up the search index table it has to contain a year of posts.

We discovered the following query in our slow log:

# Query_time: 6466.229122  Lock_time: 0.000053 Rows_sent: 0 Rows_examined: 2287405946
SET timestamp=1475416358;
/*IPS\Content\Search\Mysql\_Index::prune:179*/
    DELETE FROM `ibf_core_search_index_item_map`
        WHERE index_item_id IN(
            SELECT index_item_id
                FROM `ibf_core_search_index` AS `core_search_index`
                WHERE index_date_updated < 1443873892 );


As you can see this query apparently examines 2.2 billion rows. Which is... bad...

However, a count on the table core_search_index_item_map table gives only 434926 rows. And a count on ibf_core_search_index gives 1559806 rows.

Furthermore, if I run the following query now (the subselect-query from above)

SELECT index_item_id FROM `ibf_core_search_index` AS `core_search_index`
    WHERE index_date_updated < 1443873892


I'm being returned 99 rows.

And if I run the following select query (I've replaced DELETE FROM with SELECT * FROM):

SELECT * FROM `ibf_core_search_index_item_map`
    WHERE index_item_id IN(
        SELECT index_item_id
            FROM `ibf_core_search_index` AS `core_search_index`
                WHERE index_date_updated < 1443873892 );


Then I'm being returned 301 rows.

The explain of the select query gives this: http://pastebin.com/kivhZ0An

While an explain of the delete query gives this: http://pastebin.com/0Pzvi5Qg

Even though all of the select queries separately gives way less rows than 2.28 billion rows, that's apparently what the delete query is working with. And this have happened consistently for the past few days when that delete query has been initiated, so it's not just a one-off.

This caused serious issues in our community at the time it was running. And the software IPS automatically wan

Solution

MySQL's optimizer never had many options when optimizing IN with subqueries:

WHERE column IN (SELECT ....)


In 5.6 and 5.7 versions there were several improvements in that area but seems like there are still cases when it fails to find the best plan.

I suggest you rewrite the query using JOIN:

DELETE item_map.* 
FROM ibf_core_search_index_item_map AS item_map
  JOIN ibf_core_search_index AS core_search_index
    ON item_map.index_item_id = core_search_index.index_item_id 
WHERE core_search_index.index_date_updated < 1443873892 ;


or (to lead the optimizer to materialize the sub result):

DELETE item_map.* 
FROM ibf_core_search_index_item_map AS item_map
  JOIN 
    ( SELECT index_item_id 
      FROM ibf_core_search_index 
      WHERE index_date_updated < 1443873892 
    ) AS core_search_index
  ON item_map.index_item_id = core_search_index.index_item_id ;


Regarding why your DELETE and SELECT are showing different plans: Not sure about all the optimizer internals but sometimes these additional options/improvements are not added for all statements at the same time. I.e. an improvement may be made that affects only SELECT statements and not UPDATE/DELETE ones, at version X and then in a later version Y, these are added. And the details may differ (due to locking considerations, isolation levels etc.) Modifying the table's data is different than just selecting them.

As for the 2.2 billion, that's about (1.5M / 300) * 434K.

Other things that affect efficiency:

  • The columns (index_item_id) have different types in the 2 tables. One is signed int, the other unsigned int. Change one of them so they match.



  • There is no index on the item_map (index_item_id).



  • An index on (index_date_updated, index_item_id) would help, too.

Code Snippets

WHERE column IN (SELECT ....)
DELETE item_map.* 
FROM ibf_core_search_index_item_map AS item_map
  JOIN ibf_core_search_index AS core_search_index
    ON item_map.index_item_id = core_search_index.index_item_id 
WHERE core_search_index.index_date_updated < 1443873892 ;
DELETE item_map.* 
FROM ibf_core_search_index_item_map AS item_map
  JOIN 
    ( SELECT index_item_id 
      FROM ibf_core_search_index 
      WHERE index_date_updated < 1443873892 
    ) AS core_search_index
  ON item_map.index_item_id = core_search_index.index_item_id ;

Context

StackExchange Database Administrators Q#151306, answer score: 4

Revisions (0)

No revisions yet.