HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlMinor

MySQL: delete...where..in( ) vs delete..from..join, and locked tables on delete with subselect

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
tablesdeletewithwherelockedsubselectjoinmysqlandfrom

Problem

Disclaimer: please excuse my lack of knowledge about database internals. Here it goes:

We run an application (not written by us) which has a big performance problem in a periodic cleanup job in the database. The query looks like this:

delete from VARIABLE_SUBSTITUTION where BUILDRESULTSUMMARY_ID in (
       select BUILDRESULTSUMMARY_ID from BUILDRESULTSUMMARY
       where BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1");


Straight forward, easy to read, and standard SQL. But unfortunately very slow.
Explaining the query shows that the existing index on VARIABLE_SUBSTITUTION.BUILDRESULTSUMMARY_ID is not used:

mysql> explain delete from VARIABLE_SUBSTITUTION where BUILDRESULTSUMMARY_ID in (
    ->        select BUILDRESULTSUMMARY_ID from BUILDRESULTSUMMARY
    ->        where BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1");
| id | select_type        | table                 | type            | possible_keys                    | key     | key_len | ref  | rows    | Extra       |
+----+--------------------+-----------------------+-----------------+----------------------------------+---------+---------+------+---------+-------------+
|  1 | PRIMARY            | VARIABLE_SUBSTITUTION | ALL             | NULL                             | NULL    | NULL    | NULL | 7300039 | Using where |
|  2 | DEPENDENT SUBQUERY | BUILDRESULTSUMMARY    | unique_subquery | PRIMARY,key_number_results_index | PRIMARY | 8       | func |       1 | Using where |


This makes it very slow (120 seconds and more). In addition to that, it seems to block queries that try to insert into BUILDRESULTSUMMARY, output from show engine innodb status:

```
---TRANSACTION 68603695, ACTIVE 157 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 127964, OS thread handle 0x7facd0670700, query id 956555826 localhost 127.0.0.1 bamboosrv updating
update BUILDRESULTSUMMARY set CREATED_DATE='2015-06-18 09:22:05', UPDATED_DATE='2015-0

Solution


  • why is the query optimizer not able to use the index for deleting when the subquery version, while it is while using the join version?



Because the optimizer is/was a bit dumb in that regard. Not only for DELETE and UPDATE but for SELECT statements as well, anything like WHERE column IN (SELECT ...) was not fully optimized. The execution plan usually involved running the subquery for every row of the external table (VARIABLE_SUBSTITUTION in this case). If that table is small, everything is fine. If it's large, no hope. In even older versions, a IN subquery with an IN sub-subquery would make even the EXPLAIN to run for ages.

What you can do - if you want to keep this query - is to use the latest versions that have implemented several optimizations and test again. Latest versions meaning: MySQL 5.6 (and 5.7 when it comes out of beta) and MariaDB 5.5 / 10.0

(update) You already use 5.6 which has optimization improvements, and this one is relevant: Optimizing Subqueries with Semi-Join Transformations

I suggest adding an index on (BUILD_KEY) alone. There is a composite one but that's not very useful for this query.

  • is there any (ideally standards conform) way to trick it into using the index?



None that I can think of. In my opinion, there is not much worth in trying to use standard SQL. There are so many differences and minor quirks that each DBMS has (UPDATE and DELETE statements are good examples of such differences) that when you try to use something that works everywhere, the result is a very limited subset of SQL.

  • is there a portable way to write a delete from join? The application supports PostgreSQL, MySQL, Oracle and Microsoft SQL Server, used via jdbc and Hibernate.



Same answer as the previous question.

  • why is the delete from VARIABLE_SUBSTITUTION blocking inserts into BUILDRESULTSUMMARY, which is only used in the subselect?



Not 100% sure but I think it has to do with running the subquery multiple times and what type of locks it is taking on the table.

Context

StackExchange Database Administrators Q#104416, answer score: 7

Revisions (0)

No revisions yet.