HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

MySQL subquery slows down drastically, but they work fine independently

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fineindependentlybutmysqlsubquerydownslowsworktheydrastically

Problem

Query 1:

select distinct email from mybigtable where account_id=345


takes 0.1s

Query 2:

Select count(*) as total from mybigtable where account_id=123 and email IN ()


takes 0.2s

Query 3:

Select count(*) as total from mybigtable where account_id=123 and email IN (select distinct email from mybigtable where account_id=345)


takes 22 minutes and 90% its in the "preparing" state. Why does this take so much time.

Table is innodb with 3.2mil rows on MySQL 5.0

Solution

In Query 3, you are basically executing a subquery for every row of mybigtable against itself.

To avoid this, you need to make two major changes:

MAJOR CHANGE #1 : Refactor the Query

Here is your original query

Select count(*) as total from mybigtable
where account_id=123 and email IN
(select distinct email from mybigtable where account_id=345)


You could try

select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A;


or maybe the count per email

select email,count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A group by email;


MAJOR CHANGE #2 : Proper Indexing

I think you have this already since Query 1 and Query 2 run fast. Make sure you have a compound index on (account_id,email). Do SHOW CREATE TABLE mybigtable\G and make sure you have one. If you don't have it or if you are not sure, then create the index anyway:

ALTER TABLE mybigtable ADD INDEX account_id_email_ndx (account_id,email);


UPDATE 2012-03-07 13:26 EST

If you want to do a NOT IN(), change the INNER JOIN to a LEFT JOIN and check for the right side being NULL, like this:

select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    LEFT JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
    WHERE tbl345.email IS NULL
) A;


UPDATE 2012-03-07 14:13 EST

Please read these two links on doing JOINs

  • http://en.wikipedia.org/wiki/Sql_join



  • http://dev.mysql.com/doc/refman/5.5/en/join.html



Here is a great YouTube Video where I learned to refactor queries and the book it was based on

  • http://youtu.be/ZVisY-fEoMw



  • http://www.amazon.com/Refactoring-SQL-Applications-Stephane-Faroult/dp/B006TR0ZEO

Code Snippets

Select count(*) as total from mybigtable
where account_id=123 and email IN
(select distinct email from mybigtable where account_id=345)
select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A;
select email,count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    INNER JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
) A group by email;
ALTER TABLE mybigtable ADD INDEX account_id_email_ndx (account_id,email);
select count(*) EmailCount from
(
    select tbl123.email from
    (select email from mybigtable where account_id=123) tbl123
    LEFT JOIN
    (select distinct email from mybigtable where account_id=345) tbl345
    using (email)
    WHERE tbl345.email IS NULL
) A;

Context

StackExchange Database Administrators Q#14565, answer score: 11

Revisions (0)

No revisions yet.