HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

mysql : inner join takes 3 minutes

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
joinmysqlminutesinnertakes

Problem

I had a similar question open on SO for postgres - now having the same issue with mysql..

I have two tables -

Table A : 1MM rows, AsOfDate, Id, BId (foreign key to table B)

Table B : 50k rows, Id, Flag, ValidFrom, ValidTo

Table A contains multiple records per day between 2011/01/01 and 2011/12/31 across 100 BId's. Table B contains multiple non overlapping (between validfrom and validto) records for 100 Bids.

The task of the join will be to return the flag that was active for the BId on the given AsOfDate.

select 
    a.AsOfDate, b.Flag 
from 
    A a inner Join B b on 
        a.BId = b.Id and b.ValidFrom = a.AsOfDate
where
    a.AsOfDate >= 20110101 and a.AsOfDate <= 20111231


This query takes over 3 minutes on a very high end server (+3Ghz) with 64Gb of memory.

+-------+-------------------------+
| Table | Create Table            
|
+-------+-------------------------+
| a     | CREATE TABLE `a` (
  `asofdate` int(4) NOT NULL,
  `bid` int(4) NOT NULL,
  KEY `asofdate_bid` (`asofdate`,`bid`),
  KEY `bid` (`bid`),
  KEY `bid_asofdate` (`bid`,`asofdate`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-------------------------+

+-------+-------------------------+
| Table | Create Table            |
+-------+-------------------------+
| b     | CREATE TABLE `b` (
  `key` int(4) NOT NULL,
  `id` int(4) NOT NULL,
  `flag` char(1) NOT NULL,
  `validfrom` int(4) NOT NULL,
  `validto` int(4) NOT NULL,
  KEY `id` (`id`),
  KEY `validfrom` (`validfrom`),
  KEY `validfrom_id` (`validfrom`,`id`),
  KEY `id_validfrom` (`id`,`validfrom`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-------------------------+


Here is the explain :

```
mysql> explain select count(1) from a a inner join b b on a.bid = b.id and b.validfrom = a.asofdate where a.asofdate >= 20120101 and a.asofdate <= 20121231;

+----+-------------+-------+------+----------------------------------------+--------------+---------+----------+-------+-----------+
| id | select_type | table |

Solution

Here is your original query

select 
    a.AsOfDate, b.Flag 
from 
    A a inner Join B b on 
        a.BId = b.Id and b.ValidFrom = a.AsOfDate
where
    a.AsOfDate >= 20110101 and a.AsOfDate <= 20111231


I would suggest refactoring your query in this instance:

select 
    a.AsOfDate, b.Flag 
from
    (
        select * from A
        WHERE AsOfDate >= 20110101
        AND AsOfDate = a.asofdate
;


That way, the A side's date range ( 20110101 - 20111231 ) gets handled first before the JOIN. An additional benefit of the refactored query is that the JOIN of A and B involves a smaller subset of A.

If you feel uncomfortable with the refactored query, here is another suggestion: switch the range-based WHERE and JOIN clauses

select 
    a.AsOfDate, b.Flag 
from 
    A a inner Join B b on 
        a.BId = b.Id and a.AsOfDate >= 20110101 and a.AsOfDate = a.AsOfDate


Give it a Try !!!

Code Snippets

select 
    a.AsOfDate, b.Flag 
from 
    A a inner Join B b on 
        a.BId = b.Id and b.ValidFrom <= a.AsOfDate and b.ValidTo >= a.AsOfDate
where
    a.AsOfDate >= 20110101 and a.AsOfDate <= 20111231
select 
    a.AsOfDate, b.Flag 
from
    (
        select * from A
        WHERE AsOfDate >= 20110101
        AND AsOfDate <= 20111231
    ) a INNER JOIN B b ON a.bid=b.id
    AND b.validfrom <= a.asofdate
    AND b.validto   >= a.asofdate
;
select 
    a.AsOfDate, b.Flag 
from 
    A a inner Join B b on 
        a.BId = b.Id and a.AsOfDate >= 20110101 and a.AsOfDate <= 20111231
where
    b.ValidFrom <= a.AsOfDate and b.ValidTo >= a.AsOfDate

Context

StackExchange Database Administrators Q#28391, answer score: 5

Revisions (0)

No revisions yet.