HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Optimizing Join between a very small and a very large table

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
joinbetweensmalloptimizinglargeveryandtable

Problem

I have two tables:

CREATE TABLE `TBL_REPORT` (
  `ID_TBL_REPORT` bigint(20) NOT NULL AUTO_INCREMENT,
  `INDEX_NAME` varchar(100) NOT NULL,
  `INDEX_VALUE` varchar(100) NOT NULL,
  `FK_TBL_REPORT_PROXY` bigint(20) NOT NULL,
  `PAYLOAD` mediumtext NOT NULL,
  `OPERATION` varchar(200) DEFAULT NULL,
  `ERROR` varchar(50) NOT NULL,
  `SERVER` varchar(50) NOT NULL,
  `DATA` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`ID_TBL_REPORT`),
  KEY `TBL_INTER_TBL_INTER_PROXY_FK` (`FK_TBL_REPORT_PROXY`),
  KEY `IX_INDEX_NAME` (`INDEX_NAME`),
  KEY `IX_INDEX_VALUE` (`INDEX_VALUE`),
  KEY `IX_DATA` (`DATA`),
  CONSTRAINT `TBL_INTER_TBL_INTER_PROXY_FK` FOREIGN KEY (`FK_TBL_REPORT_PROXY`) REFERENCES `TBL_REPORT_PROXY` (`ID_TBL_REPORT_PROXY`)
) ENGINE=InnoDB AUTO_INCREMENT=4769095 DEFAULT CHARSET=latin1

CREATE TABLE `TBL_REPORT_PROXY` (
  `ID_TBL_REPORT_PROXY` bigint(20) NOT NULL AUTO_INCREMENT,
  `PROXY` varchar(100) NOT NULL,
  `SERVICE_GROUP` varchar(150) NOT NULL,
  PRIMARY KEY (`ID_TBL_REPORT_PROXY`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1


The fields INDEX_NAME, INDEX_VALUE and DATA has indexes created.

The first table represents a call to a service in a SOA Bus, and the second table represents the service attributes.
The first can contains more than 100.000.000 rows and the second about 100 rows.

My goal is to select filtering by some field, and paginating by limit n,10. The fields that can be used to filter are PROXY, INDEX_NAME, ERROR and DATA.

For example, if the user wants to see the first page filtering by PROXY, the query is:

select R.ID_TBL_REPORT, R.INDEX_NAME, R.INDEX_VALUE, R.ERROR, R.DATA, P.PROXY, P.SERVICE_GROUP 
    from TBL_REPORT R, TBL_REPORT_PROXY P
        where P.PROXY = 'MyProxy' 
        and R.FK_TBL_REPORT_PROXY = P.ID_TBL_REPORT_PROXY 
        limit 1,10;


This works well and the result is shown very fast.
However, the results need to be ordered by the las

Solution

Wrong interpretation of the error.

You had a power failure (or some other crash). You are using MyISAM (you should use InnoDB). Those caused the "Incorrect key file for table".

CHECK TABLE TBL_REPORT R;
CHECK TABLE TBL_REPORT_PROXY;


If either says to "repair", then perform REPAIR TABLE.

If that does not work, look at this value:

SHOW VARIABLES LIKE 'tmpdir';


Does that refer to a separate filesystem? Is that filesystem small? Perhaps you ran out of disk space in that filesystem (/tmp)? If so, consider not putting tmpdir in it. (Actually, I would expect a different error message if you ran out of disk space.)

Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE. I can't tell what indexes you have. You need one starting with FK_TBL_REPORT_PROXY and/or one starting with DATA.

The optimizer wants to start with the smaller table, especially since you have a WHERE clause restricting the rows there.

I have two suggestions; either or both might help:

This version tries to force it to use your index starting with DATA, which would let it quit after 10 rows:

select  R.ID_TBL_REPORT, R.INDEX_NAME, R.INDEX_VALUE, R.ERROR,
        R.DATA, P.PROXY, P.SERVICE_GROUP
    from  TBL_REPORT R FORCE INDEX(DATA)
    JOIN  TBL_REPORT_PROXY P IGNORE INDEX(PROXY)
         ON R.FK_TBL_REPORT_PROXY = P.ID_TBL_REPORT_PROXY
    where  P.PROXY = 'MyProxy'
    order by  R.DATA desc -- ordering by the the last first
    limit  1,10;


This version switches to HAVING in an attempt to trick it into not starting with the PROXY table:

select  R.ID_TBL_REPORT, R.INDEX_NAME, R.INDEX_VALUE, R.ERROR,
        R.DATA, P.PROXY, P.SERVICE_GROUP
    from  TBL_REPORT R
    JOIN  TBL_REPORT_PROXY P
         ON R.FK_TBL_REPORT_PROXY = P.ID_TBL_REPORT_PROXY
    HAVING  P.PROXY = 'MyProxy'
    order by  R.DATA desc -- ordering by the the last first
    limit  1,10;


(I switched from comma-JOIN to the preferred explicit JOIN.)

Please run EXPLAIN EXTENDED SELECT ...; then SHOW WARNINGS; on each of them, and show us the outputs.

I have a blog on the evils of pagination using OFFSET and LIMIT. It includes a much more efficient alternative. However, you must first get past the immediate problem.

Code Snippets

CHECK TABLE TBL_REPORT R;
CHECK TABLE TBL_REPORT_PROXY;
SHOW VARIABLES LIKE 'tmpdir';
select  R.ID_TBL_REPORT, R.INDEX_NAME, R.INDEX_VALUE, R.ERROR,
        R.DATA, P.PROXY, P.SERVICE_GROUP
    from  TBL_REPORT R FORCE INDEX(DATA)
    JOIN  TBL_REPORT_PROXY P IGNORE INDEX(PROXY)
         ON R.FK_TBL_REPORT_PROXY = P.ID_TBL_REPORT_PROXY
    where  P.PROXY = 'MyProxy'
    order by  R.DATA desc -- ordering by the the last first
    limit  1,10;
select  R.ID_TBL_REPORT, R.INDEX_NAME, R.INDEX_VALUE, R.ERROR,
        R.DATA, P.PROXY, P.SERVICE_GROUP
    from  TBL_REPORT R
    JOIN  TBL_REPORT_PROXY P
         ON R.FK_TBL_REPORT_PROXY = P.ID_TBL_REPORT_PROXY
    HAVING  P.PROXY = 'MyProxy'
    order by  R.DATA desc -- ordering by the the last first
    limit  1,10;

Context

StackExchange Database Administrators Q#96825, answer score: 2

Revisions (0)

No revisions yet.