patternsqlMinor
Optimize a query with two range conditions
Viewed 0 times
withrangequerytwooptimizeconditions
Problem
I have a structure similar to this one:
Here is a SQL Fiddle
So I'm trying to execute a query with two range conditions
but the optimizer uses only part of my index:
is there a way to optimize it?
CREATE TABLE `author` (
`id` int(11) unsigned NOT NULL auto_increment,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;
CREATE TABLE `book` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`author_id` int(11) unsigned NOT NULL,
`org` int(11) unsigned NOT NULL,
`country` char(3) NOT NULL,
`publish_date` date NOT NULL,
`price` decimal(6,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `author_id` (`author_id`),
KEY `publish_date` (`publish_date`),
KEY `i0` (`country`, `org`, `author_id`, `price`, `publish_date`),
KEY `i1` (`country`, `org`, `author_id`, `publish_date`, `price`)
) ENGINE=InnoDB CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;Here is a SQL Fiddle
So I'm trying to execute a query with two range conditions
SELECT
id as author_id,
(SELECT COUNT(DISTINCT `book`.`id`)+1
FROM `book`
WHERE
`book`.`org` = 1
AND `book`.`country` = 'USA'
AND `book`.`publish_date` BETWEEN '2010-04-30' AND '2011-04-30'
AND `book`.`author_id` = `author`.`id`
AND `book`.`price` < 50
) AS `books_under_fifty`
FROM `author`
ORDER BY books_under_fifty desc;but the optimizer uses only part of my index:
const,const,db_9_6349e2.author.id from i0is there a way to optimize it?
Solution
Well, let's think out of the box. It seems that there are two sets of results -- the authors with some such books, and those without. The first case is more efficiently done via:
That would probably be best served by
If you are happy with the performance of that, then you can think out of the box in finding "the rest of the authors".
Back to the 2-ranges problem. That can sometimes be solved by Partitioning. Not knowing the distribution of the data (is one year a small subset of the total? or what about price?), I can't say which would be better:
Partitioning by price is out, since
Then make 20-50 partitions of the partition key. This would give you "partition pruning" for one 'range', but then the index needs to be tweaked for the other. Assuming you partition by
Meanwhile, ...
Don't use utf8 if you are using standard country_codes; use ascii, at least for that column.
What is
Another variant that might be better:
This variant assumes you have no 'natural' PK. And it will be more efficient because it clusters the desired rows together. I did not include
add index
I'm using Percona's 5.6.22-71.0-log .
Note that there are only 4 different
SELECT author_id,
COUNT(*)+1 AS books_under_fifty
FROM `book`
WHERE `org` = 1
AND `country` = 'USA'
AND publish_date >= '2010-05-01'
AND publish_date < '2010-05-01' + INTERVAL 1 YEAR
AND `price` < 50That would probably be best served by
INDEX(country, org, publish_date, price, author_id)If you are happy with the performance of that, then you can think out of the box in finding "the rest of the authors".
Back to the 2-ranges problem. That can sometimes be solved by Partitioning. Not knowing the distribution of the data (is one year a small subset of the total? or what about price?), I can't say which would be better:
PARTITION BY RANGE(TO_DAYS(publish_date))Partitioning by price is out, since
DECIMAL can't be used. Storing price as number of cents and using PARTITION BY RANGE(cents) would work, but clumsily.Then make 20-50 partitions of the partition key. This would give you "partition pruning" for one 'range', but then the index needs to be tweaked for the other. Assuming you partition by
publish_date:INDEX(country, org, price)
PRIMARY KEY(id, publish_date)Meanwhile, ...
Don't use utf8 if you are using standard country_codes; use ascii, at least for that column.
What is
org does it distinguish the book? Or are there some missing columns? Is there no other "unique" column(s)?Another variant that might be better:
PRIMARY KEY(country, org, price, -- for clustering
publish_date, -- because partitioning requires it
id) -- to assure uniqueness
INDEX(id)This variant assumes you have no 'natural' PK. And it will be more efficient because it clusters the desired rows together. I did not include
author_id in the PK since it is readily available in the row.add index
ADD INDEX ac(author_id, country) tricked it into using i0. Go figure. Note: I left (author_id) intact and first in the list of indexes; so it is not just the order of the indexes.I'm using Percona's 5.6.22-71.0-log .
Note that there are only 4 different
country values, evenly distributed. But there are many different author_id, making it by itself more "selective".Code Snippets
SELECT author_id,
COUNT(*)+1 AS books_under_fifty
FROM `book`
WHERE `org` = 1
AND `country` = 'USA'
AND publish_date >= '2010-05-01'
AND publish_date < '2010-05-01' + INTERVAL 1 YEAR
AND `price` < 50INDEX(country, org, publish_date, price, author_id)PARTITION BY RANGE(TO_DAYS(publish_date))INDEX(country, org, price)
PRIMARY KEY(id, publish_date)PRIMARY KEY(country, org, price, -- for clustering
publish_date, -- because partitioning requires it
id) -- to assure uniqueness
INDEX(id)Context
StackExchange Database Administrators Q#178299, answer score: 4
Revisions (0)
No revisions yet.