gotchasqlMinor
Unexpected extremely long query time (~5 minutes using nested WHEN-INs)
Viewed 0 times
extremelyquerylongtimenestedunexpectedusingwhenminutesins
Problem
For the first time, I am running up against a horrifically long execution time of a MySQL query (~5 minutes).
The data in the database is highly (and not arbitrarily) normalized. It very efficient at organizing and shuffling data to be displayed in a lot of very helpful ways for different purposes, except this one particular query is throwing a wrench into it.
I can't understand the reason for it. However, some background information that may shed some light onto otherwise arbitrarily convoluted queries.
The company has divided the world up into many many teams (macroregions). Everybody belongs to one or two teams, based on their expertise.
-
For example, there are many diverse teams. A few examples are
-
The
-
Each team has a given set of regions, which are also not unique to that particular team. For example, the
-
Each country belongs to one or more regions.
Given all that, it is necessary to show each individual what other people on their teams are doing, as well as people who are not on their team, but have overlapping regions.
Query 1 accomplishes this perfectly, and very quickly less than .09 seconds.
```
SELECT report_name
FROM reports
WHERE region I
The data in the database is highly (and not arbitrarily) normalized. It very efficient at organizing and shuffling data to be displayed in a lot of very helpful ways for different purposes, except this one particular query is throwing a wrench into it.
I can't understand the reason for it. However, some background information that may shed some light onto otherwise arbitrarily convoluted queries.
The company has divided the world up into many many teams (macroregions). Everybody belongs to one or two teams, based on their expertise.
-
For example, there are many diverse teams. A few examples are
Spanish, Sahara, Iberia, Portuguese, Jungle teams. Each of the teams has considerable overlap with other teams, but is in some senses independent.-
The
Arabic team works quite closely with the Sahara team, by virtue of the fact that the database tells them they have to work together on certain assignments, because of overlapping geographic locations. The Spanish and the Portuguese team also work closely together, they both work with the Americas and Europe teams and the Portuguese team also works with the Africa team, as does the Arabic team.-
Each team has a given set of regions, which are also not unique to that particular team. For example, the
Mediterranean region belongs to around 12 teams, and when an event occurs there, they all work on it together.-
Each country belongs to one or more regions.
Turkey belongs to Central Asia, Europe and even Mediterranean, and a few others.Given all that, it is necessary to show each individual what other people on their teams are doing, as well as people who are not on their team, but have overlapping regions.
Query 1 accomplishes this perfectly, and very quickly less than .09 seconds.
```
SELECT report_name
FROM reports
WHERE region I
Solution
Let's start with your original query
You can gather keys only (in stages), then join the keys with the
If you are not comfortable with the USING clause here is my new propsed query without using the USING clause:
You will need some indexes to support the subqueries
The first 3 indexes are called covering indexes. They are called such because the subqueries calls for only those exact columns. Therefore, no need to read from the table. The data is only fetched from the index.
You should also study how to refactor queries to gather keys, perform WHERE clauses early, and perform JOINs last. Here is great YouTube Video on how to do this: http://youtu.be/ZVisY-fEoMw (Based on this book : Refactoring SQL Applications)
Give it a Try !!!
SELECT *
FROM reports
WHERE report_name IN (
SELECT report_name
FROM reports
WHERE region IN (
SELECT distinct region
FROM macroregions
WHERE macroregion IN (
SELECT distinct macroregion
FROM users
WHERE callsign = '$thisuser'
)
)
)You can gather keys only (in stages), then join the keys with the
reports table. Here is my new proposed querySELECT reports.* FROM
(
SELECT rpts.report_name FROM
(
SELECT DISTINCT regions.region FROM
(SELECT DISTINCT macroregion
FROM users WHERE callsign = '$thisuser') users
INNER JOIN
(SELECT DISTINCT macroregion,region FROM regions) regions
USING (macroregion)
) regionkeys
INNER JOIN
(SELECT region,report_name FROM reports) rpts
USING (region)
) reportnamekeys
INNER JOIN reports
USING (report_name);If you are not comfortable with the USING clause here is my new propsed query without using the USING clause:
SELECT reports.* FROM
(
SELECT rpts.report_name FROM
(
SELECT DISTINCT regions.region FROM
(SELECT DISTINCT macroregion
FROM users WHERE callsign = '$thisuser') users
INNER JOIN
(SELECT DISTINCT macroregion,region FROM regions) regions
ON users.macroregion = regions.macroregion
) regionkeys
INNER JOIN
(SELECT region,report_name FROM reports) rpts
ON regionkeys.region = rpts.region
) reportnamekeys
INNER JOIN reports
ON reportkeys.report_name = reports.report_name;You will need some indexes to support the subqueries
ALTER TABLE users ADD INDEX callsign_macroregion_ndx (callsign,macroregion);
ALTER TABLE regions ADD INDEX macroregion_region_ndx (macroregion,region);
ALTER TABLE reports ADD INDEX region_report_name_ndx (region,report_name);
ALTER TABLE reports ADD INDEX report_name_ndx (report_name);The first 3 indexes are called covering indexes. They are called such because the subqueries calls for only those exact columns. Therefore, no need to read from the table. The data is only fetched from the index.
- http://peter-zaitsev.livejournal.com/6949.html
- http://ronaldbradford.com/blog/tag/covering-index/
- http://www.mysqlperformanceblog.com/2006/11/23/covering-index-and-prefix-indexes/
You should also study how to refactor queries to gather keys, perform WHERE clauses early, and perform JOINs last. Here is great YouTube Video on how to do this: http://youtu.be/ZVisY-fEoMw (Based on this book : Refactoring SQL Applications)
Give it a Try !!!
Code Snippets
SELECT *
FROM reports
WHERE report_name IN (
SELECT report_name
FROM reports
WHERE region IN (
SELECT distinct region
FROM macroregions
WHERE macroregion IN (
SELECT distinct macroregion
FROM users
WHERE callsign = '$thisuser'
)
)
)SELECT reports.* FROM
(
SELECT rpts.report_name FROM
(
SELECT DISTINCT regions.region FROM
(SELECT DISTINCT macroregion
FROM users WHERE callsign = '$thisuser') users
INNER JOIN
(SELECT DISTINCT macroregion,region FROM regions) regions
USING (macroregion)
) regionkeys
INNER JOIN
(SELECT region,report_name FROM reports) rpts
USING (region)
) reportnamekeys
INNER JOIN reports
USING (report_name);SELECT reports.* FROM
(
SELECT rpts.report_name FROM
(
SELECT DISTINCT regions.region FROM
(SELECT DISTINCT macroregion
FROM users WHERE callsign = '$thisuser') users
INNER JOIN
(SELECT DISTINCT macroregion,region FROM regions) regions
ON users.macroregion = regions.macroregion
) regionkeys
INNER JOIN
(SELECT region,report_name FROM reports) rpts
ON regionkeys.region = rpts.region
) reportnamekeys
INNER JOIN reports
ON reportkeys.report_name = reports.report_name;ALTER TABLE users ADD INDEX callsign_macroregion_ndx (callsign,macroregion);
ALTER TABLE regions ADD INDEX macroregion_region_ndx (macroregion,region);
ALTER TABLE reports ADD INDEX region_report_name_ndx (region,report_name);
ALTER TABLE reports ADD INDEX report_name_ndx (report_name);Context
StackExchange Database Administrators Q#12606, answer score: 3
Revisions (0)
No revisions yet.