gotchaMinor
Why does SQL Server run a subquery for each row of the table it's qualifying?
Viewed 0 times
whytheeachsqlsubqueryqualifyingfordoesserverrow
Problem
This query runs in ~21 seconds (execution plan):
When the subquery is replaced with a variable, it runs in <1 second (execution plan):
Judging from the execution plan, the "select max..." sub-select is run for each of the millions of rows in "SubqueryTest a:, which is why it takes so long.
My question: Since the sub-select is scalar, deterministic and not correlated, why doesn't the query optimizer do what I did in my second example and run the subquery once, store the result, then use it for the main query? I'm sure there's just a hole in my understanding of SQL Server, but I'd really like help filling it - a couple hours with google haven't helped.
The table is just over 1gb with almost 28 million records:
select
a.month
, count(*)
from SubqueryTest a
where a.year = (select max(b.year) from SubqueryTest b)
group by a.monthWhen the subquery is replaced with a variable, it runs in <1 second (execution plan):
declare @year float
select @year = max(b.year) from SubqueryTest b
select
month
, count(*)
from SubqueryTest where year = @year group by monthJudging from the execution plan, the "select max..." sub-select is run for each of the millions of rows in "SubqueryTest a:, which is why it takes so long.
My question: Since the sub-select is scalar, deterministic and not correlated, why doesn't the query optimizer do what I did in my second example and run the subquery once, store the result, then use it for the main query? I'm sure there's just a hole in my understanding of SQL Server, but I'd really like help filling it - a couple hours with google haven't helped.
The table is just over 1gb with almost 28 million records:
CREATE TABLE SubqueryTest(
[pk_id] [int] IDENTITY(1,1) NOT NULL
, [Year] [float] NULL
, [Month] [float] NULL PRIMARY KEY CLUSTERED ([pk_id] ASC))
CREATE NONCLUSTERED INDEX idxSubqueryTest ON SubqueryTest ([Year] ASC)Solution
The slow plan isn't calculating the
In fact it never explicitly calculates it at all.
It gives a plan similar to
Slow Plan (Estimated Row Counts)
You have a non covering index on
Unfortunately it doesn't seem to differentiate between
In this case it makes a huge difference. (estimated 2 key lookup vs actual 4,424,803) so you get an inappropriate plan.
Slow Plan (Actual Row Counts)
You could consider adding
A non covering index on such a non selective column is really pretty useless for the vast majority of queries. The index is totally ignored by the "fast" plan which ends up doing a parallel scan on the whole table and evaluating the predicate on all 27,445,400 rows (in preference to performing the huge number of lookups).
MAX for each row in the outer query.In fact it never explicitly calculates it at all.
It gives a plan similar to
WITH CTE
AS (SELECT TOP(1) WITH TIES *
FROM SubqueryTest
WHERE year IS NOT NULL
ORDER BY year desc)
SELECT month,
count(*)
FROM CTE
GROUP BY monthSlow Plan (Estimated Row Counts)
You have a non covering index on
year asc so it scans that backwards to get the rows in the first year (shows as a seek because of the implicit IS NOT NULL predicate).Unfortunately it doesn't seem to differentiate between
TOP 1 and TOP 1 WITH TIES when estimating row counts.In this case it makes a huge difference. (estimated 2 key lookup vs actual 4,424,803) so you get an inappropriate plan.
Slow Plan (Actual Row Counts)
You could consider adding
month into the index on year either as a key or included column to make the index covering. The benefit of adding it as a secondary key column would be that it could then feed into a stream aggregate without an additional sort (though for only 12 distinct values a hash aggregate would be fine anyway).A non covering index on such a non selective column is really pretty useless for the vast majority of queries. The index is totally ignored by the "fast" plan which ends up doing a parallel scan on the whole table and evaluating the predicate on all 27,445,400 rows (in preference to performing the huge number of lookups).
Code Snippets
WITH CTE
AS (SELECT TOP(1) WITH TIES *
FROM SubqueryTest
WHERE year IS NOT NULL
ORDER BY year desc)
SELECT month,
count(*)
FROM CTE
GROUP BY monthContext
StackExchange Database Administrators Q#57334, answer score: 6
Revisions (0)
No revisions yet.