patternsqlModerate
When can SARGable predicates be pushed into a CTE or derived table?
Viewed 0 times
predicatescanpushedintoctesargablewhenderivedtable
Problem
Sandbag
While working on Top Quality Blog Posts®, I came across some optimizer behavior I found really infuriating interesting. I don't immediately have an explanation, at least not one I'm happy with, so I'm putting it here in case someone smart shows up.
If you want to follow along, you can grab the 2013 version of the Stack Overflow data dump here. I'm using the Comments table, with one additional index on it.
Query One
When I query the table like so, I get an odd query plan.
The SARGable predicate on Score isn't pushed inside the CTE. It's in a filter operator much later in the plan.
Which I find odd, since the
Query Two
If I change the query, it does get pushed.
The query plan changes, too, and runs much faster, with no spill to disk. They both produce the same results, with the predicate at the nonclustered index scan.
Query Three
This is the equivalent of writing the query like so:
Query Four
Using a derived table gets the same "bad" query plan as the initial CTE query
Things get even weirder when...
I change the query to order the data ascending, and the filter to `
While working on Top Quality Blog Posts®, I came across some optimizer behavior I found really infuriating interesting. I don't immediately have an explanation, at least not one I'm happy with, so I'm putting it here in case someone smart shows up.
If you want to follow along, you can grab the 2013 version of the Stack Overflow data dump here. I'm using the Comments table, with one additional index on it.
CREATE INDEX [ix_ennui] ON [dbo].[Comments] ( [UserId], [Score] DESC );Query One
When I query the table like so, I get an odd query plan.
WITH x
AS
(
SELECT TOP 101
c.UserId, c.Text, c.Score
FROM dbo.Comments AS c
ORDER BY c.Score DESC
)
SELECT *
FROM x
WHERE x.Score >= 500;The SARGable predicate on Score isn't pushed inside the CTE. It's in a filter operator much later in the plan.
Which I find odd, since the
ORDER BY is on the same column as the filter. Query Two
If I change the query, it does get pushed.
WITH x
AS
(
SELECT c.UserId, c.Text, c.Score
FROM dbo.Comments AS c
)
SELECT TOP 101 *
FROM x
WHERE x.Score >= 500
ORDER BY x.Score DESC;The query plan changes, too, and runs much faster, with no spill to disk. They both produce the same results, with the predicate at the nonclustered index scan.
Query Three
This is the equivalent of writing the query like so:
SELECT TOP 101
c.UserId, c.Text, c.Score
FROM dbo.Comments AS c
WHERE c.Score >= 500
ORDER BY c.Score DESC;Query Four
Using a derived table gets the same "bad" query plan as the initial CTE query
SELECT *
FROM ( SELECT TOP 101
c.UserId, c.Text, c.Score
FROM dbo.Comments AS c
ORDER BY c.Score DESC ) AS x
WHERE x.Score >= 500;Things get even weirder when...
I change the query to order the data ascending, and the filter to `
- ASC
Solution
There are a few issues in play here.
Pushing predicates past
The optimizer cannot currently push a predicate past a
The work around is to perform the rewrite manually. The fundamental issue is similar to the case of pushing predicates past a window function, except there is no corresponding specialized rule like
My personal opinion is that an exploration rule like
* Generally this means the predicate should appear in the
Costing issues
The remaining execution plans in the question can be explained as cost-based choices due to the distribution of values in the
For example, the query:
...produces a plan with an apparently unpushed predicate in a Filter:
Note that the Sort is estimated to produce 101 rows. This is the effect of the row goal added by the Top. This affects the estimated cost of the Sort and the Filter enough to make it seem like this is the cheaper option. The estimated cost of this plan is 2401.39 units.
If we disable row goals with a query hint:
...the execution plan produced is:
The predicate has been pushed into the scan as a residual non-sargable predicate, and the cost of the whole plan is 2402.32 units.
Notice that the
Now the predicate is expected to be very selective, so the optimizer chooses to push the predicate and use the nonclustered index with lookups:
Again, the optimizer considered multiple alternatives and chose this as the apparently cheapest option, as usual.
Pushing predicates past
TOPThe optimizer cannot currently push a predicate past a
TOP, even in the limited cases where it would be safe to do so*. This limitation accounts for the behaviour of all the queries in the question where the predicate is in a higher scope than the TOP.The work around is to perform the rewrite manually. The fundamental issue is similar to the case of pushing predicates past a window function, except there is no corresponding specialized rule like
SelOnSeqPrj.My personal opinion is that an exploration rule like
SelOnTop remains unimplemented because people have deliberately written queries with TOP in an effort to provide a kind of 'optimization fence'.* Generally this means the predicate should appear in the
ORDER BY clause associated with the TOP, and the direction of any inequality should agree with the direction of the sorting. The transformation would also need to account for the sorting behaviour of NULLs in SQL Server. Overall, the limitations probably mean this transformation would not be generally useful enough in practice to justify the additional exploration efforts.Costing issues
The remaining execution plans in the question can be explained as cost-based choices due to the distribution of values in the
Score column (many more rows = 500), and the effect of the row goal introduced by the TOP.For example, the query:
--Written normally
SELECT TOP (101)
c.UserId,
c.[Text],
c.Score
FROM dbo.Comments AS c
WHERE
c.Score <= 500
ORDER BY
c.Score ASC;...produces a plan with an apparently unpushed predicate in a Filter:
Note that the Sort is estimated to produce 101 rows. This is the effect of the row goal added by the Top. This affects the estimated cost of the Sort and the Filter enough to make it seem like this is the cheaper option. The estimated cost of this plan is 2401.39 units.
If we disable row goals with a query hint:
--Written normally
SELECT TOP (101)
c.UserId,
c.[Text],
c.Score
FROM dbo.Comments AS c
WHERE
c.Score <= 500
ORDER BY
c.Score ASC
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));...the execution plan produced is:
The predicate has been pushed into the scan as a residual non-sargable predicate, and the cost of the whole plan is 2402.32 units.
Notice that the
= 500 predicate:--Written normally
SELECT TOP (101)
c.UserId,
c.[Text],
c.Score
FROM dbo.Comments AS c
WHERE
c.Score >= 500
ORDER BY
c.Score DESC;Now the predicate is expected to be very selective, so the optimizer chooses to push the predicate and use the nonclustered index with lookups:
Again, the optimizer considered multiple alternatives and chose this as the apparently cheapest option, as usual.
Code Snippets
--Written normally
SELECT TOP (101)
c.UserId,
c.[Text],
c.Score
FROM dbo.Comments AS c
WHERE
c.Score <= 500
ORDER BY
c.Score ASC;--Written normally
SELECT TOP (101)
c.UserId,
c.[Text],
c.Score
FROM dbo.Comments AS c
WHERE
c.Score <= 500
ORDER BY
c.Score ASC
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));--Written normally
SELECT TOP (101)
c.UserId,
c.[Text],
c.Score
FROM dbo.Comments AS c
WHERE
c.Score >= 500
ORDER BY
c.Score DESC;Context
StackExchange Database Administrators Q#226185, answer score: 13
Revisions (0)
No revisions yet.