patternMinor
Different execution plans depending on columns selected from CTE
Viewed 0 times
columnsctedifferentexecutionselecteddependingfromplans
Problem
I have the following table
I am running the following CTE.
Now the question. When I run
SQL does an index seek on the nonclustered index with a key lookup on the clustered and takes 3 to 4 seconds to complete.
However, if I run the below SQL does a clustered index scan and completes in about 15 seconds.
To give some idea of distribution of values:
Can anyone provide any insight as to why SQL would pick a different execution plan for the two queries?
Execution Plans
SELECT by column name
https://www.brentozar.com/pastetheplan/?id=HyJZeZlC
SELECT *
https://www.brentozar.com/pastetheplan/?id=r1BQgZl0
CREATE TABLEA(
ID NUMERIC(19,0) IDENTITY(1,1),
COLUMNA varchar(256) NOT NULL,
COLUMNB varchar(256) NOT NULL,
COLUMNC varchar(256) NOT NULL,
COLUMND varchar(max) NULL,
COLUMNE bigint NOT NULL,
COLUMNF numberic(19,0),
CONSTRAINT PK_ID PRIMARY KEY CLUSTERED
(
COLUMNA ASC
)
) ON PRIMARY
CREATE NONCLUSTERED INDEX IDX_COLUMNE_COLUMNF ON TABLEA(COLUMNE, COLUMNF)I am running the following CTE.
WITH CTE AS(
SELECT ROW_NUMBER() OVER(PARTITION BY COLUMNE ORDER BY COLUMNE) AS rownum,
COLUMNA,
COLUMNB,
COLUMNC,
COLUMND,
COLUMNE,
COLUMNF
WHERE COLUMNF = 1 AND COLUMNE >= 1472738400000 AND COLUMNE <= 1475244000000
)Now the question. When I run
SELECT TOP 30000 * FROM CTESQL does an index seek on the nonclustered index with a key lookup on the clustered and takes 3 to 4 seconds to complete.
However, if I run the below SQL does a clustered index scan and completes in about 15 seconds.
SELECT TOP 30000 COLUMNA, COLUMNB, COLUMNC, COLUMND, COLUMNE, COLUMNF FROM CTETo give some idea of distribution of values:
- The table contains approx. ~13.7 million rows.
- COLUMNF always has a value of 1.
- Excluding the TOP 30000 the query returns 474296 rows, of which there are 92683 have a rownum value of 2 or higher.
Can anyone provide any insight as to why SQL would pick a different execution plan for the two queries?
Execution Plans
SELECT by column name
https://www.brentozar.com/pastetheplan/?id=HyJZeZlC
SELECT *
https://www.brentozar.com/pastetheplan/?id=r1BQgZl0
Solution
The plan without row number is below.
This is assigned a cost of
You have a
The table has 13,283,300 rows. A full clustered index scan is costed at
Applying the same scaling of
NB: You say that only
When you select
You have an index on
However it does not cover the query and lookups are needed to return the missing columns. The plan estimates that there will be 30,000 such lookups. There may in fact be more as the predicate on
If the row numbering plan was to use a clustered index scan it would need to be a full scan followed by a sort of all rows matching the predicate.
You say that the plan with lookups is in fact 5 times faster than the clustered index scan. Likely a much greater proportion of the clustered index needed to be read to find 30,000 matching rows than was assumed in the costings. You are on SQL Server 2014 SP1 CU5. On SQL Server 2014 SP2 the actual execution plan now has a new attribute Actual Rows Read which would tell you how many rows it did actually read. On previous versions you can use
This is assigned a cost of
44.866.You have a
TOP without ORDER BY so SQL Server just needs to scan the clustered index and as soon as it finds the first 30,000 rows matching the predicate it can stop.The table has 13,283,300 rows. A full clustered index scan is costed at
730.467 + 14.6118 = 745.0788 but this gets scaled down to 43.9392 because of the TOP. Applying the same scaling of
5.9% to the number of rows in the table this would imply that SQL Server estimates that it will only have to scan 783,350 rows before it finds 30,000 matching the WHERE and can stop scanning. NB: You say that only
474,296 rows match this predicate in the whole table but 508,747 are estimated to. That means that on average one in every 26.1 (13283300/508747) rows is assumed to match the filter. So it is estimated that 30,000 * 26.1 rows ( = 783K) will be read.When you select
* that means that the rownum column must be calculated. the plan for this is below. It is costed at 69.1185You have an index on
COLUMNE that can be seeked into. This satisfies the range predicate on COLUMNE >= 1472738400000 AND COLUMNE <= 1475244000000 and also supplies the required ordering for your row numbering.However it does not cover the query and lookups are needed to return the missing columns. The plan estimates that there will be 30,000 such lookups. There may in fact be more as the predicate on
COLUMNF = 1 may mean some rows are discarded after being looked up (though not in this case as you say COLUMNF always has a value of 1).If the row numbering plan was to use a clustered index scan it would need to be a full scan followed by a sort of all rows matching the predicate.
69.1185 is considerably cheaper than the 745.0788 + sort cost so the plan with lookups is chosen.You say that the plan with lookups is in fact 5 times faster than the clustered index scan. Likely a much greater proportion of the clustered index needed to be read to find 30,000 matching rows than was assumed in the costings. You are on SQL Server 2014 SP1 CU5. On SQL Server 2014 SP2 the actual execution plan now has a new attribute Actual Rows Read which would tell you how many rows it did actually read. On previous versions you can use
OPTION (QUERYTRACEON 9130) to see the same information.Context
StackExchange Database Administrators Q#151329, answer score: 3
Revisions (0)
No revisions yet.