HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Nested CTEs Returning Incorrect Results in Certain Situations

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
incorrectnestedreturningresultscertainctessituations

Problem

OK, first let me say I got this issue from:

  • Seeking Tsql wrong join explantion in CTE [closed]



I was trying to assist to figure out the issue, but got stumped when trying to debug the code one step at a time. I know the issue is due to the nested CTEs (because during debug if you dump each step aka cteX into temp tables the correct results are achieved) but not knowing how they work "under the hood" I cannot explain it in a sensible fashion outside of "it dont work yo." I suspect that it has something to do with how the compiler is trying to evaluate them all at the same time during run time but without more context I cant say for sure.

My question is merely about trying to understand how they work under the hood and how it relates to this situation. Now that I'm involved, I just want to understand the issue so I can speak to it in the future and learn something fun in the mean time.

Whoever answers here can also cross post on SO and answer there as well.

Code Set up:

```
declare @t1 TABLE (ID varchar(max),Action varchar(max), DateTime datetime );
INSERT INTO @t1
Select *
from
(
VALUES
('w2337','Open','2020-11-06 12:28:10.000'),
('w2337','Hold','2021-06-14 14:50:59.000'),
('w2337','Open','2021-06-14 14:51:26.000'),
('w2337','Hold','2021-06-15 14:50:59.000'),
('w2337','Open','2021-06-17 14:51:26.000'),
('w2337','Open','2021-06-18 14:51:26.000')

) t (ID, Action, DateTime);

with cte1 as (
select [ID],[Action],[DateTime]
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as [RegIndex]
,DENSE_RANK () OVER (ORDER BY ID) as [Index by ID]
,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [DateTime]) as [Index by DateTimeID]
,CASE when [Action]='Hold' then ROW_NUMBER() OVER (PARTITION BY ID,Action ORDER BY DateTime) end as [TimesHeld]
FROM @t1
)
,cte2 as (
select *, MAX([TimesHeld]) OVER (PARTITION BY ID ORDER BY RegIndex ROWS UNBOUNDED PRECEDING) as [FD] from cte1
)
,cte3 as(
select *, CASE when [Action]='Open' then ROW_NUMBER() OVER (PARTITION BY ID,Action ORDER BY DateTim

Solution

The problem isn't incorrect results, but rather non-deterministic ordering of the ROW_NUMBER window function.

I'm going to reference Itzik Ben-Gan's great article, definitely check it out in full when you have the time: Row numbers with nondeterministic order

The way RegIndex is calculated in cte1 is explicitly non-deterministic:

ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as [RegIndex]


Ordering by a subquery that produces a constant value lets the optimizer know that order does not matter here:

...with this solution, SQL Server’s optimizer recognizes that there’s no ordering relevance, and therefore doesn’t impose an unnecessary sort or limit the storage engine’s choices to ones that must guarantee order

Following from that, FD in cte2 is based on RegIndex, and is thus non-deterministic as well:

MAX([TimesHeld]) OVER (PARTITION BY ID ORDER BY RegIndex ROWS UNBOUNDED PRECEDING) as [FD]


Finally, the query in cte3 is filtered by the value of FD:

where FD is not null


The end result is that a different set of rows can end up in cte3 depending on how the query gets optimized. And the number and data types of the columns included in the SELECT list can absolutely influence how the query is optimized, thus resulting in different results here.

Code Snippets

ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as [RegIndex]
MAX([TimesHeld]) OVER (PARTITION BY ID ORDER BY RegIndex ROWS UNBOUNDED PRECEDING) as [FD]
where FD is not null

Context

StackExchange Database Administrators Q#294319, answer score: 25

Revisions (0)

No revisions yet.