HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlModerate

Strange query results for DELETE with ROW_NUMBER

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
deletewithquerystrangeforrow_numberresults

Problem

In the following DELETE statement I try to delete all rows except the first one ordered by some criteria. (The actual query makes more sense, this is just a repro. All the sys.objects stuff is just for generating test data.)

Note the filter r <> 1. Yet, the OUTPUT clause outputs deleted rows with r = 1. How can this be?

USE tempdb
SET XACT_ABORT ON

BEGIN TRAN

    SELECT *
    INTO #o
    FROM sys.objects

    SELECT TOP 2 name FROM #o ORDER BY object_id --debug output

    DELETE k
    OUTPUT Deleted.name, Deleted.r
    FROM (
        SELECT k.*, ROW_NUMBER() OVER (ORDER BY object_id) r
        FROM #o k
    ) k
    WHERE r <> 1 --OUTPUT returns rows with (r = 1)

    SELECT TOP 2 name FROM #o ORDER BY object_id --debug output

ROLLBACK


Query results:

(The 3rd result set is complete - only one row.)

Note, that all rows except the first one were deleted. The numbering order of column r does not seem to match what was requested. And there is a row with r = 1.

This is SQL Server 2014 CU3 with trace flag 4199 enabled.

Solution

This is expected behaviour at the moment


the function gets evaluated on the DELETE stream.

So it actually behaves like this (pseudo code)

DELETE k
OUTPUT Deleted.name, 
       ROW_NUMBER() OVER (ORDER BY Deleted.object_id) as r
FROM (
    SELECT k.*, ROW_NUMBER() OVER (ORDER BY object_id) r
    FROM #o k
) k
WHERE r <> 1 --OUTPUT returns rows with (r = 1)


Although this is the currently defined expected behaviour it isn't really reasonable and they say


Long term, we need to actually fix the behavior of OUTPUT clause to
match that of the ANSI SQL standard which will result in change of
results. So we will look at the correct semantics for a future version
of SQL Server since there might be apps that rely on the current
behavior.

I haven't tested on SQL Server 2014 but on 2012 the plan looks as the below.

After the delete operator (to the left) the column values from the deleted rows are sorted back into object_id order and the row_number re-applied.

It looks like the same is happening in your case from the results. (temp tables have a negative id and are sorted first before sysrscols which has a low positive object_id of 3).

As well as the dubious semantics of the result the second sort by object_id doesn't seem strictly necessary in this plan as it looks likely that they will already be sorted in that order in any event.

Regarding workarounds for this specific case changing the output clause to OUTPUT Deleted.name, 1 + Deleted.r AS r would work.

For more complicated WHERE clauses I think you'd need a pass to calculate the row_number and then a join. e.g.

ALTER TABLE #o
  ADD CONSTRAINT PK PRIMARY KEY (object_id);

WITH k AS
(
 SELECT *, 
        ROW_NUMBER() OVER (ORDER BY object_id) r
FROM #o
)
MERGE k AS k1
using k AS k2
ON k1.object_id = k2.object_id
WHEN matched AND k2.r <> 1 THEN
  DELETE
OUTPUT Deleted.name,
       k2.r;


Results

Code Snippets

DELETE k
OUTPUT Deleted.name, 
       ROW_NUMBER() OVER (ORDER BY Deleted.object_id) as r
FROM (
    SELECT k.*, ROW_NUMBER() OVER (ORDER BY object_id) r
    FROM #o k
) k
WHERE r <> 1 --OUTPUT returns rows with (r = 1)
ALTER TABLE #o
  ADD CONSTRAINT PK PRIMARY KEY (object_id);

WITH k AS
(
 SELECT *, 
        ROW_NUMBER() OVER (ORDER BY object_id) r
FROM #o
)
MERGE k AS k1
using k AS k2
ON k1.object_id = k2.object_id
WHEN matched AND k2.r <> 1 THEN
  DELETE
OUTPUT Deleted.name,
       k2.r;

Context

StackExchange Database Administrators Q#80558, answer score: 12

Revisions (0)

No revisions yet.