gotchasqlMinor
Why does batch mode window aggregate yield arithmetic overflow?
Viewed 0 times
whyoverflowmodebatchyielddoeswindowaggregatearithmetic
Problem
The following query performs a windowed
Full script
See this file for a fully contained reproduction script.
Query plan
Here is an annotated estimated query plan (full XML on Paste the Plan).
Similar queries that execute successfully
If any of the following modifications are made, the error does not occur:
For example, this query executes successfully:
SUM over a columnstore table with 1500 total rows, each of which has the value 0 or 1, and it overflows the INT data type. Why is this happening?SELECT a, p, s, v, m, n,
SUM(CASE WHEN n IS NULL THEN 0 ELSE 1 END)
OVER (PARTITION BY s, v, a ORDER BY p) AS lastNonNullPartition
FROM (
SELECT a, p, s, v, m, n,
RANK() OVER (PARTITION BY v, s, a, p ORDER BY m) AS rank
FROM #t /* A columnstore table with 1,500 rows */
) x
WHERE x.rank = 1
--Msg 8115, Level 16, State 2, Line 1521
--Arithmetic overflow error converting expression to data type int.Full script
See this file for a fully contained reproduction script.
Query plan
Here is an annotated estimated query plan (full XML on Paste the Plan).
Similar queries that execute successfully
If any of the following modifications are made, the error does not occur:
- Use trace flag
8649to prefer a parallel plan regardless of the cost threshold for parallelism
- Use trace flag
9453to disable batch mode
- Use the
COUNTaggregation function instead of theSUMfunction
- Remove the
WHERE x.rank = 1predicate
For example, this query executes successfully:
SELECT a, p, s, v, m, n,
SUM(CASE WHEN n IS NULL THEN 0 ELSE 1 END)
OVER (PARTITION BY s, v, a ORDER BY p) AS lastNonNullPartition
FROM (
SELECT a, p, s, v, m, n,
RANK() OVER (PARTITION BY v, s, a, p ORDER BY m) AS rank
FROM #t /* A columnstore table with 1,500 rows */
) x
WHERE x.rank = 1
OPTION (QUERYTRACEON 9453/* Disable batch mode */)Solution
Multiple commenters have been able to reproduce this issue. We initially thought that SQL Server 2017 CU10 resolved the issue, but then found that the error can be reproduced in all versions of SQL Server we tried, including CU10. However, some commenters observed an element of chance where the same script did not always trigger the error.
Because there is no logical way that computing a sum across a set of non-negative numbers whose maximum possible sum is 1,500 can overflow a 32 bit integer, we believe this is a bug in the batch mode window aggregate operator. Being a new operator in SQL Server 2016, it's reasonable to assume that there might still be some edge cases to iron out.
Here is the bug report we filed with Microsoft.
The response was:
This is fixed in SQL Server 2019 CTP 2.1 and will also be fixed soon in Azure SQL Database.
Because there is no logical way that computing a sum across a set of non-negative numbers whose maximum possible sum is 1,500 can overflow a 32 bit integer, we believe this is a bug in the batch mode window aggregate operator. Being a new operator in SQL Server 2016, it's reasonable to assume that there might still be some edge cases to iron out.
Here is the bug report we filed with Microsoft.
The response was:
This is fixed in SQL Server 2019 CTP 2.1 and will also be fixed soon in Azure SQL Database.
Context
StackExchange Database Administrators Q#218634, answer score: 7
Revisions (0)
No revisions yet.