gotchasqlModerate
GROUPING SETS returning unexpected results with calculated expression
Viewed 0 times
expressiongroupingwithreturningunexpectedresultssetscalculated
Problem
Here we have two similar queries using
where the
it's plan is here: first query plan
and
the corresponding plan is here: second query plan
Both the queries first calculate some expression for aggregation,
As the result we have
Can someone explain why the first query makes the calculation 2 times (one in aggregation and one more time in the final
grouping setswhere the
SELECT clause contains some expressions calculated in aggregation:SELECT RN10, RN10 / 10, COUNT(*) FROM
(
SELECT RN, RN/10 AS RN10, RN/100 AS RN100 FROM
(
SELECT RN = -1 + ROW_NUMBER() OVER (ORDER BY 1/0)
FROM master..spt_values
) A
) B
GROUP BY GROUPING SETS ((RN10), (RN10 / 10), ())
ORDER BY 1, 2it's plan is here: first query plan
and
SELECT RN10, SUBSTRING(RN,3,99), COUNT(*) FROM
(
SELECT RN, SUBSTRING(RN,2,99) AS RN10 FROM
(
SELECT RN = CAST(-1 + ROW_NUMBER() OVER (ORDER BY 1/0) AS VARCHAR(99))
FROM master..spt_values
) A
) B
GROUP BY GROUPING SETS ((RN10), (SUBSTRING(RN,3,99)), ())
ORDER BY 1, 2the corresponding plan is here: second query plan
Both the queries first calculate some expression for aggregation,
RN10 / 10 in the first case and SUBSTRING(RN,3,99) in the second, then the same expression is used in the SELECT clause but as the first plan shows it's re-calculated in the first query and it's not in the second.As the result we have
NULLs in the first result set that is quite unexpectedly:Can someone explain why the first query makes the calculation 2 times (one in aggregation and one more time in the final
select) while the second makes it one time only?Solution
I'm going to use a simpler example where it is clear to see what the expected results are.
Query 1
Query 1 Results
Query 2
Query 2 Results
Despite the
Query 1 and 2 should return the same results. The problem is that SQL Server decides to treat it like the following SQL
This just looks like a bug to me (trace flag 8605 shows that the damage is already done in the initial query tree representation). BUG REPORT.
Query 3
Query 3 Results
Query3 does not meet the problematic pattern of grouping on a column and an expression referencing that column. It wouldn't even be possible for the same issue to occur here anyway because the grouping sets part is equivalent to
This does not pass out the entire
For the same reason you don't see the issue with
@i-one reasons in the comments that it is likely
a bug in normalization (algebrization). It has logic that finds a
match for non-aggregated columns and expressions in the
within members of
write for example
without having to explicitly add the calculated expression as below
Or another example would be
In this case the
CREATE TABLE Queen
(
FirstName VARCHAR(7),
Surname VARCHAR(7)
);
INSERT INTO Queen
(FirstName, Surname)
VALUES
('Brian', 'May'),
('Freddie', 'Mercury'),
('John', 'Deacon'),
('Roger', 'Taylor')
;Query 1
SELECT Surname,
NULL AS SurnameInitial,
COUNT(*) AS Count
FROM Queen
GROUP BY Surname
UNION ALL
SELECT NULL AS Surname,
LEFT(Surname,1) AS SurnameInitial,
COUNT(*) AS Count
FROM Queen
GROUP BY LEFT(Surname,1)Query 1 Results
+---------+----------------+-------+
| Surname | SurnameInitial | Count |
+---------+----------------+-------+
| Deacon | NULL | 1 |
| May | NULL | 1 |
| Mercury | NULL | 1 |
| Taylor | NULL | 1 |
| NULL | D | 1 |
| NULL | M | 2 |
| NULL | T | 1 |
+---------+----------------+-------+Query 2
SELECT Surname,
LEFT(Surname,1) AS SurnameInitial,
COUNT(*) AS Count
FROM Queen
GROUP BY GROUPING SETS ( ( Surname ), (LEFT(Surname,1)) )
ORDER BY SurnameInitial, SurnameQuery 2 Results
Despite the
ORDER BY SurnameInitial and the fact that NULL sorts first in SQL Server the rows with SurnameInitial as NULL are ordered last.+---------+----------------+-------+
| Surname | SurnameInitial | Count |
+---------+----------------+-------+
| Deacon | D | 1 |
| May | M | 1 |
| Mercury | M | 1 |
| Taylor | T | 1 |
| NULL | NULL | 1 |
| NULL | NULL | 2 |
| NULL | NULL | 1 |
+---------+----------------+-------+Query 1 and 2 should return the same results. The problem is that SQL Server decides to treat it like the following SQL
WITH GrpSets AS
(
SELECT Surname,
COUNT(*) AS Count
FROM Queen
GROUP BY Surname
UNION ALL
SELECT NULL AS Surname,
COUNT(*) AS Count
FROM Queen
GROUP BY LEFT(Surname,1)
)
SELECT Surname,
LEFT(Surname,1) AS SurnameInitial,
Count
FROM GrpSetsThis just looks like a bug to me (trace flag 8605 shows that the damage is already done in the initial query tree representation). BUG REPORT.
Query 3
SELECT Surname,
LEFT(FirstName,1) AS FirstNameInitial,
COUNT(*) AS Count
FROM Queen
GROUP BY GROUPING SETS ( ( Surname ), (LEFT(FirstName,1)) )Query 3 Results
+---------+------------------+-------+
| Surname | FirstNameInitial | Count |
+---------+------------------+-------+
| NULL | B | 1 |
| NULL | F | 1 |
| NULL | J | 1 |
| NULL | R | 1 |
| Deacon | NULL | 1 |
| May | NULL | 1 |
| Mercury | NULL | 1 |
| Taylor | NULL | 1 |
+---------+------------------+-------+Query3 does not meet the problematic pattern of grouping on a column and an expression referencing that column. It wouldn't even be possible for the same issue to occur here anyway because the grouping sets part is equivalent to
SELECT Surname,
NULL AS FirstNameInitial,
COUNT(*) AS Count
FROM Queen
GROUP BY Surname
UNION ALL
SELECT NULL AS Surname,
LEFT(FirstName,1) AS FirstNameInitial,
COUNT(*) AS Count
FROM Queen
GROUP BY LEFT(FirstName,1)This does not pass out the entire
FirstName column upstream (or even have a guaranteed unique FirstName column that could be passed out) so it isn't possible for the LEFT(FirstName,1) expression to be calculated on top of that.For the same reason you don't see the issue with
(RN10), (SUBSTRING(RN,3,99)).@i-one reasons in the comments that it is likely
a bug in normalization (algebrization). It has logic that finds a
match for non-aggregated columns and expressions in the
SELECT listwithin members of
GROUP BY. The same logic seemingly allows us towrite for example
SELECT Surname, LEFT(Surname, 1), COUNT(*)
FROM Queen
GROUP BY Surnamewithout having to explicitly add the calculated expression as below
GROUP BY Surname, LEFT(Surname, 1)Or another example would be
SELECT Surname,
LEFT(Surname,1) AS SurnameInitial,
LEFT(Surname,2) AS SurnamePrefix,
COUNT(*) AS Count
FROM Queen
GROUP BY GROUPING SETS ( ( Surname ), (LEFT(Surname,1)) )In this case the
LEFT(Surname,2) is allowed and the only way of computing it would be to do it in the manner that is problematic for the LEFT(Surname,1) case.Code Snippets
CREATE TABLE Queen
(
FirstName VARCHAR(7),
Surname VARCHAR(7)
);
INSERT INTO Queen
(FirstName, Surname)
VALUES
('Brian', 'May'),
('Freddie', 'Mercury'),
('John', 'Deacon'),
('Roger', 'Taylor')
;SELECT Surname,
NULL AS SurnameInitial,
COUNT(*) AS Count
FROM Queen
GROUP BY Surname
UNION ALL
SELECT NULL AS Surname,
LEFT(Surname,1) AS SurnameInitial,
COUNT(*) AS Count
FROM Queen
GROUP BY LEFT(Surname,1)+---------+----------------+-------+
| Surname | SurnameInitial | Count |
+---------+----------------+-------+
| Deacon | NULL | 1 |
| May | NULL | 1 |
| Mercury | NULL | 1 |
| Taylor | NULL | 1 |
| NULL | D | 1 |
| NULL | M | 2 |
| NULL | T | 1 |
+---------+----------------+-------+SELECT Surname,
LEFT(Surname,1) AS SurnameInitial,
COUNT(*) AS Count
FROM Queen
GROUP BY GROUPING SETS ( ( Surname ), (LEFT(Surname,1)) )
ORDER BY SurnameInitial, Surname+---------+----------------+-------+
| Surname | SurnameInitial | Count |
+---------+----------------+-------+
| Deacon | D | 1 |
| May | M | 1 |
| Mercury | M | 1 |
| Taylor | T | 1 |
| NULL | NULL | 1 |
| NULL | NULL | 2 |
| NULL | NULL | 1 |
+---------+----------------+-------+Context
StackExchange Database Administrators Q#267156, answer score: 12
Revisions (0)
No revisions yet.