patternsqlMinor
Will altering order of columns in GROUP BY affect performance?
Viewed 0 times
ordergroupcolumnsaffectwillperformancealtering
Problem
I have a SQL query like this:
On a huge dataset the actual execution plan attributes most of the time to a node that corresponds to logical operation Aggregate and physical operation HashMatch.
Will changing the order in which the columns and conditions are listed in the
SELECT A, B, (CASE WHEN C=0 THEN 0 ELSE 1 END), COUNT(D)
FROM SomeTable
GROUP BY A, B, (CASE WHEN C=0 THEN 0 ELSE 1 END)On a huge dataset the actual execution plan attributes most of the time to a node that corresponds to logical operation Aggregate and physical operation HashMatch.
Will changing the order in which the columns and conditions are listed in the
GROUP BY list affect performance?Solution
Despite what MSDN documentation says, no, it doesn't matter for
You can test it here, at: SQL-Fiddle test (SQL-Server 2012)
The queries:
produce the same execution plan:
Now if you change that Group by to:
it does make a difference and it does produce different execution plans (and different result sets of course). It still uses the index though, at least in that SQL-Fiddle test. It the mismatching order case, it gets the data from the index but it has to do an intermediate sort (to calculate the rollup values).
GROUP BY queries.You can test it here, at: SQL-Fiddle test (SQL-Server 2012)
CREATE TABLE test
( id INT IDENTITY(1,1)
, a INT NOT NULL
, b INT NOT NULL
, c INT NOT NULL
, d INT NOT NULL
, PRIMARY KEY (id)
) ;
CREATE NONCLUSTERED INDEX a_b_c_include_d_index
ON test (a, b, c)
INCLUDE (d) ;
INSERT INTO test
(a, b, c, d)
VALUES
... some 50K rowsThe queries:
SELECT COUNT(*) AS num
, MIN(cnt) AS min_count
, MAX(cnt) AS max_count
, MIN(min_d) AS min_min_d
, MAX(min_d) AS max_min_d
, MIN(max_d) AS min_max_d
, MAX(max_d) AS max_max_d
FROM
( SELECT a
, b
, c
, COUNT(d) AS cnt
, MIN(d) AS min_d
, MAX(d) AS max_d
FROM test
GROUP BY a, b, c --- or with: `GROUP BY b, a, c`
) AS grp ;produce the same execution plan:
MICROSOFT SQL SERVER 2005 XML SHOWPLAN
SELECT
Compute Scalar
Cost: 0%
Stream Aggregate
(Aggregate)
Cost: 0%
Compute Scalar
Cost: 10%
Stream Aggregate
(Aggregate)
Cost: 10%
Index Scan
[a_b_c_include_d_index].[test]
Cost: 89%Now if you change that Group by to:
GROUP BY a, b, c
WITH ROLLUPit does make a difference and it does produce different execution plans (and different result sets of course). It still uses the index though, at least in that SQL-Fiddle test. It the mismatching order case, it gets the data from the index but it has to do an intermediate sort (to calculate the rollup values).
Code Snippets
CREATE TABLE test
( id INT IDENTITY(1,1)
, a INT NOT NULL
, b INT NOT NULL
, c INT NOT NULL
, d INT NOT NULL
, PRIMARY KEY (id)
) ;
CREATE NONCLUSTERED INDEX a_b_c_include_d_index
ON test (a, b, c)
INCLUDE (d) ;
INSERT INTO test
(a, b, c, d)
VALUES
... some 50K rowsSELECT COUNT(*) AS num
, MIN(cnt) AS min_count
, MAX(cnt) AS max_count
, MIN(min_d) AS min_min_d
, MAX(min_d) AS max_min_d
, MIN(max_d) AS min_max_d
, MAX(max_d) AS max_max_d
FROM
( SELECT a
, b
, c
, COUNT(d) AS cnt
, MIN(d) AS min_d
, MAX(d) AS max_d
FROM test
GROUP BY a, b, c --- or with: `GROUP BY b, a, c`
) AS grp ;MICROSOFT SQL SERVER 2005 XML SHOWPLAN
SELECT
Compute Scalar
Cost: 0%
Stream Aggregate
(Aggregate)
Cost: 0%
Compute Scalar
Cost: 10%
Stream Aggregate
(Aggregate)
Cost: 10%
Index Scan
[a_b_c_include_d_index].[test]
Cost: 89%GROUP BY a, b, c
WITH ROLLUPContext
StackExchange Database Administrators Q#17143, answer score: 7
Revisions (0)
No revisions yet.