HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Will altering order of columns in GROUP BY affect performance?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
ordergroupcolumnsaffectwillperformancealtering

Problem

I have a SQL query like this:

SELECT A, B, (CASE WHEN C=0 THEN 0 ELSE 1 END), COUNT(D)
FROM SomeTable
GROUP BY A, B, (CASE WHEN C=0 THEN 0 ELSE 1 END)


On a huge dataset the actual execution plan attributes most of the time to a node that corresponds to logical operation Aggregate and physical operation HashMatch.

Will changing the order in which the columns and conditions are listed in the GROUP BY list affect performance?

Solution

Despite what MSDN documentation says, no, it doesn't matter for GROUP BY queries.

You can test it here, at: SQL-Fiddle test (SQL-Server 2012)

CREATE TABLE test
( id INT IDENTITY(1,1)
, a  INT NOT NULL
, b  INT NOT NULL
, c  INT NOT NULL
, d  INT NOT NULL
, PRIMARY KEY (id)
) ;

CREATE NONCLUSTERED INDEX a_b_c_include_d_index
    ON test (a, b, c)
    INCLUDE (d) ;

INSERT INTO test
(a, b, c, d)
VALUES
 ... some 50K rows


The queries:

SELECT COUNT(*) AS num
     , MIN(cnt) AS min_count
     , MAX(cnt) AS max_count
     , MIN(min_d) AS min_min_d
     , MAX(min_d) AS max_min_d
     , MIN(max_d) AS min_max_d
     , MAX(max_d) AS max_max_d
FROM
  ( SELECT a
         , b
         , c
         , COUNT(d) AS cnt
         , MIN(d) AS min_d
         , MAX(d) AS max_d
    FROM test 
    GROUP BY a, b, c             --- or with:  `GROUP BY b, a, c`
  ) AS grp ;


produce the same execution plan:

MICROSOFT SQL SERVER 2005 XML SHOWPLAN
SELECT
Compute Scalar
Cost: 0%
Stream Aggregate
(Aggregate)
Cost: 0%
Compute Scalar
Cost: 10%
Stream Aggregate
(Aggregate)
Cost: 10%
Index Scan
[a_b_c_include_d_index].[test]
Cost: 89%


Now if you change that Group by to:

GROUP BY a, b, c
      WITH ROLLUP


it does make a difference and it does produce different execution plans (and different result sets of course). It still uses the index though, at least in that SQL-Fiddle test. It the mismatching order case, it gets the data from the index but it has to do an intermediate sort (to calculate the rollup values).

Code Snippets

CREATE TABLE test
( id INT IDENTITY(1,1)
, a  INT NOT NULL
, b  INT NOT NULL
, c  INT NOT NULL
, d  INT NOT NULL
, PRIMARY KEY (id)
) ;

CREATE NONCLUSTERED INDEX a_b_c_include_d_index
    ON test (a, b, c)
    INCLUDE (d) ;

INSERT INTO test
(a, b, c, d)
VALUES
 ... some 50K rows
SELECT COUNT(*) AS num
     , MIN(cnt) AS min_count
     , MAX(cnt) AS max_count
     , MIN(min_d) AS min_min_d
     , MAX(min_d) AS max_min_d
     , MIN(max_d) AS min_max_d
     , MAX(max_d) AS max_max_d
FROM
  ( SELECT a
         , b
         , c
         , COUNT(d) AS cnt
         , MIN(d) AS min_d
         , MAX(d) AS max_d
    FROM test 
    GROUP BY a, b, c             --- or with:  `GROUP BY b, a, c`
  ) AS grp ;
MICROSOFT SQL SERVER 2005 XML SHOWPLAN
SELECT
Compute Scalar
Cost: 0%
Stream Aggregate
(Aggregate)
Cost: 0%
Compute Scalar
Cost: 10%
Stream Aggregate
(Aggregate)
Cost: 10%
Index Scan
[a_b_c_include_d_index].[test]
Cost: 89%
GROUP BY a, b, c
      WITH ROLLUP

Context

StackExchange Database Administrators Q#17143, answer score: 7

Revisions (0)

No revisions yet.