patternsqlModerate
Is order by necessary after a group by clause?
Viewed 0 times
afternecessaryordergroupclause
Problem
Suppose I have a simple query
My understanding is group by clause seems to already have a sorting functionality. DB engine would sort col1 then col2 if adjacent rows of col1 happen to have same values. As least that is my experience with sql server.
If that is the case, order by clause here isn't really necessary?
SELECT col1, col2, COUNT(col3)
FROM tbl_name
WHERE condition
GROUP BY col1, col2
ORDER BY col1 ;My understanding is group by clause seems to already have a sorting functionality. DB engine would sort col1 then col2 if adjacent rows of col1 happen to have same values. As least that is my experience with sql server.
If that is the case, order by clause here isn't really necessary?
Solution
My understanding is group by clause seems to already have a sorting
functionality.
There are two basic approaches that SQL Server can use. A stream aggregate requires the data to be sorted by the group by keys. This can be either supplied by an index or might need an explicit sort. A stream aggregate is order preserving in that the rows output from that operator are in the same order as the input. This does not imply any guarantee about the eventual output from the query as a whole however. You only get that if you add an
For a
The other basic approach is that of a hash aggregate. In which the grouping keys are hashed. This is not at all order preserving and will likely output rows in seemingly random orders.
Additionally for parallel plans there might be hybrid approaches, e.g. Each thread could have a local stream aggregate with the thread results then aggregated at global level with a hash aggregate.
Adding
functionality.
There are two basic approaches that SQL Server can use. A stream aggregate requires the data to be sorted by the group by keys. This can be either supplied by an index or might need an explicit sort. A stream aggregate is order preserving in that the rows output from that operator are in the same order as the input. This does not imply any guarantee about the eventual output from the query as a whole however. You only get that if you add an
ORDER BY.For a
GROUP BY col1, col2 the input stream can be ordered by either col1, col2 or col2, col1 to be acceptable for the stream aggregate in this case. The addition of an index might change the decision of the optimiser as to which one to use.The other basic approach is that of a hash aggregate. In which the grouping keys are hashed. This is not at all order preserving and will likely output rows in seemingly random orders.
Additionally for parallel plans there might be hybrid approaches, e.g. Each thread could have a local stream aggregate with the thread results then aggregated at global level with a hash aggregate.
Adding
ORDER BY means that SQL Server will ensure that the rows are delivered in the desired order. If you are currently observing an output order of col1, col2 then likely you are getting a stream aggregate ordered by those two columns. The addition of an explicit order by won't change the execution plan to add any additional sort as SQL Server will recognize that the output of this aggregate is already in the desired order.Context
StackExchange Database Administrators Q#142872, answer score: 16
Revisions (0)
No revisions yet.