HiveBrain v1.2.0
Get Started
← Back to all entries
gotchaModerate

What is the difference between `ORDER BY` and `PARTITION BY` arguments in the `OVER` clause?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
theorderpartitionwhatargumentsdifferencebetweenandclauseover

Problem

These queries below both give me exactly the same results, which I assume is because of my dataset rather than how the arguments work.

When using an OVER clause, what is the difference between ORDER BY and PARTITION BY.

On a slightly different note, why not use the term GROUP BY instead of the more complicated sounding PARTITION BY, since it seems that using partitioning in this case seems to achieve the same thing as grouping. I know you can alter these inner partitions and that those changes then reflect in the table. Is that the reason?

Query 1:

SELECT
    CP.iYear,
    AVG(AVG(CP.mUpgradeCost)) OVER(ORDER BY iYear) AS [avg]
FROM ProForma.dbo.CapitalProject CP
GROUP BY CP.iYear


Query 2:

SELECT
    CP.iYear,
    AVG(AVG(CP.mUpgradeCost)) OVER(PARTITION BY iYear) AS [avg]
FROM ProForma.dbo.CapitalProject CP
GROUP BY CP.iYear

Solution

The PARTITION BY works as a "windowed group" and the ORDER BY does the ordering within the group. However, because you're using GROUP BY CP.iYear, you're effectively reducing your window to just a single row (GROUP BY is performed before the windowed function). The average of a single row will be the value of that row, in your case AVG(CP.mUpgradeCost).

As for query 2, are you trying to create a running average or something? Not even sure what you would expect that query to return.

If you remove GROUP BY CP.iYear and change AVG(AVG(...)) to just AVG(), you'll see the difference.

Here's an example that will hopefully explain the use of PARTITION BY and/or ORDER BY:

Random table:

a     b
----- ------
X     1001
X     1002
X     1003
Y     1001
Y     1002


Now,

SELECT a, b, COUNT(*) OVER (PARTITION BY a) FROM r;


will return

a     b
----- ------ ------
X     1001   3
X     1002   3
X     1003   3
Y     1001   2
Y     1002   2


So you can see that there are 3 rows with a=X and 2 rows with a=Y. The ORDER BY clause comes into play when you want an ordered window function, like a row number or a running total.

SELECT a, b, ROW_NUMBER() OVER (PARTITION BY a ORDER BY b),
             SUM(b) OVER (PARTITION BY a ORDER BY b ROWS UNBOUNDED PRECEDING)
FROM r;


... gives you:

a     b
----- ------ ------ ------
X     1001   1      1001
X     1002   2      2003
X     1003   3      3006
Y     1001   1      1001   <- row number and running total resets here; new partition
Y     1002   2      2003

Code Snippets

a     b
----- ------
X     1001
X     1002
X     1003
Y     1001
Y     1002
SELECT a, b, COUNT(*) OVER (PARTITION BY a) FROM r;
a     b
----- ------ ------
X     1001   3
X     1002   3
X     1003   3
Y     1001   2
Y     1002   2
SELECT a, b, ROW_NUMBER() OVER (PARTITION BY a ORDER BY b),
             SUM(b) OVER (PARTITION BY a ORDER BY b ROWS UNBOUNDED PRECEDING)
FROM r;
a     b
----- ------ ------ ------
X     1001   1      1001
X     1002   2      2003
X     1003   3      3006
Y     1001   1      1001   <- row number and running total resets here; new partition
Y     1002   2      2003

Context

StackExchange Database Administrators Q#168037, answer score: 16

Revisions (0)

No revisions yet.