HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Why changing the column in the ORDER BY section of window function "MAX() OVER()" affects the final result?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
resultwhytheordersectioncolumnaffectsfunctionwindowmax

Problem

I have a table with below structure and it's data :

create table test_table
(
Item_index   int,
Item_name    varchar(50)
)

insert into test_table (Item_index,Item_name) values (0,'A')
insert into test_table (Item_index,Item_name) values (1,'B')
insert into test_table (Item_index,Item_name) values (0,'C')
insert into test_table (Item_index,Item_name) values (1,'D')
insert into test_table (Item_index,Item_name) values (0,'E')


I want to know why changing the column in order by section of the query , changes the result? In QUERY-1 , I used item_index and in the QUERY-2 I used item_name column in the order by section. I thought that both queries must generate the same result because I used item_index in both queries for partitioning! I'm completely confused now ! why the order by column should affect the final result?

QUERY-1:

select t.*,
       max(t.Item_name)over(partition by t.item_index order by item_index) new_column
from test_table t;


RESULT:

Item_index  Item_name     new_column
----------- --------------------------
0           A                E
0           C                E
0           E                E
1           D                D
1           B                D


QUERY-2:

select t.*,
       max(t.Item_name)over(partition by t.item_index order by item_name) new_column
from test_table t;


RESULT:

Item_index  Item_name  new_column
----------- -----------------------
0           A             A
0           C             C
0           E             E
1           B             B
1           D             D


Can anybody explain how exactly these two queries are being executed and why each of them generates different result?

Thanks in advance

Solution

The explanation for the different results is given in SQL Server's documentation about window functions, the ORDER BY section:

ORDER BY

Defines the logical order of the rows within each partition of the result set. That is, it specifies the logical order in which the window function calculation is performed.

If it is specified, and a ROWS/RANGE is not specified, then default
RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for
window frame by the functions that can accept optional ROWS/RANGE
specification (for example min or max).

Note that MIN() and MAX() window aggregates accept an optional ROWS or RANGE specification.

When there is no such specification, they calculate the MIN and MAX over the whole partition. When there is, they calculate the MIN or MAX over the specified range. Since your two queries specify different orders/ranges, they yield different results.

If you wanted the MAX over the whole partition, then remove the ORDER BY range:

If it is not specified, the default order is ASC and window function will use all rows in partition.

Context

StackExchange Database Administrators Q#290034, answer score: 5

Revisions (0)

No revisions yet.