HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

What is the default 'window' an aggregate function is applied to?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thewhatfunctionapplieddefaultwindowaggregate

Problem

The SQL Server documentation on the OVER clause HERE, mentions that the OVER clause:


Determines the partitioning and ordering of a rowset before the associated window function is applied

By this I assume that aggregate functions are 'window functions'? While I understand what a window is (or at least how it works) when using the OVER clause, what is the default window for an aggregation function when no OVER clause is used?

i.e. for the SQL statement:

SELECT
  someCol,
  someCol2,
  AVG(someCol3) [avg]
FROM
  tbl
GROUP BY
  someCol,
  someCol2


What is the Window that the SUM function is being applied to?

Is it the grouped table prior to the SUM function being applied? In this case is it correct to assume that all window functions are the last operations to be applied to a result set? Since FROM / WHERE / GROUP BY / HAVING / SELECT / ORDER BY must all have been worked out to produce the final window on which to apply aggregation functions?

====================

In response to the answer below by Max Vernon:

If I write this query:

SELECT
    CP.iYear,
    AVG(CP.mUpgradeCost) OVER(PARTITION BY iYear) AS [avg]
FROM ProForma.dbo.CapitalProject CP


I get the results you refer to, with a row for each year in CapitalProject, and since there are many repeated years I get the same average for each year. I.e.

+-------+------------+
| iYear | avg        |
+-------+------------+
| 0     | 18636.3636 |
+-------+------------+
| 0     | 18636.3636 |
+-------+------------+
| 1     | 46741.5151 |
+-------+------------+
| 1     | 46741.5151 |
+-------+------------+


I then changed the SQL to group on years so that I got a single line per year. the SQL is:

SELECT
    CP.iYear,
    AVG(AVG(CP.mUpgradeCost)) OVER(PARTITION BY iYear) AS [avg]
FROM ProForma.dbo.CapitalProject CP
GROUP BY CP.iYear


HOWEVER, when I altered the function to this:

```
SELECT
CP.iYear,
SUM(AVG(CP.mUpgradeCost)) OVER(PARTITION BY iYear) AS [avg]

Solution

Your example code:

SELECT
  someCol,
  someCol2,
  AVG(someCol3) [avg]
FROM
  tbl
GROUP BY
  someCol,
  someCol2


The "window" for the AVG(someCol3) column is the combination of unique values for each someCol, someCol2 tuple, by virtue of the GROUP BY clause.

You could rewrite the above query as:

SELECT
  someCol,
  someCol2,
  AVG(someCol3) OVER (PARTITION BY someCol, someCol2)
FROM
  tbl


The results would be similar, although not identical. The 2nd variant returns a result row for every row in the tbl table, whereas the 1st variant returns a result row for every unique combination of someCol, someCol2. In the 2nd result set, each row would contain the average value of someCol3 for each unique combination of someCol, someCol2 by virtue of the PARTITION BY clause.

You can see this by creating a test-bed:

IF OBJECT_ID('dbo.TestAgg') IS NOT NULL
DROP TABLE dbo.TestAgg;
CREATE TABLE dbo.TestAgg
(
    someCol int NOT NULL
    , someCol2 int NOT NULL
    , someCol3 decimal(10,5) NOT NULL
);

INSERT INTO dbo.TestAgg (someCol, someCol2, someCol3)
VALUES (1, 1, 1)
    , (2, 2, 2)
    , (2, 2, 3)
    , (4, 1, 4);

SELECT ta.someCol
    , ta.someCol2
    , AVG(ta.someCol3)
FROM dbo.TestAgg ta
GROUP BY ta.someCol
    , ta.someCol2
ORDER BY ta.someCol
    , ta.someCol2;

SELECT ta.someCol
    , ta.someCol2
    , AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
FROM dbo.TestAgg ta
ORDER BY ta.someCol
    , ta.someCol2;


The results:

+---------+----------+------------------+
| someCol | someCol2 | (No column name) |
+---------+----------+------------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 2.500000 |
| 4 | 1 | 4.000000 |
+---------+----------+------------------+

+---------+----------+------------------+
| someCol | someCol2 | (No column name) |
+---------+----------+------------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 2.500000 |
| 2 | 2 | 2.500000 |
| 4 | 1 | 4.000000 |
+---------+----------+------------------+

If you specify a blank OVER () clause, the "window" is the entire resultset:

SELECT ta.someCol
    , ta.someCol2
    , COUNT(ta.someCol3) OVER ()
    , COUNT(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
    , AVG(ta.someCol3) OVER ()
    , AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
FROM dbo.TestAgg ta


+---------+----------+------------------+------------------+------------------+------------------+
| someCol | someCol2 | (No column name) | (No column name) | (No column name) | (No column name) |
+---------+----------+------------------+------------------+------------------+------------------+
| 1 | 1 | 4 | 1 | 2.500000 | 1.000000 |
| 2 | 2 | 4 | 2 | 2.500000 | 2.500000 |
| 2 | 2 | 4 | 2 | 2.500000 | 2.500000 |
| 4 | 1 | 4 | 1 | 2.500000 | 4.000000 |
+---------+----------+------------------+------------------+------------------+------------------+

You can perform multiple aggregates on a single column however, you cannot do it in a single SELECT ... statement; you must use a subquery or CTE to do so:

SELECT s.someCol
    , s.someCol2
    , SUM(avg_someCol3)
FROM (
    SELECT ta.someCol
        , ta.someCol2
        , avg_someCol3 = AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
    FROM dbo.TestAgg ta
    ) s
GROUP BY s.someCol
    , s.someCol2


Returns:

+---------+----------+------------------+
| someCol | someCol2 | (No column name) |
+---------+----------+------------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 5.000000 |
| 4 | 1 | 4.000000 |
+---------+----------+------------------+

However, in this contrived example, the value of 5 makes no sense that I can understand as the sum of an average. Perhaps your real problem makes more sense. Perhaps you really want this:

SELECT s.someCol
    , s.someCol2
    , avg_someCol3
FROM (
    SELECT ta.someCol
        , ta.someCol2
        , avg_someCol3 = AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
    FROM dbo.TestAgg ta
    ) s
GROUP BY s.someCol
    , s.someCol2
    , s.avg_someCol3


Results:

+---------+----------+--------------+
| someCol | someCol2 | avg_someCol3 |
+---------+----------+--------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 2.500000 |
| 4 | 1 | 4.000000 |
+---------+----------+--------------+

This is more easily obtained by using the DISTINCT keyword, although it make me shudder to think I just suggested using it:

```
SELECT DISTINCT ta.someCol
, ta.someCol2
, AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.

Code Snippets

SELECT
  someCol,
  someCol2,
  AVG(someCol3) [avg]
FROM
  tbl
GROUP BY
  someCol,
  someCol2
SELECT
  someCol,
  someCol2,
  AVG(someCol3) OVER (PARTITION BY someCol, someCol2)
FROM
  tbl
IF OBJECT_ID('dbo.TestAgg') IS NOT NULL
DROP TABLE dbo.TestAgg;
CREATE TABLE dbo.TestAgg
(
    someCol int NOT NULL
    , someCol2 int NOT NULL
    , someCol3 decimal(10,5) NOT NULL
);

INSERT INTO dbo.TestAgg (someCol, someCol2, someCol3)
VALUES (1, 1, 1)
    , (2, 2, 2)
    , (2, 2, 3)
    , (4, 1, 4);

SELECT ta.someCol
    , ta.someCol2
    , AVG(ta.someCol3)
FROM dbo.TestAgg ta
GROUP BY ta.someCol
    , ta.someCol2
ORDER BY ta.someCol
    , ta.someCol2;

SELECT ta.someCol
    , ta.someCol2
    , AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
FROM dbo.TestAgg ta
ORDER BY ta.someCol
    , ta.someCol2;
SELECT ta.someCol
    , ta.someCol2
    , COUNT(ta.someCol3) OVER ()
    , COUNT(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
    , AVG(ta.someCol3) OVER ()
    , AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
FROM dbo.TestAgg ta
SELECT s.someCol
    , s.someCol2
    , SUM(avg_someCol3)
FROM (
    SELECT ta.someCol
        , ta.someCol2
        , avg_someCol3 = AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
    FROM dbo.TestAgg ta
    ) s
GROUP BY s.someCol
    , s.someCol2

Context

StackExchange Database Administrators Q#168028, answer score: 4

Revisions (0)

No revisions yet.