patternsqlMinor
What is the default 'window' an aggregate function is applied to?
Viewed 0 times
thewhatfunctionapplieddefaultwindowaggregate
Problem
The SQL Server documentation on the
Determines the partitioning and ordering of a rowset before the associated window function is applied
By this I assume that aggregate functions are 'window functions'? While I understand what a window is (or at least how it works) when using the
i.e. for the SQL statement:
What is the Window that the SUM function is being applied to?
Is it the grouped table prior to the SUM function being applied? In this case is it correct to assume that all window functions are the last operations to be applied to a result set? Since
====================
In response to the answer below by Max Vernon:
If I write this query:
I get the results you refer to, with a row for each year in
I then changed the SQL to group on years so that I got a single line per year. the SQL is:
HOWEVER, when I altered the function to this:
```
SELECT
CP.iYear,
SUM(AVG(CP.mUpgradeCost)) OVER(PARTITION BY iYear) AS [avg]
OVER clause HERE, mentions that the OVER clause:Determines the partitioning and ordering of a rowset before the associated window function is applied
By this I assume that aggregate functions are 'window functions'? While I understand what a window is (or at least how it works) when using the
OVER clause, what is the default window for an aggregation function when no OVER clause is used?i.e. for the SQL statement:
SELECT
someCol,
someCol2,
AVG(someCol3) [avg]
FROM
tbl
GROUP BY
someCol,
someCol2What is the Window that the SUM function is being applied to?
Is it the grouped table prior to the SUM function being applied? In this case is it correct to assume that all window functions are the last operations to be applied to a result set? Since
FROM / WHERE / GROUP BY / HAVING / SELECT / ORDER BY must all have been worked out to produce the final window on which to apply aggregation functions?====================
In response to the answer below by Max Vernon:
If I write this query:
SELECT
CP.iYear,
AVG(CP.mUpgradeCost) OVER(PARTITION BY iYear) AS [avg]
FROM ProForma.dbo.CapitalProject CPI get the results you refer to, with a row for each year in
CapitalProject, and since there are many repeated years I get the same average for each year. I.e.+-------+------------+
| iYear | avg |
+-------+------------+
| 0 | 18636.3636 |
+-------+------------+
| 0 | 18636.3636 |
+-------+------------+
| 1 | 46741.5151 |
+-------+------------+
| 1 | 46741.5151 |
+-------+------------+I then changed the SQL to group on years so that I got a single line per year. the SQL is:
SELECT
CP.iYear,
AVG(AVG(CP.mUpgradeCost)) OVER(PARTITION BY iYear) AS [avg]
FROM ProForma.dbo.CapitalProject CP
GROUP BY CP.iYearHOWEVER, when I altered the function to this:
```
SELECT
CP.iYear,
SUM(AVG(CP.mUpgradeCost)) OVER(PARTITION BY iYear) AS [avg]
Solution
Your example code:
The "window" for the
You could rewrite the above query as:
The results would be similar, although not identical. The 2nd variant returns a result row for every row in the
You can see this by creating a test-bed:
The results:
+---------+----------+------------------+
| someCol | someCol2 | (No column name) |
+---------+----------+------------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 2.500000 |
| 4 | 1 | 4.000000 |
+---------+----------+------------------+
+---------+----------+------------------+
| someCol | someCol2 | (No column name) |
+---------+----------+------------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 2.500000 |
| 2 | 2 | 2.500000 |
| 4 | 1 | 4.000000 |
+---------+----------+------------------+
If you specify a blank
+---------+----------+------------------+------------------+------------------+------------------+
| someCol | someCol2 | (No column name) | (No column name) | (No column name) | (No column name) |
+---------+----------+------------------+------------------+------------------+------------------+
| 1 | 1 | 4 | 1 | 2.500000 | 1.000000 |
| 2 | 2 | 4 | 2 | 2.500000 | 2.500000 |
| 2 | 2 | 4 | 2 | 2.500000 | 2.500000 |
| 4 | 1 | 4 | 1 | 2.500000 | 4.000000 |
+---------+----------+------------------+------------------+------------------+------------------+
You can perform multiple aggregates on a single column however, you cannot do it in a single
Returns:
+---------+----------+------------------+
| someCol | someCol2 | (No column name) |
+---------+----------+------------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 5.000000 |
| 4 | 1 | 4.000000 |
+---------+----------+------------------+
However, in this contrived example, the value of 5 makes no sense that I can understand as the sum of an average. Perhaps your real problem makes more sense. Perhaps you really want this:
Results:
+---------+----------+--------------+
| someCol | someCol2 | avg_someCol3 |
+---------+----------+--------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 2.500000 |
| 4 | 1 | 4.000000 |
+---------+----------+--------------+
This is more easily obtained by using the
```
SELECT DISTINCT ta.someCol
, ta.someCol2
, AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.
SELECT
someCol,
someCol2,
AVG(someCol3) [avg]
FROM
tbl
GROUP BY
someCol,
someCol2The "window" for the
AVG(someCol3) column is the combination of unique values for each someCol, someCol2 tuple, by virtue of the GROUP BY clause.You could rewrite the above query as:
SELECT
someCol,
someCol2,
AVG(someCol3) OVER (PARTITION BY someCol, someCol2)
FROM
tblThe results would be similar, although not identical. The 2nd variant returns a result row for every row in the
tbl table, whereas the 1st variant returns a result row for every unique combination of someCol, someCol2. In the 2nd result set, each row would contain the average value of someCol3 for each unique combination of someCol, someCol2 by virtue of the PARTITION BY clause.You can see this by creating a test-bed:
IF OBJECT_ID('dbo.TestAgg') IS NOT NULL
DROP TABLE dbo.TestAgg;
CREATE TABLE dbo.TestAgg
(
someCol int NOT NULL
, someCol2 int NOT NULL
, someCol3 decimal(10,5) NOT NULL
);
INSERT INTO dbo.TestAgg (someCol, someCol2, someCol3)
VALUES (1, 1, 1)
, (2, 2, 2)
, (2, 2, 3)
, (4, 1, 4);
SELECT ta.someCol
, ta.someCol2
, AVG(ta.someCol3)
FROM dbo.TestAgg ta
GROUP BY ta.someCol
, ta.someCol2
ORDER BY ta.someCol
, ta.someCol2;
SELECT ta.someCol
, ta.someCol2
, AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
FROM dbo.TestAgg ta
ORDER BY ta.someCol
, ta.someCol2;The results:
+---------+----------+------------------+
| someCol | someCol2 | (No column name) |
+---------+----------+------------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 2.500000 |
| 4 | 1 | 4.000000 |
+---------+----------+------------------+
+---------+----------+------------------+
| someCol | someCol2 | (No column name) |
+---------+----------+------------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 2.500000 |
| 2 | 2 | 2.500000 |
| 4 | 1 | 4.000000 |
+---------+----------+------------------+
If you specify a blank
OVER () clause, the "window" is the entire resultset:SELECT ta.someCol
, ta.someCol2
, COUNT(ta.someCol3) OVER ()
, COUNT(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
, AVG(ta.someCol3) OVER ()
, AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
FROM dbo.TestAgg ta+---------+----------+------------------+------------------+------------------+------------------+
| someCol | someCol2 | (No column name) | (No column name) | (No column name) | (No column name) |
+---------+----------+------------------+------------------+------------------+------------------+
| 1 | 1 | 4 | 1 | 2.500000 | 1.000000 |
| 2 | 2 | 4 | 2 | 2.500000 | 2.500000 |
| 2 | 2 | 4 | 2 | 2.500000 | 2.500000 |
| 4 | 1 | 4 | 1 | 2.500000 | 4.000000 |
+---------+----------+------------------+------------------+------------------+------------------+
You can perform multiple aggregates on a single column however, you cannot do it in a single
SELECT ... statement; you must use a subquery or CTE to do so:SELECT s.someCol
, s.someCol2
, SUM(avg_someCol3)
FROM (
SELECT ta.someCol
, ta.someCol2
, avg_someCol3 = AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
FROM dbo.TestAgg ta
) s
GROUP BY s.someCol
, s.someCol2Returns:
+---------+----------+------------------+
| someCol | someCol2 | (No column name) |
+---------+----------+------------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 5.000000 |
| 4 | 1 | 4.000000 |
+---------+----------+------------------+
However, in this contrived example, the value of 5 makes no sense that I can understand as the sum of an average. Perhaps your real problem makes more sense. Perhaps you really want this:
SELECT s.someCol
, s.someCol2
, avg_someCol3
FROM (
SELECT ta.someCol
, ta.someCol2
, avg_someCol3 = AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
FROM dbo.TestAgg ta
) s
GROUP BY s.someCol
, s.someCol2
, s.avg_someCol3Results:
+---------+----------+--------------+
| someCol | someCol2 | avg_someCol3 |
+---------+----------+--------------+
| 1 | 1 | 1.000000 |
| 2 | 2 | 2.500000 |
| 4 | 1 | 4.000000 |
+---------+----------+--------------+
This is more easily obtained by using the
DISTINCT keyword, although it make me shudder to think I just suggested using it:```
SELECT DISTINCT ta.someCol
, ta.someCol2
, AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.
Code Snippets
SELECT
someCol,
someCol2,
AVG(someCol3) [avg]
FROM
tbl
GROUP BY
someCol,
someCol2SELECT
someCol,
someCol2,
AVG(someCol3) OVER (PARTITION BY someCol, someCol2)
FROM
tblIF OBJECT_ID('dbo.TestAgg') IS NOT NULL
DROP TABLE dbo.TestAgg;
CREATE TABLE dbo.TestAgg
(
someCol int NOT NULL
, someCol2 int NOT NULL
, someCol3 decimal(10,5) NOT NULL
);
INSERT INTO dbo.TestAgg (someCol, someCol2, someCol3)
VALUES (1, 1, 1)
, (2, 2, 2)
, (2, 2, 3)
, (4, 1, 4);
SELECT ta.someCol
, ta.someCol2
, AVG(ta.someCol3)
FROM dbo.TestAgg ta
GROUP BY ta.someCol
, ta.someCol2
ORDER BY ta.someCol
, ta.someCol2;
SELECT ta.someCol
, ta.someCol2
, AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
FROM dbo.TestAgg ta
ORDER BY ta.someCol
, ta.someCol2;SELECT ta.someCol
, ta.someCol2
, COUNT(ta.someCol3) OVER ()
, COUNT(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
, AVG(ta.someCol3) OVER ()
, AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
FROM dbo.TestAgg taSELECT s.someCol
, s.someCol2
, SUM(avg_someCol3)
FROM (
SELECT ta.someCol
, ta.someCol2
, avg_someCol3 = AVG(ta.someCol3) OVER (PARTITION BY ta.someCol, ta.someCol2)
FROM dbo.TestAgg ta
) s
GROUP BY s.someCol
, s.someCol2Context
StackExchange Database Administrators Q#168028, answer score: 4
Revisions (0)
No revisions yet.