patternsqlMinor
Grouping data based on cumulative sum
Viewed 0 times
groupingcumulativebasedsumdata
Problem
I have been looking online for an answer but do not really know how to formulate correctly what I would like to achieve and whether it's possible, sorry if the question sounds dumb. I am using Postgresql.
I have price data per day.
I want to assign numbers (1, 2, 3...) to the price data based on the day and the sum of the prices. Every time when the sum > 60, the sum calculation starts again + every time when a new day is reached, the sum calculation starts again. So for example:
Row 1 [day 1, price 40] = 1. Then for row 2 [day 1, price 20] the price sum is 20 + 40 60, therefore the counting of the sum has to restart and number 2 is assigned to row 3. The result will look like:
Does anyone know if this is achievable and how? I understand how to take the
I have price data per day.
CREATE TEMP TABLE Price (id,Day, Price) AS
VALUES
(1, 1, 40),
(2, 1, 20),
(3, 1, 50),
(4, 1, 10),
(5, 1, 20),
(6, 1, 60),
(7, 2, 10),
(8, 2, 40),
(9, 2, 10),
(10,2, 20),
(11,2, 10);I want to assign numbers (1, 2, 3...) to the price data based on the day and the sum of the prices. Every time when the sum > 60, the sum calculation starts again + every time when a new day is reached, the sum calculation starts again. So for example:
Row 1 [day 1, price 40] = 1. Then for row 2 [day 1, price 20] the price sum is 20 + 40 60, therefore the counting of the sum has to restart and number 2 is assigned to row 3. The result will look like:
Does anyone know if this is achievable and how? I understand how to take the
SUM(Price) OVER (PARTITION BY Day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), but do not know how to restart the sum counting when the condition is met. Thank you in advance for the help!Solution
You want something like this,
Working from the inside out we calculate the cumulative sum with
Notice our
And you're done.
SELECT *, lag(cumsum,1,0::bigint) OVER (PARTITION BY day ORDER BY id)/60 AS grp
FROM (
SELECT
id,
day,
price,
sum(price) OVER (PARTITION BY day ORDER BY id) AS cumsum
FROM price
) AS t;Working from the inside out we calculate the cumulative sum with
sum() in a window function,SELECT
id,
day,
price,
sum(price) OVER (PARTITION BY day ORDER BY id) AS cumsum
FROM price;
id | day | price | cumsum
----+-----+-------+--------
1 | 1 | 40 | 40
2 | 1 | 20 | 60
3 | 1 | 50 | 110
4 | 1 | 10 | 120
5 | 1 | 20 | 140
6 | 1 | 60 | 200
7 | 2 | 10 | 10
8 | 2 | 40 | 50
9 | 2 | 10 | 60
10 | 2 | 20 | 80
11 | 2 | 10 | 90
(11 rows)Notice our
cumsum resets on a new day, but it persists past your 60. We need to fix that, for that we simply look at the prior row, and divide by 60. We don't want to start a new group, unless the last row exceeds a multiple of 60.SELECT *, lag(cumsum,1,0::bigint) OVER (PARTITION BY day ORDER BY id)/60 AS grp
FROM (
SELECT
id,
day,
price,
sum(price) OVER (PARTITION BY day ORDER BY id) AS cumsum
FROM price
) AS t;
id | day | price | cumsum | grp
----+-----+-------+--------+-----
1 | 1 | 40 | 40 | 0
2 | 1 | 20 | 60 | 0
3 | 1 | 50 | 110 | 1
4 | 1 | 10 | 120 | 1
5 | 1 | 20 | 140 | 2
6 | 1 | 60 | 200 | 2
7 | 2 | 10 | 10 | 0
8 | 2 | 40 | 50 | 0
9 | 2 | 10 | 60 | 0
10 | 2 | 20 | 80 | 1
11 | 2 | 10 | 90 | 1
(11 rows)And you're done.
Code Snippets
SELECT *, lag(cumsum,1,0::bigint) OVER (PARTITION BY day ORDER BY id)/60 AS grp
FROM (
SELECT
id,
day,
price,
sum(price) OVER (PARTITION BY day ORDER BY id) AS cumsum
FROM price
) AS t;SELECT
id,
day,
price,
sum(price) OVER (PARTITION BY day ORDER BY id) AS cumsum
FROM price;
id | day | price | cumsum
----+-----+-------+--------
1 | 1 | 40 | 40
2 | 1 | 20 | 60
3 | 1 | 50 | 110
4 | 1 | 10 | 120
5 | 1 | 20 | 140
6 | 1 | 60 | 200
7 | 2 | 10 | 10
8 | 2 | 40 | 50
9 | 2 | 10 | 60
10 | 2 | 20 | 80
11 | 2 | 10 | 90
(11 rows)SELECT *, lag(cumsum,1,0::bigint) OVER (PARTITION BY day ORDER BY id)/60 AS grp
FROM (
SELECT
id,
day,
price,
sum(price) OVER (PARTITION BY day ORDER BY id) AS cumsum
FROM price
) AS t;
id | day | price | cumsum | grp
----+-----+-------+--------+-----
1 | 1 | 40 | 40 | 0
2 | 1 | 20 | 60 | 0
3 | 1 | 50 | 110 | 1
4 | 1 | 10 | 120 | 1
5 | 1 | 20 | 140 | 2
6 | 1 | 60 | 200 | 2
7 | 2 | 10 | 10 | 0
8 | 2 | 40 | 50 | 0
9 | 2 | 10 | 60 | 0
10 | 2 | 20 | 80 | 1
11 | 2 | 10 | 90 | 1
(11 rows)Context
StackExchange Database Administrators Q#212190, answer score: 2
Revisions (0)
No revisions yet.