HiveBrain v1.2.0

Grouping data based on cumulative sum

Submitted by: @import:stackexchange-dba

Problem

I have been looking online for an answer, but I don't really know how to formulate what I would like to achieve, or whether it is possible; sorry if the question sounds dumb. I am using PostgreSQL.

I have price data per day.

CREATE TEMP TABLE Price (id, Day, Price) AS
VALUES
  (1, 1, 40),
  (2, 1, 20),
  (3, 1, 50),
  (4, 1, 10),
  (5, 1, 20),
  (6, 1, 60),
  (7, 2, 10),
  (8, 2, 40),
  (9, 2, 10),
  (10,2, 20),
  (11,2, 10);


I want to assign numbers (1, 2, 3, ...) to the price data based on the day and the running sum of the prices. Every time the sum exceeds 60, the sum calculation restarts, and every time a new day is reached, the sum calculation restarts as well. So for example:

Row 1 [day 1, price 40] = 1. Then for row 2 [day 1, price 20] the price sum is 20 + 40 = 60; any further price would push the sum past 60, so the counting restarts and number 2 is assigned to row 3. The result will look like:

 id | day | price | number
----+-----+-------+--------
  1 |   1 |    40 |      1
  2 |   1 |    20 |      1
  3 |   1 |    50 |      2
  4 |   1 |    10 |      2
  5 |   1 |    20 |      3
  6 |   1 |    60 |      3
  7 |   2 |    10 |      1
  8 |   2 |    40 |      1
  9 |   2 |    10 |      1
 10 |   2 |    20 |      2
 11 |   2 |    10 |      2

Does anyone know if this is achievable and how? I understand how to take the SUM(Price) OVER (PARTITION BY Day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), but do not know how to restart the sum counting when the condition is met. Thank you in advance for the help!

Solution

You want something like this:

SELECT *, lag(cumsum,1,0::bigint) OVER (PARTITION BY day ORDER BY id)/60 AS grp
FROM (
  SELECT
    id,
    day,
    price,
    sum(price) OVER (PARTITION BY day ORDER BY id) AS cumsum
  FROM price
) AS t;


Working from the inside out, we first calculate the per-day cumulative sum with sum() as a window function:

SELECT
  id,
  day,
  price,
  sum(price) OVER (PARTITION BY day ORDER BY id) AS cumsum
FROM price;

 id | day | price | cumsum 
----+-----+-------+--------
  1 |   1 |    40 |     40
  2 |   1 |    20 |     60
  3 |   1 |    50 |    110
  4 |   1 |    10 |    120
  5 |   1 |    20 |    140
  6 |   1 |    60 |    200
  7 |   2 |    10 |     10
  8 |   2 |    40 |     50
  9 |   2 |    10 |     60
 10 |   2 |    20 |     80
 11 |   2 |    10 |     90
(11 rows)
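For intuition, the windowed running sum can be sketched in plain Python (a sketch of the same logic, not part of the original answer; the price lists are taken from the sample data above, already ordered by id):

```python
from itertools import accumulate

# Prices per day, in id order, from the question's sample data.
day1 = [40, 20, 50, 10, 20, 60]
day2 = [10, 40, 10, 20, 10]

# sum(price) OVER (PARTITION BY day ORDER BY id): a running sum
# computed independently within each day.
cumsum1 = list(accumulate(day1))
cumsum2 = list(accumulate(day2))
print(cumsum1)  # [40, 60, 110, 120, 140, 200]
print(cumsum2)  # [10, 50, 60, 80, 90]
```

Partitioning by day is what makes the running total restart at each new day, matching the first half of the requirement.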


Notice that cumsum resets on a new day, but it keeps running past your threshold of 60. To fix that, we look at the prior row with lag() and integer-divide its cumulative sum by 60: a row starts a new group only once the previous row's cumulative sum has reached the next multiple of 60, so the row that first hits the threshold still belongs to the old group. The third argument of lag() supplies a default of 0 for the first row of each day, which has no prior row.

SELECT *, lag(cumsum,1,0::bigint) OVER (PARTITION BY day ORDER BY id)/60 AS grp
FROM (
  SELECT
    id,
    day,
    price,
    sum(price) OVER (PARTITION BY day ORDER BY id) AS cumsum
  FROM price
) AS t;
 id | day | price | cumsum | grp 
----+-----+-------+--------+-----
  1 |   1 |    40 |     40 |   0
  2 |   1 |    20 |     60 |   0
  3 |   1 |    50 |    110 |   1
  4 |   1 |    10 |    120 |   1
  5 |   1 |    20 |    140 |   2
  6 |   1 |    60 |    200 |   2
  7 |   2 |    10 |     10 |   0
  8 |   2 |    40 |     50 |   0
  9 |   2 |    10 |     60 |   0
 10 |   2 |    20 |     80 |   1
 11 |   2 |    10 |     90 |   1
(11 rows)
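The lag-then-divide arithmetic can be traced in a few lines of Python for day 1's rows (an illustration of the trick, not part of the original answer):

```python
# Group assignment: integer-divide the PREVIOUS row's cumulative sum
# by the threshold (60). The first row of a day has no previous row,
# so it uses 0, mirroring lag(cumsum, 1, 0) in the query above.
prices = [40, 20, 50, 10, 20, 60]  # day 1, in id order
groups = []
cumsum = 0
prev = 0  # lag(cumsum, 1, 0)
for p in prices:
    groups.append(prev // 60)
    cumsum += p
    prev = cumsum
print(groups)  # [0, 0, 1, 1, 2, 2]
```

Note this divides rows at multiples of 60 of the running total rather than truly resetting the sum to zero, which happens to produce the grouping asked for on this data.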


And you're done.


Context

StackExchange Database Administrators Q#212190, answer score: 2
