patternMinor
Cummulative Sum with Reset Value
Viewed 0 times
withcummulativevaluesumreset
Problem
Consider the following table:
For each row, I need to compute the cumulative sum of
This is the result I expect for the above table:
If not for the reset values, I could use a window query:
```
SELECT temp.*,
COALESCE(SUM(val) OVER (PARTITI
ID | GROUP_ID | ORDER_VAL | RESET_VAL | VAL
---+----------+-----------+-----------+-----
1 | 1 | 1 | (null) | 3
2 | 1 | 2 | (null) | 2
3 | 1 | 3 | (null) | 1
4 | 1 | 4 | 4 | 2
5 | 1 | 5 | (null) | 1
6 | 2 | 1 | (null) | 4
7 | 2 | 2 | 2 | 3
8 | 2 | 3 | (null) | 4
9 | 2 | 4 | (null) | 2
10 | 2 | 5 | (null) | 2
11 | 2 | 6 | (null) | 4
12 | 2 | 7 | 14 | 2
13 | 2 | 8 | (null) | 2For each row, I need to compute the cumulative sum of
VAL for all previous rows (ordered by ORDER_VAL and grouped by GROUP_ID), but each time a non-NULL RESET_VAL is encountered, I need to use that value for the sum. The rows that follow also need to build on top of the RESET_VAL instead of using the actual sum. Note that each group can have multiple reset values.This is the result I expect for the above table:
ID | GROUP_ID | ORDER_VAL | RESET_VAL | VAL | CUMSUM
---+----------+-----------+-----------+-----+-------
1 | 1 | 1 | (null) | 3 | 0
2 | 1 | 2 | (null) | 2 | 3
3 | 1 | 3 | (null) | 1 | 5
4 | 1 | 4 | 4 | 2 | 4
5 | 1 | 5 | (null) | 1 | 6
6 | 2 | 1 | (null) | 4 | 0
7 | 2 | 2 | 2 | 3 | 2
8 | 2 | 3 | (null) | 4 | 5
9 | 2 | 4 | (null) | 2 | 9
10 | 2 | 5 | (null) | 2 | 11
11 | 2 | 6 | (null) | 4 | 13
12 | 2 | 7 | 14 | 2 | 14
13 | 2 | 8 | (null) | 2 | 16If not for the reset values, I could use a window query:
```
SELECT temp.*,
COALESCE(SUM(val) OVER (PARTITI
Solution
The following works but there's probably some more clever version. Explanation of the query's logic:
We first find how many "resets" have been done up to and including the current row by counting the not nulls of the
We also use another window function
Note that both these window functions
Assuming that the
(avoiding the
Finally, in the second cte, we use the above found subgroups (using
Test at SQLfiddle.
Another variation, based on @Chris' recursive answer. (slightly improved, works with non-consecutive
Also works in case the first row of a group has a
Test at SQLfiddle-2.
One more variation, using the older (proprietary)
Tested at SQLfiddle-3.
We first find how many "resets" have been done up to and including the current row by counting the not nulls of the
reset_val column, so we can separate the rows into subgroups. We also use another window function
LAST_VALUE() with IGNORE NULLS, so we can find the last reset_value. Note that both these window functions
COUNT() and LAST_VALUE() have an ORDER BY, therefore the default window ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Omitted in the query, to make the code more clear.Assuming that the
val is not nullable, the other window function could also be shortened, from:COALESCE(SUM(val) OVER
(PARTITION BY group_id, reset_count
ORDER BY order_val
ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING), 0)(avoiding the
COALESCE() as well) to:SUM(val) OVER
(PARTITION BY group_id, reset_count
ORDER BY order_val)
- valFinally, in the second cte, we use the above found subgroups (using
PARTITION BY group_id, reset_count) to find the cumulative sums.WITH x AS
( SELECT temp.*,
COUNT(reset_val) OVER
(PARTITION BY group_id
ORDER BY order_val)
AS reset_count,
COALESCE(LAST_VALUE(reset_val IGNORE NULLS) OVER
(PARTITION BY group_id
ORDER BY order_val), 0)
AS reset_value
FROM temp
) ,
y AS
( SELECT x.*,
COALESCE(SUM(val) OVER
(PARTITION BY group_id, reset_count
ORDER BY order_val
ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING), 0)
+ reset_value AS cumsum
FROM x
)
SELECT *
FROM y ;Test at SQLfiddle.
Another variation, based on @Chris' recursive answer. (slightly improved, works with non-consecutive
order_val, avoids the final GROUP BY).Also works in case the first row of a group has a
reset_val:WITH row_nums AS
( SELECT id, group_id, order_val, reset_val, val,
ROW_NUMBER() OVER (PARTITION BY group_id
ORDER BY order_val)
AS rn
FROM temp
) ,
updated_temp (id, group_id, order_val, reset_val, val, rn, cumsum) AS
( SELECT id, group_id, order_val, reset_val, val, rn,
COALESCE(reset_val, 0)
FROM row_nums
WHERE rn = 1
UNION ALL
SELECT curr.id, curr.group_id, curr.order_val, curr.reset_val, curr.val, curr.rn,
COALESCE(curr.reset_val, prev.val + prev.cumsum)
FROM row_nums curr
JOIN updated_temp prev
ON curr.rn-1 = prev.rn
AND curr.group_id = prev.group_id
)
SELECT id, group_id, order_val, reset_val, val, cumsum
FROM updated_temp
ORDER BY group_id, order_val ;Test at SQLfiddle-2.
One more variation, using the older (proprietary)
CONNECT BY syntax for recursive queries. More compact but I find it more difficult to write and read than the CTE version:WITH row_nums AS
( SELECT id, group_id, order_val, reset_val, val,
ROW_NUMBER() OVER (PARTITION BY group_id
ORDER BY order_val)
AS rn,
COALESCE(reset_val, 0) AS cumsum
FROM temp
)
SELECT id, group_id, order_val, reset_val, val, rn,
COALESCE(reset_val, PRIOR val + PRIOR cumsum, 0) AS cumsum
FROM row_nums
START WITH rn = 1 OR reset_val IS NOT NULL
CONNECT BY rn-1 = PRIOR rn
AND group_id = PRIOR group_id
AND reset_val IS NULL
ORDER BY group_id, order_val ;Tested at SQLfiddle-3.
Code Snippets
COALESCE(SUM(val) OVER
(PARTITION BY group_id, reset_count
ORDER BY order_val
ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING), 0)SUM(val) OVER
(PARTITION BY group_id, reset_count
ORDER BY order_val)
- valWITH x AS
( SELECT temp.*,
COUNT(reset_val) OVER
(PARTITION BY group_id
ORDER BY order_val)
AS reset_count,
COALESCE(LAST_VALUE(reset_val IGNORE NULLS) OVER
(PARTITION BY group_id
ORDER BY order_val), 0)
AS reset_value
FROM temp
) ,
y AS
( SELECT x.*,
COALESCE(SUM(val) OVER
(PARTITION BY group_id, reset_count
ORDER BY order_val
ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING), 0)
+ reset_value AS cumsum
FROM x
)
SELECT *
FROM y ;WITH row_nums AS
( SELECT id, group_id, order_val, reset_val, val,
ROW_NUMBER() OVER (PARTITION BY group_id
ORDER BY order_val)
AS rn
FROM temp
) ,
updated_temp (id, group_id, order_val, reset_val, val, rn, cumsum) AS
( SELECT id, group_id, order_val, reset_val, val, rn,
COALESCE(reset_val, 0)
FROM row_nums
WHERE rn = 1
UNION ALL
SELECT curr.id, curr.group_id, curr.order_val, curr.reset_val, curr.val, curr.rn,
COALESCE(curr.reset_val, prev.val + prev.cumsum)
FROM row_nums curr
JOIN updated_temp prev
ON curr.rn-1 = prev.rn
AND curr.group_id = prev.group_id
)
SELECT id, group_id, order_val, reset_val, val, cumsum
FROM updated_temp
ORDER BY group_id, order_val ;WITH row_nums AS
( SELECT id, group_id, order_val, reset_val, val,
ROW_NUMBER() OVER (PARTITION BY group_id
ORDER BY order_val)
AS rn,
COALESCE(reset_val, 0) AS cumsum
FROM temp
)
SELECT id, group_id, order_val, reset_val, val, rn,
COALESCE(reset_val, PRIOR val + PRIOR cumsum, 0) AS cumsum
FROM row_nums
START WITH rn = 1 OR reset_val IS NOT NULL
CONNECT BY rn-1 = PRIOR rn
AND group_id = PRIOR group_id
AND reset_val IS NULL
ORDER BY group_id, order_val ;Context
StackExchange Database Administrators Q#104596, answer score: 4
Revisions (0)
No revisions yet.