HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Reset Running Total based on another column

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
totalcolumnrunninganotherbasedreset

Problem

Am trying to calculate running total. But it should reset when the cummulative sum greater than another column value

create table #reset_runn_total
(
id int identity(1,1),
val int, 
reset_val int,
grp int
)

insert into #reset_runn_total
values 
(1,10,1),
(8,12,1),(6,14,1),(5,10,1),(6,13,1),(3,11,1),(9,8,1),(10,12,1)

SELECT Row_number()OVER(partition BY grp ORDER BY id)AS rn,*
INTO   #test
FROM   #reset_runn_total


Index details:

CREATE UNIQUE CLUSTERED INDEX ix_load_reset_runn_total
  ON #test(rn, grp)


sample data

+----+-----+-----------+-----+
| id | val | reset_val | Grp |
+----+-----+-----------+-----+
|  1 |   1 |        10 | 1   |
|  2 |   8 |        12 | 1   |
|  3 |   6 |        14 | 1   |
|  4 |   5 |        10 | 1   |
|  5 |   6 |        13 | 1   |
|  6 |   3 |        11 | 1   |
|  7 |   9 |         8 | 1   |
|  8 |  10 |        12 | 1   |
+----+-----+-----------+-----+


Expected result

+----+-----+-----------------+-------------+
| id | val |    reset_val    | Running_tot |
+----+-----+-----------------+-------------+
|  1 |   1 | 10              |       1     |  
|  2 |   8 | 12              |       9     |  --1+8
|  3 |   6 | 14              |       15    |  --1+8+6 -- greater than reset val
|  4 |   5 | 10              |       5     |  --reset 
|  5 |   6 | 13              |       11    |  --5+6
|  6 |   3 | 11              |       14    |  --5+6+3 -- greater than reset val
|  7 |   9 | 8               |       9     |  --reset -- greater than reset val 
|  8 |  10 | 12              |      10     |  --reset
+----+-----+-----------------+-------------+


Query:

I got the result using Recursive CTE. Original question is here https://stackoverflow.com/questions/42085404/reset-running-total-based-on-another-column

```
;WITH cte
AS (SELECT rn,id,
val,
reset_val,
grp,
val AS running_total,
Iif (val > reset_val, 1, 0)

Solution

I've looked at similar problems and have never been able to find a window function solution that does a single pass over the data. I don't think it's possible. Window functions need to be able to be applied to all of the values in a column. That makes reset calculations like this very difficult, because one reset changes the value for all of the following values.

One way to think about the problem is that you can get the end result you want if you calculate a basic running total as long as you can subtract the running total from the correct previous row. For example, in your sample data the value for id 4 is the running total of row 4 - the running total of row 3. The value for id 6 is the running total of row 6 - the running total of row 3 because a reset hasn't happened yet. The value for id 7 is the running total of row 7 - the running total of row 6 and so on.

I would approach this with T-SQL in a loop. I got a little carried away and think I have a full solution. For 3 million rows and 500 groups the code finished in 24 seconds on my desktop. I'm testing with SQL Server 2016 Developer edition with 6 vCPU. I'm taking advantage of parallel inserts and parallel execution in general so you may need to change the code if you're on an older version or have DOP limitations.

Below the code that I used to generate the data. The ranges on VAL and RESET_VAL should be similar to your sample data.

drop table if exists reset_runn_total;

create table reset_runn_total
(
id int identity(1,1),
val int, 
reset_val int,
grp int
);

DECLARE 
@group_num INT,
@row_num INT;
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    SET @group_num = 1;
    WHILE @group_num <= 50000 
    BEGIN
        SET @row_num = 1;
        WHILE @row_num <= 60
        BEGIN
            INSERT INTO reset_runn_total WITH (TABLOCK)
            SELECT 1 + ABS(CHECKSUM(NewId())) % 10, 8 + ABS(CHECKSUM(NewId())) % 8, @group_num;

            SET @row_num = @row_num + 1;
        END;
        SET @group_num = @group_num + 1;
    END;
    COMMIT TRANSACTION;
END;


The algorithm is as follows:

1) Start by inserting all rows with a standard running total into a temp table.

2) In a loop:

2a) For each group, calculate the first row with a running total above the reset_value remaining in the table and store the id, the running total that was too large, and the previous running total that was too large in a temp table.

2b) Delete rows from the first temp table into a results temp table that have an ID less than or equal to the ID in the second temp table. Use the other columns to adjust the running total as needed.

3) After the delete no longer processes rows run an additional DELETE OUTPUT into the results table. This is for rows at the end of the group that never exceed the reset value.

I'll go through one implementation of the above algorithm in T-SQL step by step.

Start by creating a few temp tables. #initial_results holds the original data with the standard running total, #group_bookkeeping is updated each loop to figure out which rows can be moved, and #final_results contains the results with the running total adjusted for resets.

CREATE TABLE #initial_results (
id int,
val int, 
reset_val int,
grp int,
initial_running_total int
);

CREATE TABLE #group_bookkeeping (
grp int,
max_id_to_move int,
running_total_to_subtract_this_loop int,
running_total_to_subtract_next_loop int,
grp_done bit, 
PRIMARY KEY (grp)
);

CREATE TABLE #final_results (
id int,
val int, 
reset_val int,
grp int,
running_total int
);

INSERT INTO #initial_results WITH (TABLOCK)
SELECT ID, VAL, RESET_VAL, GRP, SUM(VAL) OVER (PARTITION BY GRP ORDER BY ID) RUNNING_TOTAL
FROM reset_runn_total;

CREATE CLUSTERED INDEX i1 ON #initial_results (grp, id);

INSERT INTO #group_bookkeeping WITH (TABLOCK)
SELECT DISTINCT GRP, 0, 0, 0, 0
FROM reset_runn_total;


I create the clustered index on the temp table after so the insert and the index build can be done in parallel. Made a big difference on my machine but may not on yours. Creating an index on the source table didn't seem to help but that could help on your machine.

The code below runs in the loop and updates the bookkeeping table. For each group, we need to get the find the maximum ID which should be moved into the results table. We need the running total from that row so we can subtract it from the initial running total. The grp_done column is set to 1 when there isn't any more work to do for a grp.

```
WITH UPD_CTE AS (
SELECT
#grp_bookkeeping.GRP
, MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) max_id_to_update
, MIN(#group_bookkeeping.running_total_to_subtract_next_loop) running_total_to_subtract_this_loop
, MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN initial_running_total ELSE NULL END) add

Code Snippets

drop table if exists reset_runn_total;

create table reset_runn_total
(
id int identity(1,1),
val int, 
reset_val int,
grp int
);

DECLARE 
@group_num INT,
@row_num INT;
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    SET @group_num = 1;
    WHILE @group_num <= 50000 
    BEGIN
        SET @row_num = 1;
        WHILE @row_num <= 60
        BEGIN
            INSERT INTO reset_runn_total WITH (TABLOCK)
            SELECT 1 + ABS(CHECKSUM(NewId())) % 10, 8 + ABS(CHECKSUM(NewId())) % 8, @group_num;

            SET @row_num = @row_num + 1;
        END;
        SET @group_num = @group_num + 1;
    END;
    COMMIT TRANSACTION;
END;
CREATE TABLE #initial_results (
id int,
val int, 
reset_val int,
grp int,
initial_running_total int
);

CREATE TABLE #group_bookkeeping (
grp int,
max_id_to_move int,
running_total_to_subtract_this_loop int,
running_total_to_subtract_next_loop int,
grp_done bit, 
PRIMARY KEY (grp)
);

CREATE TABLE #final_results (
id int,
val int, 
reset_val int,
grp int,
running_total int
);

INSERT INTO #initial_results WITH (TABLOCK)
SELECT ID, VAL, RESET_VAL, GRP, SUM(VAL) OVER (PARTITION BY GRP ORDER BY ID) RUNNING_TOTAL
FROM reset_runn_total;

CREATE CLUSTERED INDEX i1 ON #initial_results (grp, id);

INSERT INTO #group_bookkeeping WITH (TABLOCK)
SELECT DISTINCT GRP, 0, 0, 0, 0
FROM reset_runn_total;
WITH UPD_CTE AS (
        SELECT 
        #grp_bookkeeping.GRP
        , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) max_id_to_update
        , MIN(#group_bookkeeping.running_total_to_subtract_next_loop) running_total_to_subtract_this_loop
        , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN initial_running_total ELSE NULL END) additional_value_next_loop
        , CASE WHEN MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) IS NULL THEN 1 ELSE 0 END grp_done
        FROM #group_bookkeeping 
        INNER JOIN #initial_results IR ON #group_bookkeeping.grp = ir.grp
        WHERE #group_bookkeeping.grp_done = 0
        GROUP BY #group_bookkeeping.GRP
    )
    UPDATE #group_bookkeeping
    SET #group_bookkeeping.max_id_to_move = uv.max_id_to_update
    , #group_bookkeeping.running_total_to_subtract_this_loop = uv.running_total_to_subtract_this_loop
    , #group_bookkeeping.running_total_to_subtract_next_loop = uv.additional_value_next_loop
    , #group_bookkeeping.grp_done = uv.grp_done
    FROM UPD_CTE uv
    WHERE uv.GRP = #group_bookkeeping.grp
OPTION (LOOP JOIN);
DELETE ir
OUTPUT DELETED.id,  
    DELETED.VAL,  
    DELETED.RESET_VAL,  
    DELETED.GRP ,
    DELETED.initial_running_total - tb.running_total_to_subtract_this_loop
INTO #final_results
FROM #initial_results ir
INNER JOIN #group_bookkeeping tb ON ir.GRP = tb.GRP AND ir.ID <= tb.max_id_to_move
WHERE tb.grp_done = 0;
DECLARE @RC INT;
BEGIN
SET NOCOUNT ON;

CREATE TABLE #initial_results (
id int,
val int, 
reset_val int,
grp int,
initial_running_total int
);

CREATE TABLE #group_bookkeeping (
grp int,
max_id_to_move int,
running_total_to_subtract_this_loop int,
running_total_to_subtract_next_loop int,
grp_done bit, 
PRIMARY KEY (grp)
);

CREATE TABLE #final_results (
id int,
val int, 
reset_val int,
grp int,
running_total int
);

INSERT INTO #initial_results WITH (TABLOCK)
SELECT ID, VAL, RESET_VAL, GRP, SUM(VAL) OVER (PARTITION BY GRP ORDER BY ID) RUNNING_TOTAL
FROM reset_runn_total;

CREATE CLUSTERED INDEX i1 ON #initial_results (grp, id);

INSERT INTO #group_bookkeeping WITH (TABLOCK)
SELECT DISTINCT GRP, 0, 0, 0, 0
FROM reset_runn_total;

SET @RC = 1;
WHILE @RC > 0 
BEGIN
    WITH UPD_CTE AS (
        SELECT 
        #group_bookkeeping.GRP
        , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) max_id_to_move
        , MIN(#group_bookkeeping.running_total_to_subtract_next_loop) running_total_to_subtract_this_loop
        , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN initial_running_total ELSE NULL END) additional_value_next_loop
        , CASE WHEN MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) IS NULL THEN 1 ELSE 0 END grp_done
        FROM #group_bookkeeping 
        CROSS APPLY (SELECT ID, RESET_VAL, initial_running_total FROM #initial_results ir WHERE #group_bookkeeping.grp = ir.grp ) ir
        WHERE #group_bookkeeping.grp_done = 0
        GROUP BY #group_bookkeeping.GRP
    )
    UPDATE #group_bookkeeping
    SET #group_bookkeeping.max_id_to_move = uv.max_id_to_move
    , #group_bookkeeping.running_total_to_subtract_this_loop = uv.running_total_to_subtract_this_loop
    , #group_bookkeeping.running_total_to_subtract_next_loop = uv.additional_value_next_loop
    , #group_bookkeeping.grp_done = uv.grp_done
    FROM UPD_CTE uv
    WHERE uv.GRP = #group_bookkeeping.grp
    OPTION (LOOP JOIN);

    DELETE ir
    OUTPUT DELETED.id,  
        DELETED.VAL,  
        DELETED.RESET_VAL,  
        DELETED.GRP ,
        DELETED.initial_running_total - tb.running_total_to_subtract_this_loop
    INTO #final_results
    FROM #initial_results ir
    INNER JOIN #group_bookkeeping tb ON ir.GRP = tb.GRP AND ir.ID <= tb.max_id_to_move
    WHERE tb.grp_done = 0;

    SET @RC = @@ROWCOUNT;
END;

DELETE ir 
OUTPUT DELETED.id,  
    DELETED.VAL,  
    DELETED.RESET_VAL,  
    DELETED.GRP ,
    DELETED.initial_running_total - tb.running_total_to_subtract_this_loop
    INTO #final_results
FROM #initial_results ir
INNER JOIN #group_bookkeeping tb ON ir.GRP = tb.GRP;

CREATE CLUSTERED INDEX f1 ON #final_results (grp, id);

/* -- do something with the data
SELECT *
FROM #final_results
ORDER BY grp, id;
*/

DROP TABLE #final_results;
DROP TABLE #initial_r

Context

StackExchange Database Administrators Q#163557, answer score: 6

Revisions (0)

No revisions yet.