HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlMinor

How to calculate the cumulative difference between rows in Postgres?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsthepostgrescumulativedifferencebetweencalculatehow

Problem

I am in need of calculating the difference from the cumulative values for the previous day in Postgres based on session_id. Example as mentioned below:

CREATE TEMP TABLE foo AS
SELECT date::date, session_id, upload_usage, download_usage, total_usage_on_a_day
FROM ( VALUES
  ( '10/21/2014', '0007994b', 37578561   , 6800209   , 44378770 ),
  ( '10/22/2014', '0007994b', 218113296  , 85272007  , 303385303 ),
  ( '10/23/2014', '0007994b', 552228616  , 252390680 , 804619296 ) ,
  ( '10/24/2014', '0007994b', 799772020  , 391196041 , 1190968061 ),
  ( '10/25/2014', '0007994b', 1047233978 , 529908804 , 1577142782 ),
  ( '10/26/2014', '0007994b', 1294608258 , 668515778 , 1963124036 ),
  ( '10/27/2014', '0007994b', 1066656794 , 557318645 , 2573613674 ),
  ( '10/27/2014', '00079e4e', 12949219   , 7265243   , 20214462 ),
  ( '10/28/2014', '00079e4e', 203871297  , 114308478 , 318179775 ),
  ( '10/29/2014', '00079e4e', 445466682  , 251486943 , 696953625 ),
  ( '10/30/2014', '00079e4e', 183499477  , 109643736 , 893143213 )
) AS t( date, session_id, upload_usage, download_usage, total_usage_on_a_day );


Expected Results:

```
Date session_id upload_usage download_usage total_usage_on_a_day Extected_difference
10/21/2014 0007994b 37578561 6800209 44378770 44378770
10/22/2014 0007994b 218113296 85272007 303385303 259006533
10/23/2014 0007994b 552228616 252390680 804619296 501233993
10/24/2014 0007994b 799772020 391196041 1190968061 386348765
10/25/2014 0007994b 1047233978 529908804 1577142782 386174721
10/26/2014 0007994b 1294608258 668515778 1963124036 385981254
10/27/2014 0007994b 1066656794 557318645 2573613674 610489638
10/27/2014 00079e4e 12949219 7265243 20214462 20214462
10/28/2014 00079e4e 203871297 114308

Solution

You can use LAG() function in a query with GROUP BY just like any other query. The only difference is that the columns allowed in the window (OVER) and in the LAG are the ones allowed in SELECT after a GROUP BY:

select 
    date,
    session_id, 
    sum(upload_usage) as upload,
    sum(download_usage) as download,
    sum(total_usage_on_a_day) as total_as_cumm,
    sum(total_usage_on_a_day) 
    - coalesce(lag(sum(total_usage_on_a_day)) over (partition by session_id order by date), 0)
        as expected_difference
 from jiodba.s_crc_zda_mon_conn_usage 
 where gpart = '1100043958' 
   and zzaccess_ntwk_id = 'FTTH' 
 group by session_id, date
 order by session_id, date 
 limit 100 ;


  • I'm not sure why you used max(usage) while you name the column total_usage. I changed it to use sum() instead. If you need to use max(usage) there, pick a more appropriate name for that column.

Code Snippets

select 
    date,
    session_id, 
    sum(upload_usage) as upload,
    sum(download_usage) as download,
    sum(total_usage_on_a_day) as total_as_cumm,
    sum(total_usage_on_a_day) 
    - coalesce(lag(sum(total_usage_on_a_day)) over (partition by session_id order by date), 0)
        as expected_difference
 from jiodba.s_crc_zda_mon_conn_usage 
 where gpart = '1100043958' 
   and zzaccess_ntwk_id = 'FTTH' 
 group by session_id, date
 order by session_id, date 
 limit 100 ;

Context

StackExchange Database Administrators Q#84505, answer score: 6

Revisions (0)

No revisions yet.