patternsqlMinor
Transformation of row data based on preceding row for special cumulative sum
Viewed 0 times
precedingcumulativespecialforbasedsumtransformationdatarow
Problem
Following the answer to this question, I have managed to produce the following output to get a running sum of values:
BUT my case is more complex. Not only do I need to sum up the values, I need to be able to perform a conversion first to some rows based on the running sum of the row right beneath it.
Let me first explain the motivation:
Currently I have a table with incremental, decremental and override operations. I would like to port the data to a table with only incremental and decremental operations such that I would be able to straightforwardly sum up the values. I am not looking to maintain the old table, simply a way to migrate the data into a simpler model and henceforward to append data to the new table only.
Taken the "raw" table above, I would like to write a query (I am running on postgresql 9.5) and get a table as closely resembling the below. (Conversely, I would like to know that what I am attempting is impossible)
Note the override operators are interspersed between the normal operators, they may appear more than twice as in the example, also, all initial operators (the earliest in the table) are override with an initial value that should be taken into account as in the example below.
id creation operation value running sum
SyJw-c 2016-09-01 00:11:08.307419 positive_op_1 1.33 28.82
SyJw-c 2016-08-21 08:32:54.431662 negative_op_1 -1 27.49
SyJw-c 2016-08-18 07:38:33.878365 positive_op_2 1 28.49
SyJw-c 2016-08-14 18:12:03.599797 negative_op_1 -1 27.49
SyJw-c 2016-08-02 15:44:29.693303 positive_op_1 1.33 28.49
SyJw-c 2016-07-31 12:08:50.659905 override_op_1 4.66 27.16
SyJw-c 2016-06-26 06:53:54.537603 negative_op_1 -3.5 22.5
SyJw-c 2016-05-31 13:34:08.005687 negative_op_1 -1 26
SyJw-c 2016-05-31 13:34:04.776970 negative_op_1 -1 27
SyJw-c 2016-05-31 11:27:09.502983 override_op_2 28 28BUT my case is more complex. Not only do I need to sum up the values, I need to be able to perform a conversion first to some rows based on the running sum of the row right beneath it.
Let me first explain the motivation:
Currently I have a table with incremental, decremental and override operations. I would like to port the data to a table with only incremental and decremental operations such that I would be able to straightforwardly sum up the values. I am not looking to maintain the old table, simply a way to migrate the data into a simpler model and henceforward to append data to the new table only.
Taken the "raw" table above, I would like to write a query (I am running on postgresql 9.5) and get a table as closely resembling the below. (Conversely, I would like to know that what I am attempting is impossible)
Note the override operators are interspersed between the normal operators, they may appear more than twice as in the example, also, all initial operators (the earliest in the table) are override with an initial value that should be taken into account as in the example below.
Solution
Based on this table definition:
Data types and constraints are almost always essential.
(
Basic query to compute your special running sum
Any operation name starting with 'override_op_' indicates the start of a new run (group, patch, partition).
In addition to the related answer you already linked to:
Consider this related question for details how to partition rows into groups (
I use the new aggregate
You could use the simpler (less clear) expression in older versions:
Compute delta
Based on this, you can easily compute the desired delta:
I use the 3-parameter form of the window function
CREATE TABLE tbl ( -- no PK?
id text NOT NULL
, creation timestamp UNIQUE NOT NULL
, operation text NOT NULL
, value numeric NOT NULL
, running_sum numeric -- optional (not needed for task)
);Data types and constraints are almost always essential.
(
creationdoes not strictly have to be unique. But if there can be duplicate values per group (id), you need to do more.)Basic query to compute your special running sum
SELECT id, creation, operation, value
, sum(value) OVER (PARTITION BY id, run ORDER BY creation) AS running_sum
FROM (
SELECT *, count(*) FILTER (WHERE operation LIKE 'override_op_%')
OVER (PARTITION BY id ORDER BY creation) AS run
FROM tbl
) t
ORDER BY id, creation DESC;Any operation name starting with 'override_op_' indicates the start of a new run (group, patch, partition).
In addition to the related answer you already linked to:
- Calculating Cumulative Sum in PostgreSQL
Consider this related question for details how to partition rows into groups (
run in this query, since you are using the term "group" for the id column)- Select longest continuous sequence
I use the new aggregate
FILTER clause for the partial count:- Return counts for multiple ranges in a single SELECT statement
You could use the simpler (less clear) expression in older versions:
count(operation LIKE 'override_op_%' OR NULL)Compute delta
Based on this, you can easily compute the desired delta:
SELECT *
, running_sum - lag(running_sum, 1, numeric '0') -- data type must match!
OVER (PARTITION BY id ORDER BY creation) AS transformed_value
FROM (
SELECT id, creation, operation, value
, sum(value) OVER (PARTITION BY id, run ORDER BY creation) AS running_sum
FROM (
SELECT *, count(*) FILTER (WHERE operation LIKE 'override_op_%')
OVER (PARTITION BY id ORDER BY creation) AS run
FROM tbl
) t
) t
ORDER BY id, creation DESC;I use the 3-parameter form of the window function
lag() to provide 0 (data type must match!) as default value for the first row in the table.Code Snippets
CREATE TABLE tbl ( -- no PK?
id text NOT NULL
, creation timestamp UNIQUE NOT NULL
, operation text NOT NULL
, value numeric NOT NULL
, running_sum numeric -- optional (not needed for task)
);SELECT id, creation, operation, value
, sum(value) OVER (PARTITION BY id, run ORDER BY creation) AS running_sum
FROM (
SELECT *, count(*) FILTER (WHERE operation LIKE 'override_op_%')
OVER (PARTITION BY id ORDER BY creation) AS run
FROM tbl
) t
ORDER BY id, creation DESC;count(operation LIKE 'override_op_%' OR NULL)SELECT *
, running_sum - lag(running_sum, 1, numeric '0') -- data type must match!
OVER (PARTITION BY id ORDER BY creation) AS transformed_value
FROM (
SELECT id, creation, operation, value
, sum(value) OVER (PARTITION BY id, run ORDER BY creation) AS running_sum
FROM (
SELECT *, count(*) FILTER (WHERE operation LIKE 'override_op_%')
OVER (PARTITION BY id ORDER BY creation) AS run
FROM tbl
) t
) t
ORDER BY id, creation DESC;Context
StackExchange Database Administrators Q#149333, answer score: 5
Revisions (0)
No revisions yet.