patternsqlMinor
Pivot time series into date period columns with array aggregate cells
Viewed 0 times
periodcolumnswithintopivotarraytimedatecellsseries
Problem
Given a time series of events where each event has a successful or unsuccessful outcome, how do I pivot the ratio of events by entity and time period columns with an aggregate array cell value? I suspect this can be done with
Akin to uptime status reports, I'm trying to calculate something like this in SQL:
The volume of data is small enough that I could do the reduction client-side in a general purpose language, but it would be nice to do this efficiently in the database for a larger amount of data.
Time Series Example
Desired Aggregate
Where each value cell after the key column is an array with shape
To keep things simple, this report will never need more than 10 date window columns and I can dynamically template the SQL pivot output.
If I had to decompose this transformation:
crosstab query and array_agg.Akin to uptime status reports, I'm trying to calculate something like this in SQL:
The volume of data is small enough that I could do the reduction client-side in a general purpose language, but it would be nice to do this efficiently in the database for a larger amount of data.
Time Series Example
+-----------------------------------------+
| DateTime Entity Result |
+-----------------------------------------+
| 2016-01-01 11:00... :a :success |
| 2016-01-01 17:00... :a :success | -- two events for :a on same day
| 2016-01-01 11:01... :b :fail |
| 2016-01-01 11:03... :c :success |
| 2016-01-01 13:00... :d :success | -- only one event for :d
| 2016-01-02 11:00... :a :success |
| 2016-01-02 11:01... :b :fail |
| 2016-01-02 11:03... :c :success |
| ... |
+-----------------------------------------+Desired Aggregate
Where each value cell after the key column is an array with shape
[cnt_total cnt_success cnt_fail]:+-----------------------------------------------+
| Entity 2016-01-01 2016-01-02 2016-01-xx |
+-----------------------------------------------+
| :a [2 2 0] [1 1 0] ... |
| :b [1 0 1] [1 0 1] ... |
| :c [1 1 0] [1 0 1] ... |
| :d [1 1 0] [0 0 0] ... |
+-----------------------------------------------+To keep things simple, this report will never need more than 10 date window columns and I can dynamically template the SQL pivot output.
If I had to decompose this transformation:
- Aggregate time series by date window (hour/day/week/month/quarter/year) and Resu
Solution
This question is old, but you haven’t accepted any answer yet, so I will add another one.
You need some aggregation of your data, and a pivot table. The most elegant way to do the former is by means of a CTE, and the most elegant way to do the latter is with
The general problem of pivots is that, in pure SQL, you cannot define a query whose result has a variable number of columns, and you cannot define column headings dynamically. If you want that, you have to build your query with a procedural language, either server-side (
First, the initial data. I started from the ones joanolo used, but my approach is different, because I use
You only really need an array of two integers (
Solution 1
This is inefficient ad rather ugly, but it’s worth being shown to illustrate how to use a CTE and what pains we had to suffer before
Solution 2
The code for the view is essentially the CTE code of the previous example, but it is simpler, because
This is the main query. When changing intervals, it has to be modified in three places: initial day, final day, and output columns. Formatting is best done client-side, but in this case I have done it server-side. Instructions on ho
You need some aggregation of your data, and a pivot table. The most elegant way to do the former is by means of a CTE, and the most elegant way to do the latter is with
CROSSTAB; however, as of Postgres 9.6, and unlike in other DBMS, you cannot reference a CTE from CROSSTAB. I will show an example for each of the two possible ways out: 1) Use a CTE, and reimplement a poor man’s pivoting on your own 2) Instead of a CTE, create a view once for all and reference it in your CROSSTAB queries. In both cases you’ll have to issue only one query per report, and you won’t have to create any temporary table.The general problem of pivots is that, in pure SQL, you cannot define a query whose result has a variable number of columns, and you cannot define column headings dynamically. If you want that, you have to build your query with a procedural language, either server-side (
plpgsql, as in Abelisto’s answer) or client-side (PHP, java, whatever...). My examples below are in pure SQL, so they have a fixed number of days (three, as in your example data), with fixed column headings ("day 1", "day 2", "day 3"), but they are built in a way that minimizes the needed edits when you change them.First, the initial data. I started from the ones joanolo used, but my approach is different, because I use
SMALLINT instead of BOOLEAN for result. My reason for doing this will become clear in the following. CREATE TABLE time_series (
date_time TIMESTAMP NOT NULL,
entity TEXT NOT NULL,
result SMALLINT DEFAULT 0 -- 1 means success, 0 failure.
);
INSERT INTO time_series VALUES
('2016-01-01 11:00', 'a', 1),
('2016-01-01 17:00', 'a', 1),
('2016-01-01 11:01', 'b', 0),
('2016-01-01 11:03', 'c', 1),
('2016-01-01 13:00', 'd', 1),
('2016-01-02 11:00', 'a', 1),
('2016-01-02 11:01', 'b', 0),
('2016-01-03 11:03', 'e', 1),
('2016-01-03 11:04', 'e', 1),
('2016-01-03 11:05', 'e', 1),
('2016-01-03 11:06', 'e', 0);You only really need an array of two integers (
a in my examples): a[1] (total count) and a[2] (success count). The failure count is simply a[1] - a[2] and the success percentage is 100*(a[2]::float)/a[1]. You can compute the total count by COUNT(result); if you define result SMALLINT you can simply use SUM(result) to keep track of the success count. If you store result as BOOLEAN, you have to use SUM(CASE WHEN result THEN 1 ELSE 0 END). If you store them as strings, SUM(CASE WHEN result = 'success' THEN 1 ELSE 0 END). If you cannot change your time_series table, edit the code below as appropriate.Solution 1
This is inefficient ad rather ugly, but it’s worth being shown to illustrate how to use a CTE and what pains we had to suffer before
CROSSTAB came along. When changing intervals, it has to be modified in four places: initial day, final day, rows in the main selection list, and rows in the list of joined tables. However, using a numeric column rn allows not to write explicit dates in the joined tables, which simplifies the task. WITH ct AS (
SELECT EXTRACT('days' FROM day - MIN(day) OVER()) + 1 AS rn, sub.*
FROM (
SELECT
entity,
DATE_TRUNC('day', date_time) AS day,
ARRAY[COUNT(result), SUM(result)] AS a
FROM time_series
WHERE date_time BETWEEN TIMESTAMP '2016-01-01' -- initial day
AND TIMESTAMP '2016-01-03 23:59:59' -- last second of final day
GROUP BY 1,2
) AS sub
)
SELECT e.entity
, d1.a AS "day 1" -- add as many as you need
, d2.a AS "day 2"
, d3.a AS "day 3"
FROM (SELECT DISTINCT entity FROM ct) e
LEFT JOIN (SELECT entity, a FROM ct WHERE rn = 1) d1 USING(entity) -- add as many as you need
LEFT JOIN (SELECT entity, a FROM ct WHERE rn = 2) d2 USING(entity)
LEFT JOIN (SELECT entity, a FROM ct WHERE rn = 3) d3 USING(entity)
ORDER BY e.entity;
entity | day 1 | day 2 | day 3
--------+-------+-------+-------
a | {2,2} | {1,1} |
b | {1,0} | {1,0} |
c | {1,1} | |
d | {1,1} | |
e | | | {4,3}Solution 2
The code for the view is essentially the CTE code of the previous example, but it is simpler, because
CROSSTAB allows us using GENERATE_SERIES with timestamp values, so we don’t need a numeric rn column to categorize the data. Note that this view, once created, will not need modifications.CREATE VIEW ts_view AS
SELECT
entity,
DATE_TRUNC('day', date_time) AS day,
ARRAY[COUNT(result), SUM(result)] AS a
FROM time_series
GROUP BY 1,2;This is the main query. When changing intervals, it has to be modified in three places: initial day, final day, and output columns. Formatting is best done client-side, but in this case I have done it server-side. Instructions on ho
Code Snippets
CREATE TABLE time_series (
date_time TIMESTAMP NOT NULL,
entity TEXT NOT NULL,
result SMALLINT DEFAULT 0 -- 1 means success, 0 failure.
);
INSERT INTO time_series VALUES
('2016-01-01 11:00', 'a', 1),
('2016-01-01 17:00', 'a', 1),
('2016-01-01 11:01', 'b', 0),
('2016-01-01 11:03', 'c', 1),
('2016-01-01 13:00', 'd', 1),
('2016-01-02 11:00', 'a', 1),
('2016-01-02 11:01', 'b', 0),
('2016-01-03 11:03', 'e', 1),
('2016-01-03 11:04', 'e', 1),
('2016-01-03 11:05', 'e', 1),
('2016-01-03 11:06', 'e', 0);WITH ct AS (
SELECT EXTRACT('days' FROM day - MIN(day) OVER()) + 1 AS rn, sub.*
FROM (
SELECT
entity,
DATE_TRUNC('day', date_time) AS day,
ARRAY[COUNT(result), SUM(result)] AS a
FROM time_series
WHERE date_time BETWEEN TIMESTAMP '2016-01-01' -- initial day
AND TIMESTAMP '2016-01-03 23:59:59' -- last second of final day
GROUP BY 1,2
) AS sub
)
SELECT e.entity
, d1.a AS "day 1" -- add as many as you need
, d2.a AS "day 2"
, d3.a AS "day 3"
FROM (SELECT DISTINCT entity FROM ct) e
LEFT JOIN (SELECT entity, a FROM ct WHERE rn = 1) d1 USING(entity) -- add as many as you need
LEFT JOIN (SELECT entity, a FROM ct WHERE rn = 2) d2 USING(entity)
LEFT JOIN (SELECT entity, a FROM ct WHERE rn = 3) d3 USING(entity)
ORDER BY e.entity;
entity | day 1 | day 2 | day 3
--------+-------+-------+-------
a | {2,2} | {1,1} |
b | {1,0} | {1,0} |
c | {1,1} | |
d | {1,1} | |
e | | | {4,3}CREATE VIEW ts_view AS
SELECT
entity,
DATE_TRUNC('day', date_time) AS day,
ARRAY[COUNT(result), SUM(result)] AS a
FROM time_series
GROUP BY 1,2;SELECT * FROM CROSSTAB ($$
SELECT
entity,
day,
-- You have to repeat the result type of the following expression
-- as the type of the "day N" columns below.
-- e.g. a --> INTEGER[] , 100*(a[2]::FLOAT)/a[1] --> FLOAT , etc.
-- In TO_CHAR, D is changed to your locale's decimal point
TO_CHAR(100*(a[2]::float)/a[1], '990D99')||'%'
FROM ts_view ORDER BY 1
$$,$$
SELECT GENERATE_SERIES (
TIMESTAMP '2016-01-01', -- initial day
TIMESTAMP '2016-01-03', -- final day
'1 day'
)
$$
) AS (
entity TEXT
, "day 1" TEXT -- add as many as you need
, "day 2" TEXT
, "day 3" TEXT
);
entity | day 1 | day 2 | day 3
--------+----------+----------+----------
a | 100.00% | 100.00% |
b | 0.00% | 0.00% |
c | 100.00% | |
d | 100.00% | |
e | | | 75.00%$$
WITH n(ow) AS (VALUES(DATE_TRUNC('day', NOW())))
SELECT GENERATE_SERIES(n.ow + '-2 days', n.ow, '1 day') FROM n
$$Context
StackExchange Database Administrators Q#163213, answer score: 4
Revisions (0)
No revisions yet.