patternsqlMinor
Converting rows to columns
Viewed 0 times
convertingrowscolumns
Problem
I have data like following:
created_at | status
---------------------+-------------
2016-04-05 1:27:15 | info
2016-04-05 3:27:15 | info
2016-04-05 5:27:15 | warn
2016-04-05 10:27:15 | info
2016-04-05 11:27:15 | warn
With this data, I want to convert like as follows:
status | 2016-04-05 1:00:00 | 2016-04-05 4:00:00 | 2016-04-05 8:00:00 | 2016-04-05 12:00:00
---------+--------------------+--------------------+--------------------+-------------------
info | 1 | 1 | 0 | 1
warn | 0 | 0 | 1 | 1
Can anyone suggest the best way to do this?
created_at | status
---------------------+-------------
2016-04-05 1:27:15 | info
2016-04-05 3:27:15 | info
2016-04-05 5:27:15 | warn
2016-04-05 10:27:15 | info
2016-04-05 11:27:15 | warn
With this data, I want to convert like as follows:
status | 2016-04-05 1:00:00 | 2016-04-05 4:00:00 | 2016-04-05 8:00:00 | 2016-04-05 12:00:00
---------+--------------------+--------------------+--------------------+-------------------
info | 1 | 1 | 0 | 1
warn | 0 | 0 | 1 | 1
Can anyone suggest the best way to do this?
Solution
Assuming
The logic would be to count events that happened up to and excluding the next bound. This fits the often overlooked function
return the bucket number to which
array listing the lower bounds of the buckets; returns
less than the first lower bound; the
smallest first, or unexpected results will be obtained
For regular buckets you can use another variant that's available in Postgres 9.1 as well.
Combine it with
Result:
The second crosstab parameter (
Basics for
Replace NULL with 0
If you need
Result:
Adding totals
To add totals per
Result like above, plus:
Note that
This is in reply to @Vérace's request in the comments rather than to the unclear question.
2016-04-05 0:27:15 instead of 2016-04-05 1:27:15 in the underlying table, the question would make more sense to me:CREATE TABLE tbl (created_at timestamp, status text);
INSERT INTO tbl VALUES
('2016-04-05 00:27:15', 'info')
, ('2016-04-05 03:27:15', 'info')
, ('2016-04-05 05:27:15', 'warn')
, ('2016-04-05 10:27:15', 'info')
, ('2016-04-05 11:27:15', 'warn');The logic would be to count events that happened up to and excluding the next bound. This fits the often overlooked function
width_bucket() perfectly. To be precise, it requires the variant with arbitrary bounds (since there is no regular pattern in the OP's bounds) introduced with Postgres 9.5. Explanation straight from the manual:width_bucket(operand anyelement, thresholds anyarray)return the bucket number to which
operand would be assigned given anarray listing the lower bounds of the buckets; returns
0 for an inputless than the first lower bound; the
thresholds array must be sorted,smallest first, or unexpected results will be obtained
For regular buckets you can use another variant that's available in Postgres 9.1 as well.
Combine it with
crosstab() re-using the same bounds as column names (the rest of the query works with Postgres 9.1):SELECT * FROM crosstab(
$SELECT status
, width_bucket(created_at, '{2016-04-05 01:00
, 2016-04-05 04:00
, 2016-04-05 08:00
, 2016-04-05 12:00}'::timestamp[])
, count(*)::int
FROM tbl
WHERE created_at < '2016-04-05 12:00' -- exclude later rows
GROUP BY 1, 2
ORDER BY 1, 2$
, 'SELECT generate_series(0,3)'
) AS t(status text, "2016-04-05 01:00" int
, "2016-04-05 04:00" int
, "2016-04-05 08:00" int
, "2016-04-05 12:00" int);Result:
status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
--------+------------------+------------------+------------------+------------------
info | 1 | 1 | | 1
warn | | | 1 | 1The second crosstab parameter (
'SELECT generate_series(0,3)') is a query string when executed returning one row for every target column. Every value not found on either side - not in the raw data or not generated by the 2nd parameter - is simply ignored.Basics for
crosstab():- PostgreSQL Crosstab Query
Replace NULL with 0
If you need
0 instead of NULL in the result, fix with COALESCE(), but that's merely a cosmetic problem:SELECT status
, COALESCE(t0, 0) AS "2016-04-05 01:00"
, COALESCE(t1, 0) AS "2016-04-05 04:00"
, COALESCE(t2, 0) AS "2016-04-05 08:00"
, COALESCE(t3, 0) AS "2016-04-05 12:00"
FROM crosstab(
$SELECT status
, width_bucket(created_at, '{2016-04-05 01:00
, 2016-04-05 04:00
, 2016-04-05 08:00
, 2016-04-05 12:00}'::timestamp[])
, count(*)::int
FROM tbl
WHERE created_at < '2016-04-05 12:00'
GROUP BY 1, 2
ORDER BY 1, 2$
, 'SELECT generate_series(0,3)'
) AS t(status text, t0 int, t1 int, t2 int, t3 int);Result:
status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
--------+------------------+------------------+------------------+------------------
info | 1 | 1 | 0 | 1
warn | 0 | 0 | 1 | 1Adding totals
To add totals per
status use the new GROUPING SETS in Postgres 9.5+SELECT status
, COALESCE(t0, 0) AS "2016-04-05 01:00"
, COALESCE(t1, 0) AS "2016-04-05 04:00"
, COALESCE(t2, 0) AS "2016-04-05 08:00"
, COALESCE(t3, 0) AS "2016-04-05 12:00"
, COALESCE(t4, 0) AS total
FROM crosstab(
$SELECT status, COALESCE(slot, -1), ct -- special slot for totals
FROM (
SELECT status
, width_bucket(created_at, '{2016-04-05 01:00
, 2016-04-05 04:00
, 2016-04-05 08:00
, 2016-04-05 12:00}'::timestamp[]) AS slot
, count(*)::int AS ct
FROM tbl
WHERE created_at GROUP BY GROUPING SETS ((1, 2), 1) -- add totals per status
ORDER BY 1, 2
) sub$
, 'VALUES (0), (1), (2), (3), (-1)' -- switched to VALUES for more sophisticated series
) AS t(status text, t0 int, t1 int, t2 int, t3 int, t4 int);Result like above, plus:
... | total
... -+-------
... | 3
... | 2Note that
total includes all rows not excluded before aggregation, even if filtered by crosstab().This is in reply to @Vérace's request in the comments rather than to the unclear question.
Code Snippets
CREATE TABLE tbl (created_at timestamp, status text);
INSERT INTO tbl VALUES
('2016-04-05 00:27:15', 'info')
, ('2016-04-05 03:27:15', 'info')
, ('2016-04-05 05:27:15', 'warn')
, ('2016-04-05 10:27:15', 'info')
, ('2016-04-05 11:27:15', 'warn');width_bucket(operand anyelement, thresholds anyarray)SELECT * FROM crosstab(
$$SELECT status
, width_bucket(created_at, '{2016-04-05 01:00
, 2016-04-05 04:00
, 2016-04-05 08:00
, 2016-04-05 12:00}'::timestamp[])
, count(*)::int
FROM tbl
WHERE created_at < '2016-04-05 12:00' -- exclude later rows
GROUP BY 1, 2
ORDER BY 1, 2$$
, 'SELECT generate_series(0,3)'
) AS t(status text, "2016-04-05 01:00" int
, "2016-04-05 04:00" int
, "2016-04-05 08:00" int
, "2016-04-05 12:00" int);status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
--------+------------------+------------------+------------------+------------------
info | 1 | 1 | | 1
warn | | | 1 | 1SELECT status
, COALESCE(t0, 0) AS "2016-04-05 01:00"
, COALESCE(t1, 0) AS "2016-04-05 04:00"
, COALESCE(t2, 0) AS "2016-04-05 08:00"
, COALESCE(t3, 0) AS "2016-04-05 12:00"
FROM crosstab(
$$SELECT status
, width_bucket(created_at, '{2016-04-05 01:00
, 2016-04-05 04:00
, 2016-04-05 08:00
, 2016-04-05 12:00}'::timestamp[])
, count(*)::int
FROM tbl
WHERE created_at < '2016-04-05 12:00'
GROUP BY 1, 2
ORDER BY 1, 2$$
, 'SELECT generate_series(0,3)'
) AS t(status text, t0 int, t1 int, t2 int, t3 int);Context
StackExchange Database Administrators Q#134874, answer score: 8
Revisions (0)
No revisions yet.