HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Converting rows to columns

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
convertingrowscolumns

Problem

I have data like following:

created_at | status
---------------------+-------------
2016-04-05 1:27:15 | info
2016-04-05 3:27:15 | info
2016-04-05 5:27:15 | warn
2016-04-05 10:27:15 | info
2016-04-05 11:27:15 | warn

With this data, I want to convert like as follows:

status | 2016-04-05 1:00:00 | 2016-04-05 4:00:00 | 2016-04-05 8:00:00 | 2016-04-05 12:00:00
---------+--------------------+--------------------+--------------------+-------------------
info | 1 | 1 | 0 | 1
warn | 0 | 0 | 1 | 1

Can anyone suggest the best way to do this?

Solution

Assuming 2016-04-05 0:27:15 instead of 2016-04-05 1:27:15 in the underlying table, the question would make more sense to me:

CREATE TABLE tbl (created_at timestamp, status text);
INSERT INTO tbl VALUES
  ('2016-04-05 00:27:15', 'info')
, ('2016-04-05 03:27:15', 'info')
, ('2016-04-05 05:27:15', 'warn')
, ('2016-04-05 10:27:15', 'info')
, ('2016-04-05 11:27:15', 'warn');


The logic would be to count events that happened up to and excluding the next bound. This fits the often overlooked function width_bucket() perfectly. To be precise, it requires the variant with arbitrary bounds (since there is no regular pattern in the OP's bounds) introduced with Postgres 9.5. Explanation straight from the manual:

width_bucket(operand anyelement, thresholds anyarray)




return the bucket number to which operand would be assigned given an
array listing the lower bounds of the buckets; returns 0 for an input
less than the first lower bound; the thresholds array must be sorted,
smallest first, or unexpected results will be obtained

For regular buckets you can use another variant that's available in Postgres 9.1 as well.

Combine it with crosstab() re-using the same bounds as column names (the rest of the query works with Postgres 9.1):

SELECT * FROM crosstab(
 $SELECT status
        , width_bucket(created_at, '{2016-04-05 01:00
                                   , 2016-04-05 04:00
                                   , 2016-04-05 08:00
                                   , 2016-04-05 12:00}'::timestamp[])
        , count(*)::int
   FROM   tbl
   WHERE  created_at < '2016-04-05 12:00'  -- exclude later rows
   GROUP  BY 1, 2
   ORDER  BY 1, 2$
, 'SELECT generate_series(0,3)'
   ) AS t(status text, "2016-04-05 01:00" int
                     , "2016-04-05 04:00" int
                     , "2016-04-05 08:00" int
                     , "2016-04-05 12:00" int);


Result:

status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
--------+------------------+------------------+------------------+------------------
 info   |                1 |                1 |                  |  1
 warn   |                  |                  |                1 |  1


The second crosstab parameter ('SELECT generate_series(0,3)') is a query string when executed returning one row for every target column. Every value not found on either side - not in the raw data or not generated by the 2nd parameter - is simply ignored.

Basics for crosstab():

  • PostgreSQL Crosstab Query



Replace NULL with 0

If you need 0 instead of NULL in the result, fix with COALESCE(), but that's merely a cosmetic problem:

SELECT status
     , COALESCE(t0, 0) AS "2016-04-05 01:00"
     , COALESCE(t1, 0) AS "2016-04-05 04:00"
     , COALESCE(t2, 0) AS "2016-04-05 08:00"
     , COALESCE(t3, 0) AS "2016-04-05 12:00"
FROM   crosstab(
 $SELECT status
        , width_bucket(created_at, '{2016-04-05 01:00
                                   , 2016-04-05 04:00
                                   , 2016-04-05 08:00
                                   , 2016-04-05 12:00}'::timestamp[])
        , count(*)::int
   FROM   tbl
   WHERE  created_at < '2016-04-05 12:00'
   GROUP  BY 1, 2
   ORDER  BY 1, 2$
, 'SELECT generate_series(0,3)'
   ) AS t(status text, t0 int, t1 int, t2 int, t3 int);


Result:

status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
--------+------------------+------------------+------------------+------------------
 info   |                1 |                1 |                0 |  1
 warn   |                0 |                0 |                1 |  1


Adding totals

To add totals per status use the new GROUPING SETS in Postgres 9.5+

SELECT status
     , COALESCE(t0, 0) AS "2016-04-05 01:00"
     , COALESCE(t1, 0) AS "2016-04-05 04:00"
     , COALESCE(t2, 0) AS "2016-04-05 08:00"
     , COALESCE(t3, 0) AS "2016-04-05 12:00"
     , COALESCE(t4, 0) AS total
FROM   crosstab(
 $SELECT status, COALESCE(slot, -1), ct  -- special slot for totals
   FROM  (
      SELECT status
           , width_bucket(created_at, '{2016-04-05 01:00
                                      , 2016-04-05 04:00
                                      , 2016-04-05 08:00
                                      , 2016-04-05 12:00}'::timestamp[]) AS slot
           , count(*)::int AS ct
      FROM   tbl
      WHERE  created_at GROUP  BY GROUPING SETS ((1, 2), 1)  -- add totals per status
      ORDER  BY 1, 2
      ) sub$
 , 'VALUES (0), (1), (2), (3), (-1)'  -- switched to VALUES for more sophisticated series
   ) AS t(status text, t0 int, t1 int, t2 int, t3 int, t4 int);


Result like above, plus:

...  | total
... -+-------
...  |     3
...  |     2


Note that total includes all rows not excluded before aggregation, even if filtered by crosstab().

This is in reply to @Vérace's request in the comments rather than to the unclear question.

Code Snippets

CREATE TABLE tbl (created_at timestamp, status text);
INSERT INTO tbl VALUES
  ('2016-04-05 00:27:15', 'info')
, ('2016-04-05 03:27:15', 'info')
, ('2016-04-05 05:27:15', 'warn')
, ('2016-04-05 10:27:15', 'info')
, ('2016-04-05 11:27:15', 'warn');
width_bucket(operand anyelement, thresholds anyarray)
SELECT * FROM crosstab(
 $$SELECT status
        , width_bucket(created_at, '{2016-04-05 01:00
                                   , 2016-04-05 04:00
                                   , 2016-04-05 08:00
                                   , 2016-04-05 12:00}'::timestamp[])
        , count(*)::int
   FROM   tbl
   WHERE  created_at < '2016-04-05 12:00'  -- exclude later rows
   GROUP  BY 1, 2
   ORDER  BY 1, 2$$
, 'SELECT generate_series(0,3)'
   ) AS t(status text, "2016-04-05 01:00" int
                     , "2016-04-05 04:00" int
                     , "2016-04-05 08:00" int
                     , "2016-04-05 12:00" int);
status | 2016-04-05 01:00 | 2016-04-05 04:00 | 2016-04-05 08:00 | 2016-04-05 12:00
--------+------------------+------------------+------------------+------------------
 info   |                1 |                1 |                  |  1
 warn   |                  |                  |                1 |  1
SELECT status
     , COALESCE(t0, 0) AS "2016-04-05 01:00"
     , COALESCE(t1, 0) AS "2016-04-05 04:00"
     , COALESCE(t2, 0) AS "2016-04-05 08:00"
     , COALESCE(t3, 0) AS "2016-04-05 12:00"
FROM   crosstab(
 $$SELECT status
        , width_bucket(created_at, '{2016-04-05 01:00
                                   , 2016-04-05 04:00
                                   , 2016-04-05 08:00
                                   , 2016-04-05 12:00}'::timestamp[])
        , count(*)::int
   FROM   tbl
   WHERE  created_at < '2016-04-05 12:00'
   GROUP  BY 1, 2
   ORDER  BY 1, 2$$
, 'SELECT generate_series(0,3)'
   ) AS t(status text, t0 int, t1 int, t2 int, t3 int);

Context

StackExchange Database Administrators Q#134874, answer score: 8

Revisions (0)

No revisions yet.