Pivot with 2+ columns (using CROSSTAB?)

Submitted by: @import:stackexchange-dba

Problem

I have a table deflator that is defined as:

Table "deflator"
    Column   |       Type        | Modifiers
-------------+-------------------+-----------
country_code | smallint          | not null
country_name | character varying | not null
year         | smallint          | not null
deflator     | numeric           |
source       | character varying |


Sample output from this table looks like:

country_code | country_name  | year | deflator | source
-------------+---------------+------+----------+----------
           1 | country_1     | 2016 |       12 | source_1
           1 | country_1     | 2015 |       11 | source_2
           1 | country_1     | 2014 |       10 | source_2
           2 | country_2     | 2016 |       15 | source_1
           2 | country_2     | 2015 |       14 | source_1
           2 | country_2     | 2014 |       13 | source_2
           3 | country_3     | 2016 |       18 | source_1
           3 | country_3     | 2015 |       17 | source_2
           3 | country_3     | 2014 |       16 | source_3
(9 rows)
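
To try the queries below, the table and sample rows can be recreated roughly as follows. This is a sketch built only from the definition and sample output above; the `CREATE EXTENSION` line is an addition, since crosstab() ships in the additional module tablefunc and has to be installed once per database.

```sql
-- crosstab() is provided by the tablefunc module (install once per database).
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Table as described above; no primary key is declared because the
-- original definition does not show one.
CREATE TABLE deflator (
    country_code smallint NOT NULL
  , country_name varchar  NOT NULL
  , year         smallint NOT NULL
  , deflator     numeric
  , source       varchar
);

-- The nine sample rows from the output above.
INSERT INTO deflator (country_code, country_name, year, deflator, source) VALUES
    (1, 'country_1', 2016, 12, 'source_1')
  , (1, 'country_1', 2015, 11, 'source_2')
  , (1, 'country_1', 2014, 10, 'source_2')
  , (2, 'country_2', 2016, 15, 'source_1')
  , (2, 'country_2', 2015, 14, 'source_1')
  , (2, 'country_2', 2014, 13, 'source_2')
  , (3, 'country_3', 2016, 18, 'source_1')
  , (3, 'country_3', 2015, 17, 'source_2')
  , (3, 'country_3', 2014, 16, 'source_3');
```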


I use the following query to pivot the table if I exclude the column source:

SELECT
    *
FROM CROSSTAB (
    'SELECT
        country_code
        , country_name
        , year
        , deflator
     FROM dimension.master_oecd_deflator
     ORDER BY 1;'
     , $$VALUES ('2014'::TEXT), ('2015'::TEXT), ('2016'::TEXT)$$
) AS "ct" (
    "country_code" SMALLINT
    , "country_name" TEXT
    , "2014" NUMERIC
    , "2015" NUMERIC
    , "2016" NUMERIC
);


The above query gives me:

country_code |   country_name    | 2016 | 2015 | 2014 |
-------------+-------------------+------+------+------+
           1 | country_1         | 12   | 11   | 10   |
           2 | country_2         | 15   | 14   | 13   |
           3 | country_3         | 18   | 17   | 16   |


But because the source of the deflator varies from year to year for each country, I want to include the source column in the pivot for my desired output.

Solution

Saddam has a smart solution, but it has some weaknesses. Imagine a source named 'Fresno, CA' (with a comma in the string): split_part() would be fooled by the separator character inside the string.
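
A minimal illustration of that failure mode, assuming the two values were concatenated as 'deflator,source' with a comma as separator (the exact format is an assumption made for this sketch):

```sql
-- Assumed concatenated value: deflator 12, source 'Fresno, CA'.
SELECT split_part('12,Fresno, CA', ',', 1) AS deflator  -- '12'
     , split_part('12,Fresno, CA', ',', 2) AS source;   -- 'Fresno', not 'Fresno, CA'
```

The second field stops at the comma inside the source name, so the rest of the string is silently lost.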

To avoid such corner cases and preserve the original data types, use a (well-defined!) row type instead. You can create a composite type permanently with CREATE TYPE, or register a temporary one for the session with CREATE TEMP TABLE:

CREATE TEMP TABLE defso (def numeric, so varchar);  -- once per session!

SELECT country_code
     , country_name
     , (d14).def AS deflator_2014  -- note the parentheses!
     , (d14).so  AS source_2014
     , (d15).def AS deflator_2015
     , (d15).so  AS source_2015
     , (d16).def AS deflator_2016
     , (d16).so  AS source_2016
FROM   crosstab (
    'SELECT country_code, country_name, year, (deflator, source)::defso
     FROM   deflator
     ORDER  BY 1'
  , 'SELECT generate_series(2014, 2016)::int2'
   ) AS ct (country_code int2
          , country_name text
          , d14 defso
          , d15 defso
          , d16 defso
   );


I also removed the unnecessary CTE and simplified a bit.
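
The permanent variant with CREATE TYPE might look like the sketch below. The type name deflator_source is hypothetical, chosen so it cannot clash with the temporary table above.

```sql
-- Hypothetical type name; any unused name works.
CREATE TYPE deflator_source AS (def numeric, so varchar);

-- Field access on the composite value returns the complete source string,
-- including the comma.
SELECT (ROW(12, 'Fresno, CA')::deflator_source).so AS source;  -- 'Fresno, CA'
```

In the crosstab query you would then cast with (deflator, source)::deflator_source and declare d14, d15 and d16 as deflator_source in the column definition list.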

While dealing with only a handful of years, you can do without crosstab() and use self-joins:

SELECT country_code, country_name
     , d14.deflator AS deflator_2014
     , d14.source   AS source_2014
     , d15.deflator AS deflator_2015
     , d15.source   AS source_2015
     , d16.deflator AS deflator_2016
     , d16.source   AS source_2016
FROM        (SELECT * FROM deflator WHERE year = int2 '2014') d14
FULL   JOIN (SELECT * FROM deflator WHERE year = int2 '2015') d15 USING (country_code, country_name)
FULL   JOIN (SELECT * FROM deflator WHERE year = int2 '2016') d16 USING (country_code, country_name)
ORDER  BY country_code;


I use FULL [OUTER] JOIN because we can't assume a row for every combination of (country_code, year). This way we get the same result as with the crosstab query above.

Including country_name in the join condition seems redundant, but if we don't, we have to use COALESCE(d14.country_name, d15.country_name, d16.country_name) AS country_name to defend against missing rows. This functionally dependent value shouldn't be in the table to begin with; it should be in a country table in a properly normalized schema.
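
For comparison, a sketch of the variant that joins on country_code alone and picks the name with COALESCE, using the same table and years as the query above:

```sql
SELECT country_code
       -- Take the name from whichever year has a row for this country.
     , COALESCE(d14.country_name, d15.country_name, d16.country_name) AS country_name
     , d14.deflator AS deflator_2014
     , d14.source   AS source_2014
     , d15.deflator AS deflator_2015
     , d15.source   AS source_2015
     , d16.deflator AS deflator_2016
     , d16.source   AS source_2016
FROM        (SELECT * FROM deflator WHERE year = int2 '2014') d14
FULL   JOIN (SELECT * FROM deflator WHERE year = int2 '2015') d15 USING (country_code)
FULL   JOIN (SELECT * FROM deflator WHERE year = int2 '2016') d16 USING (country_code)
ORDER  BY country_code;
```

Because USING (country_code) merges the join column, country_code needs no COALESCE of its own.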

Context

StackExchange Database Administrators Q#158181, answer score: 5
