HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How do I use window functions ("over") together with crosstab in Postgres?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgreswithtogetherwindowcrosstabhowfunctionsuseover

Problem

Table: application_app

applied_date          | applied_class | id
-------------------------------+---------------+-----
 2013-03-06 15:59:49.22774-05  | BOSFAL13      |  13
 2013-05-12 16:56:34.749859-04 | BOSFAL13      | 232
 2013-05-26 15:48:11.969723-04 | BOSFAL13      | 359
 2013-06-02 23:02:17.267199-04 | BOSFAL13      | 400
 2013-06-03 12:35:46.90801-04  | BOSFAL13      | 408
 2013-06-08 19:17:42.179707-04 | NYCFAL13      | 449
 2013-06-11 02:04:38.232501-04 | NYCFAL13      | 450
 2013-06-11 13:05:42.509843-04 | NYCFAL13      | 452
 2013-06-12 09:19:21.459897-04 | BOSFAL13      | 454
 2013-06-13 02:38:52.782411-04 | NYCFAL13      | 456
 2013-06-17 19:41:31.813795-04 | NYCFAL13      | 457
 2013-06-18 17:39:04.892315-04 | BOSFAL13      | 459
 2013-06-19 09:05:29.892635-04 | BOSFAL13      | 460
 2013-06-21 09:06:06.380306-04 | NYCFAL13      | 462
 2013-06-22 05:02:31.699312-04 | BOSFAL13      | 463


I'm trying to use crosstab to get an aggregate of applied_class over time (by day).

Current Query

select *

from crosstab(
    'select temp_table.d,  
       applied_class,  
       sum(temp_table.ct) over (order by d) 

    from   
       (
        select count(id) ct, 
               applied_class, 
               to_char(date_trunc($day$, applied_date), $MM-DD$) d from application_app 
        where applied_class like $%L13$ 
        group by applied_class, d 
        order by d
        ) as temp_table
    order by 1, 2'

    ,'VALUES ($BOSFAL13$::text), ($NYCFAL13$::text), ($CHIFAL13$::text), ($MULTIPLECITIESFALL13$::text)')  -- end crosstab

as ct ("Day" text, "Boston" int, "New York" int, "Chicago" int, "Any City" int);


This query is very close, but it's increasing the count for each applied_class whenever there is a record for any applied_class:

Results

```
Day | Boston | New York | Chicago | Any Class
-------+--------+----------+---------+-----------
03-06 | 1 | | |
05-12 | 2 | |

Solution

In addition to what Colin already cleared up:

SELECT * FROM crosstab(
$$SELECT (applied_date AT TIME ZONE 'America/New_York')::date AS d
,applied_class
,sum(count(id)) OVER (PARTITION BY applied_class
ORDER BY (applied_date AT TIME ZONE 'America/New_York')::date)
FROM application_app
WHERE applied_class LIKE '%L13'
GROUP BY 1, 2
ORDER BY 1, 2$$

,$$VALUES ('BOSFAL13'::text), ('NYCFAL13'), ('CHIFAL13'), ('MULTIPLECITIESFALL13')$$)

AS ct ("Day" date, "Boston" int, "New York" int, "Chicago" int, "Any City" int);


Major points

-
The extracted day depends on the time zone setting of the current session. Your sample displays timestamp with time zone literals, which indicates you may be operating in multiple distinct time zones. To get the date for a particular time zone, you have to declare that, or the current setting will be applied - which is easily overlooked and therefore a loaded footgun: You would get different numbers, depending on where you run the query!

Details:

  • Ignoring timezones altogether in Rails and PostgreSQL



  • Accounting for DST in Postgres, when selecting scheduled items



Safe syntax would be:

to_char(applied_date AT TIME ZONE 'America/New_York', 'MM-DD') -- your time zone


-
There is another trap. If you GROUP BY day and month only, you can easily aggregate dates from multiple years by accident. I use the full date in my example.

Plus, skipping the key word AS is fine for table aliases but not for column aliases. So:

(applied_date AT TIME ZONE 'America/New_York')::date AS d

See:

  • SELECT Query merge / join two tables in PostgreSQL



-
You don't need date_trunc() here. to_char() can work with the original timestamp directly.

to_char(applied_date, 'MM-DD')


-
You do not need a subquery. Window functions can be computed OVER aggregate functions. sum(count(id)) OVER ... in my query. More explanation in this related answer on SO:

  • Get the distinct sum of a joined table column



You'd have to test which is faster. I don't expect much of a difference.

-
It's more efficient to invert your use of dollar-quotes and plain quotes.

-
It's sufficient to declare the type in the first row of a VALUES expression.

Context

StackExchange Database Administrators Q#51103, answer score: 4

Revisions (0)

No revisions yet.