snippetsqlMinor
How do I use window functions ("over") together with crosstab in Postgres?
Viewed 0 times
postgreswithtogetherwindowcrosstabhowfunctionsuseover
Problem
Table: application_app
I'm trying to use
Current Query
This query is very close, but it's increasing the count for each
Results
```
Day | Boston | New York | Chicago | Any Class
-------+--------+----------+---------+-----------
03-06 | 1 | | |
05-12 | 2 | |
applied_date | applied_class | id
-------------------------------+---------------+-----
2013-03-06 15:59:49.22774-05 | BOSFAL13 | 13
2013-05-12 16:56:34.749859-04 | BOSFAL13 | 232
2013-05-26 15:48:11.969723-04 | BOSFAL13 | 359
2013-06-02 23:02:17.267199-04 | BOSFAL13 | 400
2013-06-03 12:35:46.90801-04 | BOSFAL13 | 408
2013-06-08 19:17:42.179707-04 | NYCFAL13 | 449
2013-06-11 02:04:38.232501-04 | NYCFAL13 | 450
2013-06-11 13:05:42.509843-04 | NYCFAL13 | 452
2013-06-12 09:19:21.459897-04 | BOSFAL13 | 454
2013-06-13 02:38:52.782411-04 | NYCFAL13 | 456
2013-06-17 19:41:31.813795-04 | NYCFAL13 | 457
2013-06-18 17:39:04.892315-04 | BOSFAL13 | 459
2013-06-19 09:05:29.892635-04 | BOSFAL13 | 460
2013-06-21 09:06:06.380306-04 | NYCFAL13 | 462
2013-06-22 05:02:31.699312-04 | BOSFAL13 | 463I'm trying to use
crosstab to get an aggregate of applied_class over time (by day).Current Query
select *
from crosstab(
'select temp_table.d,
applied_class,
sum(temp_table.ct) over (order by d)
from
(
select count(id) ct,
applied_class,
to_char(date_trunc($day$, applied_date), $MM-DD$) d from application_app
where applied_class like $%L13$
group by applied_class, d
order by d
) as temp_table
order by 1, 2'
,'VALUES ($BOSFAL13$::text), ($NYCFAL13$::text), ($CHIFAL13$::text), ($MULTIPLECITIESFALL13$::text)') -- end crosstab
as ct ("Day" text, "Boston" int, "New York" int, "Chicago" int, "Any City" int);This query is very close, but it's increasing the count for each
applied_class whenever there is a record for any applied_class:Results
```
Day | Boston | New York | Chicago | Any Class
-------+--------+----------+---------+-----------
03-06 | 1 | | |
05-12 | 2 | |
Solution
In addition to what Colin already cleared up:
Major points
-
The extracted day depends on the time zone setting of the current session. Your sample displays
Details:
Safe syntax would be:
-
There is another trap. If you
Plus, skipping the key word
See:
-
You don't need
-
You do not need a subquery. Window functions can be computed
You'd have to test which is faster. I don't expect much of a difference.
-
It's more efficient to invert your use of dollar-quotes and plain quotes.
-
It's sufficient to declare the type in the first row of a
SELECT * FROM crosstab(
$$SELECT (applied_date AT TIME ZONE 'America/New_York')::date AS d
,applied_class
,sum(count(id)) OVER (PARTITION BY applied_class
ORDER BY (applied_date AT TIME ZONE 'America/New_York')::date)
FROM application_app
WHERE applied_class LIKE '%L13'
GROUP BY 1, 2
ORDER BY 1, 2$$
,$$VALUES ('BOSFAL13'::text), ('NYCFAL13'), ('CHIFAL13'), ('MULTIPLECITIESFALL13')$$)
AS ct ("Day" date, "Boston" int, "New York" int, "Chicago" int, "Any City" int);
Major points
-
The extracted day depends on the time zone setting of the current session. Your sample displays
timestamp with time zone literals, which indicates you may be operating in multiple distinct time zones. To get the date for a particular time zone, you have to declare that, or the current setting will be applied - which is easily overlooked and therefore a loaded footgun: You would get different numbers, depending on where you run the query!Details:
- Ignoring timezones altogether in Rails and PostgreSQL
- Accounting for DST in Postgres, when selecting scheduled items
Safe syntax would be:
to_char(applied_date AT TIME ZONE 'America/New_York', 'MM-DD') -- your time zone
-
There is another trap. If you
GROUP BY day and month only, you can easily aggregate dates from multiple years by accident. I use the full date in my example.Plus, skipping the key word
AS is fine for table aliases but not for column aliases. So:(applied_date AT TIME ZONE 'America/New_York')::date AS dSee:
- SELECT Query merge / join two tables in PostgreSQL
-
You don't need
date_trunc() here. to_char() can work with the original timestamp directly.to_char(applied_date, 'MM-DD')
-
You do not need a subquery. Window functions can be computed
OVER aggregate functions. sum(count(id)) OVER ... in my query. More explanation in this related answer on SO:- Get the distinct sum of a joined table column
You'd have to test which is faster. I don't expect much of a difference.
-
It's more efficient to invert your use of dollar-quotes and plain quotes.
-
It's sufficient to declare the type in the first row of a
VALUES expression.Context
StackExchange Database Administrators Q#51103, answer score: 4
Revisions (0)
No revisions yet.