patternMinor
Poor query performance over timestamp range by week in PostgreSQL 9.3
Viewed 0 times
postgresqlrangequerypoorweekperformancetimestampover
Problem
I have a slow query that generates a report of account activity per week over the past year. The table currently has nearly 5 million rows and this query currently takes 8 seconds to execute. The (current) bottleneck is the sequential scan over the timestamp range.
The table:
```
account=> \d account_history
Table "public.account_history"
Column | Type | Modifiers
-------------+-----------------------------+---------------------------
account | integer | not null
event_code | text | not null
event_time | timestamp without time zone | not null default now()
description | text | not null default ''::text
Indexes:
"account_history_idx" btree (account, event_time DESC)
"account_id_idx" btree (account, event_code, event_time)
Foreign-key const
account=> EXPLAIN ANALYZE SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'), 'IYYY-IW')::date AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;
GroupAggregate (cost=450475.76..513465.44 rows=2290534 width=12) (actual time=7524.474..8003.291 rows=52 loops=1)
Group Key: ((to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
-> Sort (cost=450475.76..456202.09 rows=2290534 width=12) (actual time=7519.053..7691.924 rows=2314164 loops=1)
Sort Key: ((to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
Sort Method: external sort Disk: 40704kB
-> Seq Scan on account_history (cost=0.00..169364.81 rows=2290534 width=12) (actual time=1470.438..6222.076 rows=2314164 loops=1)
Filter: ((event_time = (now() - '357 days'::interval)))
Rows Removed by Filter: 2591679
Planning time: 0.126 ms
Execution time: 8011.160 msThe table:
```
account=> \d account_history
Table "public.account_history"
Column | Type | Modifiers
-------------+-----------------------------+---------------------------
account | integer | not null
event_code | text | not null
event_time | timestamp without time zone | not null default now()
description | text | not null default ''::text
Indexes:
"account_history_idx" btree (account, event_time DESC)
"account_id_idx" btree (account, event_code, event_time)
Foreign-key const
Solution
Thanks to those who contributed in the comments, I have reduced the query time from ~8000 ms to ~1650 ms by:
For reference, the current table structure and execution plan are below.
I did play around with the other variations of indexing over multiple columns, but none of those indices were used by the execution plan.
In addition, I took the advice of another comment and took the following steps (followed by VACUUM and REINDEX):
These additional changes shaved another 400ms from the execution time and also cut down the planning time. One thing to note is that the sort method has changed from "external sort" to "external merge". Since the 'Disk' was still being used for the sort, I increased the work_mem to 200MB, which resulted in the quicksort (memory) method being used (176MB). This dropped a full second off the execution time (although this is really too high to be used on our server instances).
The updated table and execution plan are below.
I am very happy with the improvements thus far, but I welcome any other contributions to improving the performance of this query, since this is still the slowest query I have in one of my views.
- Adding an index on the timestamp column only (~2000 ms improvement).
- Removing the extra timestamp-to-char-to-timestamp conversion (or adding
date_trunc('week', event_time)to theGROUP BYclause) (~3000 ms improvement).
For reference, the current table structure and execution plan are below.
I did play around with the other variations of indexing over multiple columns, but none of those indices were used by the execution plan.
In addition, I took the advice of another comment and took the following steps (followed by VACUUM and REINDEX):
- Dropped the constraints from the description column and set all empty strings to NULL
- Converted the timestamp column from
WITHOUT TIME ZONEtoWITH TIME ZONE
- Increased the work_mem to 100MB (via
postgresql.conf).
ALTER TABLE account_history ALTER event_time TYPE timestamptz USING event_time AT TIME ZONE 'UTC';
ALTER TABLE account_history ALTER COLUMN description DROP NOT NULL;
ALTER TABLE account_history ALTER COLUMN description DROP DEFAULT;
UPDATE account_history SET description=NULL WHERE description='';
VACUUM FULL;
REINDEX TABLE account_history;
account=> show work_mem;
work_mem
----------
100MBThese additional changes shaved another 400ms from the execution time and also cut down the planning time. One thing to note is that the sort method has changed from "external sort" to "external merge". Since the 'Disk' was still being used for the sort, I increased the work_mem to 200MB, which resulted in the quicksort (memory) method being used (176MB). This dropped a full second off the execution time (although this is really too high to be used on our server instances).
The updated table and execution plan are below.
account=> \d account_history
Table "public.account_history"
Column | Type | Modifiers
-------------+--------------------------+------------------------
account | integer | not null
event_code | text | not null
event_time | timestamp with time zone | not null default now()
description | text |
Indexes:
"account_history_account_idx" btree (account)
"account_history_account_time_idx" btree (event_time DESC, account)
"account_history_time_idx" btree (event_time DESC)
Foreign-key constraints:
"account_fk" FOREIGN KEY (account) REFERENCES account(id) ON UPDATE CASCADE ON DELETE RESTRICT
"event_code_fk" FOREIGN KEY (event_code) REFERENCES domain_account_event(code) ON UPDATE CASCADE ON DELETE RESTRICTaccount=> EXPLAIN ANALYZE VERBOSE SELECT date_trunc('week', event_time) AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;
GroupAggregate (cost=334034.60..380541.52 rows=2325346 width=12) (actual time=1307.742..1685.676 rows=52 loops=1)
Output: (date_trunc('week'::text, event_time)), count(DISTINCT account)
Group Key: (date_trunc('week'::text, account_history.event_time))
-> Sort (cost=334034.60..339847.97 rows=2325346 width=12) (actual time=1303.565..1361.540 rows=2312418 loops=1)
Output: (date_trunc('week'::text, event_time)), account
Sort Key: (date_trunc('week'::text, account_history.event_time))
Sort Method: quicksort Memory: 176662kB
-> Index Only Scan using account_history_account_time_idx on public.account_history (cost=0.44..88140.73 rows=2325346 width=12) (actual time=0.028..980.822 rows=2312418 loops=1)
Output: date_trunc('week'::text, event_time), account
Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
Heap Fetches: 0
Planning time: 0.153 ms
Execution time: 1697.824 msI am very happy with the improvements thus far, but I welcome any other contributions to improving the performance of this query, since this is still the slowest query I have in one of my views.
Code Snippets
ALTER TABLE account_history ALTER event_time TYPE timestamptz USING event_time AT TIME ZONE 'UTC';
ALTER TABLE account_history ALTER COLUMN description DROP NOT NULL;
ALTER TABLE account_history ALTER COLUMN description DROP DEFAULT;
UPDATE account_history SET description=NULL WHERE description='';
VACUUM FULL;
REINDEX TABLE account_history;
account=> show work_mem;
work_mem
----------
100MBaccount=> \d account_history
Table "public.account_history"
Column | Type | Modifiers
-------------+--------------------------+------------------------
account | integer | not null
event_code | text | not null
event_time | timestamp with time zone | not null default now()
description | text |
Indexes:
"account_history_account_idx" btree (account)
"account_history_account_time_idx" btree (event_time DESC, account)
"account_history_time_idx" btree (event_time DESC)
Foreign-key constraints:
"account_fk" FOREIGN KEY (account) REFERENCES account(id) ON UPDATE CASCADE ON DELETE RESTRICT
"event_code_fk" FOREIGN KEY (event_code) REFERENCES domain_account_event(code) ON UPDATE CASCADE ON DELETE RESTRICTaccount=> EXPLAIN ANALYZE VERBOSE SELECT date_trunc('week', event_time) AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;
GroupAggregate (cost=334034.60..380541.52 rows=2325346 width=12) (actual time=1307.742..1685.676 rows=52 loops=1)
Output: (date_trunc('week'::text, event_time)), count(DISTINCT account)
Group Key: (date_trunc('week'::text, account_history.event_time))
-> Sort (cost=334034.60..339847.97 rows=2325346 width=12) (actual time=1303.565..1361.540 rows=2312418 loops=1)
Output: (date_trunc('week'::text, event_time)), account
Sort Key: (date_trunc('week'::text, account_history.event_time))
Sort Method: quicksort Memory: 176662kB
-> Index Only Scan using account_history_account_time_idx on public.account_history (cost=0.44..88140.73 rows=2325346 width=12) (actual time=0.028..980.822 rows=2312418 loops=1)
Output: date_trunc('week'::text, event_time), account
Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
Heap Fetches: 0
Planning time: 0.153 ms
Execution time: 1697.824 msContext
StackExchange Database Administrators Q#189190, answer score: 2
Revisions (0)
No revisions yet.