patternsqlMajor
Slow index scans in large table
Viewed 0 times
slowscanslargeindextable
Problem
Update 2020-08-04:
Since this answer is apparently still being viewed regularly I wanted to provide an update on the situation. We're currently using PG 11 with table partitioning on
Using PostgreSQL 9.2, I have troubles with slow queries on a relatively large table (200+ million rows). I'm not trying anything crazy, just adding historic values. Below is the query and the query plan output.
My table layout:
The data ranges from 2012-01-01 till now, with new data constantly being added. There are about 2.2k distinct values in the
I notice that the row estimates aren't far off, but the cost estimates seem larger by factor 4x. This probably isn't an issue, but is there anything I could do about it?
I expect that disk access might be the issue, since the table isn't in memory all the time.
```
Aggregate (cost
Since this answer is apparently still being viewed regularly I wanted to provide an update on the situation. We're currently using PG 11 with table partitioning on
timestamp and are easily handling a few billion rows in the table(s). The index-only scans are a life saver and it wouldn't be possible without.Using PostgreSQL 9.2, I have troubles with slow queries on a relatively large table (200+ million rows). I'm not trying anything crazy, just adding historic values. Below is the query and the query plan output.
My table layout:
Table "public.energy_energyentry"
Column | Type | Modifiers
-----------+--------------------------+-----------------------------------------------------------------
id | integer | not null default nextval('energy_energyentry_id_seq'::regclass)
prop_id | integer | not null
timestamp | timestamp with time zone | not null
value | double precision | not null
Indexes:
"energy_energyentry_pkey" PRIMARY KEY, btree (id)
"energy_energyentry_prop_id" btree (prop_id)
"energy_energyentry_prop_id_timestamp_idx" btree (prop_id, "timestamp")
Foreign-key constraints:
"energy_energyentry_prop_id_fkey" FOREIGN KEY (prop_id) REFERENCES gateway_peripheralproperty(id) DEFERRABLE INITIALLY DEFERREDThe data ranges from 2012-01-01 till now, with new data constantly being added. There are about 2.2k distinct values in the
prop_id foreign key, distributed evenly.I notice that the row estimates aren't far off, but the cost estimates seem larger by factor 4x. This probably isn't an issue, but is there anything I could do about it?
I expect that disk access might be the issue, since the table isn't in memory all the time.
EXPLAIN ANALYZE
SELECT SUM("value")
FROM "energy_energyentry"
WHERE
"prop_id"=82411
AND "timestamp">'2014-06-11'
AND "timestamp"<'2014-11-11'
;```
Aggregate (cost
Solution
Your table is big, and so is any index spanning the whole table. Assuming that:
I would suggest a partial, multi-column (covering!) index:
In Postgres 11 or later use a "covering" index instead:
See:
Only include the time range that is queried regularly. Only makes sense if you cut off most of the big table this way. Effectiveness deteriorates over time with new entries. Recreate the index from time to time. (You may need to adapt your queries.) See linked answer below.
The last column value is only included to get index-only scans out of this. Aggressive autovacuum setting may help by keeping the visibility map up to date, like @jjanes already mentioned.
The partial index should fit into RAM more easily and stay there longer.
You may need to include this
Since your query is summing a lot of rows (
Related (but ignore
Aside:
Since you currently have an index on
- only new data (with
timestamp = now()) is entered
- existing rows are neither changed nor deleted.
- you have data since 2012-01-01 but queries are predominantly on the current year (?)
I would suggest a partial, multi-column (covering!) index:
CREATE INDEX ON energy_energyentry (prop_id, "timestamp", value)
WHERE "timestamp" >= '2014-01-01 0:0'; -- adapt to your needsIn Postgres 11 or later use a "covering" index instead:
CREATE INDEX ON energy_energyentry (prop_id, "timestamp") INCLUDE (value)
WHERE "timestamp" >= '2014-01-01 0:0';See:
- Do covering indexes in PostgreSQL help JOIN columns?
Only include the time range that is queried regularly. Only makes sense if you cut off most of the big table this way. Effectiveness deteriorates over time with new entries. Recreate the index from time to time. (You may need to adapt your queries.) See linked answer below.
The last column value is only included to get index-only scans out of this. Aggressive autovacuum setting may help by keeping the visibility map up to date, like @jjanes already mentioned.
The partial index should fit into RAM more easily and stay there longer.
You may need to include this
WHERE condition in queries to make the planner understand the index is applicable to the query, like:SELECT sum(value) AS sum_value
FROM energy_energyentry
WHERE prop_id = 82411
AND "timestamp" > '2014-06-11 0:0'
AND "timestamp" = '2014-01-01 0:0'; -- seems redundant, but may be neededSince your query is summing a lot of rows (
rows=13578), this is going to take some time, even with an index-only scan. It shouldn't be anywhere near 50 seconds, though. Less than a second on any halfway decent hardware.Related (but ignore
CLUSTER and FILLFACTOR, both are irrelevant if you can get index-only scans out of this):- Index optimization with dates
Aside:
Since you currently have an index on
(prop_id, "timestamp"), the additional index on just (prop_id) may cost more than it's worth:- Is a composite index also good for queries on the first field?
Code Snippets
CREATE INDEX ON energy_energyentry (prop_id, "timestamp", value)
WHERE "timestamp" >= '2014-01-01 0:0'; -- adapt to your needsCREATE INDEX ON energy_energyentry (prop_id, "timestamp") INCLUDE (value)
WHERE "timestamp" >= '2014-01-01 0:0';SELECT sum(value) AS sum_value
FROM energy_energyentry
WHERE prop_id = 82411
AND "timestamp" > '2014-06-11 0:0'
AND "timestamp" < '2014-11-11 0:0'
AND "timestamp" >= '2014-01-01 0:0'; -- seems redundant, but may be neededContext
StackExchange Database Administrators Q#81456, answer score: 20
Revisions (0)
No revisions yet.