patternsqlMinor
Index that is not used, yet influences query
Viewed 0 times
influencesusedqueryyetthatindexnot
Problem
I've got a PostgreSQL 9.3 table with some numbers and some additional data:
This table currently has about 10M records and takes 1GB of disk space.
I want to compute how many rows are in every block of 100000 consecutive numbers:
This returns about 3500 rows.
I noticed that existence of a certain index significantly speeds up this query even though the query plan does not mention it at all. The query plan without the index:
The index:
The new query plan:
```
db=> EXPLAIN (ANALYZE TRUE, VERBOSE TRUE) SELECT myid/100000 AS block, count(*) AS total FROM mytable GROUP BY myid/100000;
QUERY PLAN
CREATE TABLE mytable (
myid BIGINT,
somedata BYTEA
)This table currently has about 10M records and takes 1GB of disk space.
myid are not consecutive.I want to compute how many rows are in every block of 100000 consecutive numbers:
SELECT myid/100000 AS block, count(*) AS total FROM mytable GROUP BY myid/100000;This returns about 3500 rows.
I noticed that existence of a certain index significantly speeds up this query even though the query plan does not mention it at all. The query plan without the index:
db=> EXPLAIN (ANALYZE TRUE, VERBOSE TRUE) SELECT myid/100000 AS block, count(*) AS total FROM mytable GROUP BY myid/100000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=1636639.92..1709958.65 rows=496942 width=8) (actual time=6783.763..8888.841 rows=3460 loops=1)
Output: ((myid / 100000)), count(*)
-> Sort (cost=1636639.92..1659008.91 rows=8947594 width=8) (actual time=6783.752..8005.831 rows=8947557 loops=1)
Output: ((myid / 100000))
Sort Key: ((mytable.myid / 100000))
Sort Method: external merge Disk: 157440kB
-> Seq Scan on public.mytable (cost=0.00..236506.92 rows=8947594 width=8) (actual time=0.020..1674.838 rows=8947557 loops=1)
Output: (myid / 100000)
Total runtime: 8914.780 ms
(9 rows)The index:
db=> CREATE INDEX myindex ON mytable ((myid/100000));
db=> VACUUM ANALYZE;The new query plan:
```
db=> EXPLAIN (ANALYZE TRUE, VERBOSE TRUE) SELECT myid/100000 AS block, count(*) AS total FROM mytable GROUP BY myid/100000;
QUERY PLAN
Solution
When you create an expression index, it causes PostgreSQL to gather statistics on the that expression. With those statistics on hand, it now has an accurate estimate for the number of aggregated rows that the query will return, which leads it to make a better plan choice.
Specifically in this case, without those extra statistics it thought the hash table would be too large to fit in work_mem, so it didn't choose that method.
Specifically in this case, without those extra statistics it thought the hash table would be too large to fit in work_mem, so it didn't choose that method.
Context
StackExchange Database Administrators Q#72649, answer score: 6
Revisions (0)
No revisions yet.