patternsqlMinor
why index scan is slower then seq scan?
Viewed 0 times
whyscanseqslowerthenindex
Problem
slava=# explain (analyze) SELECT * from pending_posts WHERE user_id <> 1456
slava-# ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on pending_posts (cost=0.00..17906.00 rows=999994 width=14) (actual time=0.021..231.803 rows=999994 loops=1)
Filter: (user_id <> 1456)
Rows Removed by Filter: 6
Planning time: 0.108 ms
Execution time: 289.845 ms
(5 rows)
slava=# explain (analyze) select * from pending_posts where user_id <> 1456 order_by user_id;
ERROR: syntax error at or near "order_by"
LINE 1: ...select * from pending_posts where user_id <> 1456 order_by u...
^
slava=# explain (analyze) select * from pending_posts where user_id <> 1456 order by user_id;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Index Scan using p_i on pending_posts (cost=0.42..50103.17 rows=999994 width=14) (actual time=0.033..899.520 rows=999994 loops=1)
Filter: (user_id <> 1456)
Rows Removed by Filter: 6
Planning time: 0.141 ms
Execution time: 953.982 ms
(5 rows)Given that
user_id is integer indexed with btree, should't it be faster to sort and then select using tree search?Solution
PostgreSQL will evaluate all (or a high number) of possible query plans, and make a good guess at which one will be the least costly. If the different cost parameters (that depend, for instance, on the speed of your disks; the amount of available memory, etc.) are well set, most of the times that guess will be accurate and the choice of plans will be optimum.
You're asking PostgreSQL two different queries. One doesn't need sorting, the other one does. The rows returned are the same, because the rest of the query is the same. The query with the
PostgreSQL could have decided that performing a
Performing an
With standard settings, the costs for for
If you would be using a covering index (one index that contains at least all the columns that are in your
Try this query and see what the execution plan is:
If you don't get an
and retry it.
Note that indexes can help you in several areas:
References:
You're asking PostgreSQL two different queries. One doesn't need sorting, the other one does. The rows returned are the same, because the rest of the query is the same. The query with the
ORDER BY takes longer. That is to be expected.PostgreSQL could have decided that performing a
Seq Scan then a Sort would be quicker than performing an Index Scan. With the actual settings, it decided the opposite.Performing an
Index Scan takes longer than performing a Seq Scan just because, when scanning the index, PostgreSQL can just skip the entries with (user_id <> 1456) (the info for user_id is in the index). For the non-skipped values, it gets the (internal) reference to the corresponding row (which is also part of the index), and it has to retrieve the actual row in the table. It won't need to sort them, because by following the index, the values are already in the proper order.With standard settings, the costs for for
seq_page_cost and random_page_cost (in arbitrary units) are 4 and 1. That takes into account that, with HDD, jumping from one page to another needs a significant mechanical movement, and needs more time than reading pages one after next. You can assume a Seq Scan will read pages sequentially, and an Index Scan will mostly jump from one to another. [If you use SSD instead of HDD, the value for random_page_cost should probably be put to just 1.05].If you would be using a covering index (one index that contains at least all the columns that are in your
SELECT), then, PostgreSQL would be able to perform an Index Only Scan (meaning it doesn't have to retrieve the whole rows from the actual *table). That would speed up your query significantly. For a covering index to work, your table has to be properly maintained (i.e.: autovacuum must have had time to maintain it).Try this query and see what the execution plan is:
EXPLAIN (analyze)
SELECT user_id
FROM pending_posts
WHERE user_id <> 1456
ORDER BY user_id ;If you don't get an
Index Only Scan, perform a:VACUUM ANALYZE pending_posts ;and retry it.
Note that indexes can help you in several areas:
- In the
SELECTclause if they are covering indexes. If the indexes contain all the columns specified,Index Only Scansare possible.
- In the
WHEREclause if they allow to filter out (or go directly to) specific) rows.
- In the
ORDER BYclause, if they provide already the rows in the sort order you specify.
- In the
GROUP BY, if the aggregation is done by sorting the rows according to the groups (other techniques are possible).
References:
- Manoji Nesh Blog: Seq scan vs Index Scan
- PostgreSQL query plan and SQL performance notes - Types of index scans
- Stack Overflow's Why does PostgreSQL perform sequential scan on indexed column?
Code Snippets
EXPLAIN (analyze)
SELECT user_id
FROM pending_posts
WHERE user_id <> 1456
ORDER BY user_id ;VACUUM ANALYZE pending_posts ;Context
StackExchange Database Administrators Q#180271, answer score: 6
Revisions (0)
No revisions yet.