patternsqlModerate
Selecting top 10 from indexed field of a big table takes too long
Viewed 0 times
fieldtoplongtooindexedbigselectingfromtakestable
Problem
I have a table with 165M records like this:
I also have an index on hour:
However, select top 10 records ordered by hour takes 6 minutes!
Returns
Why is it taking so long? If there is an index on date field desc - shouldn't it be super quick to retrieve data?
Performance
id integer
installs integer
hour timestamp without time zoneI also have an index on hour:
CREATE INDEX hour_idx
ON performance
USING btree
(hour DESC NULLS LAST);However, select top 10 records ordered by hour takes 6 minutes!
EXPLAIN ANALYZE select hour from performance order by hour desc limit 10Returns
Limit (cost=7952135.23..7952135.25 rows=10 width=8) (actual time=376313.958..376313.964 rows=10 loops=1)
-> Sort (cost=7952135.23..8368461.00 rows=166530310 width=8) (actual time=376313.957..376313.960 rows=10 loops=1)
Sort Key: hour
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on performance (cost=0.00..4353475.10 rows=166530310 width=8) (actual time=0.006..327149.828 rows=192330557 loops=1)
Planning time: 0.070 ms
Execution time: 376330.573 msWhy is it taking so long? If there is an index on date field desc - shouldn't it be super quick to retrieve data?
Solution
In your sample code above, the index is explicitly created as
We can add a
If we add an explicit
Alternatively, if we drop the (non-default)
Note that you can also drop the
NULLS LAST and the query is implicitly running NULLS FIRST (which is the default for ORDER BY .. DESC) so PostgreSQL would need to re-sort the data if it used the index. As a result the index would actually make the query many times slower than even the (already-slow) table scan.rds-9.6.5 root@db1=> create table performance (id integer, installs integer, hour timestamp without time zone);
CREATE TABLE
Time: 28.100 ms
rds-9.6.5 root@db1=> with generator as (select generate_series(1,166530) i)
[more] - > insert into performance (
[more] ( > select
[more] ( > i id,
[more] ( > (random()*1000)::integer installs,
[more] ( > (now() - make_interval(secs => i))::timestamp installs
[more] ( > from generator
[more] ( > );
INSERT 0 166530
Time: 244.872 ms
rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc nulls last);
CREATE INDEX
Time: 67.089 ms
rds-9.6.5 root@db1=> vacuum analyze performance;
VACUUM
Time: 43.552 msWe can add a
WHERE clause on the hour column so that using the index becomes a good idea - but notice how we still need to re-sort the data from the index.rds-9.6.5 root@db1=> explain select hour from performance where hour>now() order by hour desc limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------
Limit (cost=4.45..4.46 rows=1 width=8)
-> Sort (cost=4.45..4.46 rows=1 width=8)
Sort Key: hour DESC
-> Index Only Scan using hour_idx on performance (cost=0.42..4.44 rows=1 width=8)
Index Cond: (hour > now())
(5 rows)
Time: 0.789 msIf we add an explicit
NULLS LAST to your query then it will use the index as expected.rds-9.6.5 root@db1=> explain select hour from performance order by hour desc NULLS LAST limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit (cost=0.42..0.68 rows=10 width=8)
-> Index Only Scan using hour_idx on performance (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)
Time: 0.526 msAlternatively, if we drop the (non-default)
NULLS LAST from your index then the query will use it as expected without modification.rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 4.124 ms
rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc);
CREATE INDEX
Time: 69.220 ms
rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit (cost=0.42..0.68 rows=10 width=8)
-> Index Only Scan using hour_idx on performance (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)
Time: 0.725 msNote that you can also drop the
DESC from your index; PostgreSQL can scan indexes both forwards and backwards and on single-column indexes it's generally unnecessary to reverse them. You only need to be careful about having the right combination of order and nulls first/last.rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 3.837 ms
rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour);
CREATE INDEX
Time: 94.815 ms
rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Limit (cost=0.42..0.68 rows=10 width=8)
-> Index Only Scan Backward using hour_idx on performance (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)
Time: 0.740 msCode Snippets
rds-9.6.5 root@db1=> create table performance (id integer, installs integer, hour timestamp without time zone);
CREATE TABLE
Time: 28.100 ms
rds-9.6.5 root@db1=> with generator as (select generate_series(1,166530) i)
[more] - > insert into performance (
[more] ( > select
[more] ( > i id,
[more] ( > (random()*1000)::integer installs,
[more] ( > (now() - make_interval(secs => i))::timestamp installs
[more] ( > from generator
[more] ( > );
INSERT 0 166530
Time: 244.872 ms
rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc nulls last);
CREATE INDEX
Time: 67.089 ms
rds-9.6.5 root@db1=> vacuum analyze performance;
VACUUM
Time: 43.552 msrds-9.6.5 root@db1=> explain select hour from performance where hour>now() order by hour desc limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------
Limit (cost=4.45..4.46 rows=1 width=8)
-> Sort (cost=4.45..4.46 rows=1 width=8)
Sort Key: hour DESC
-> Index Only Scan using hour_idx on performance (cost=0.42..4.44 rows=1 width=8)
Index Cond: (hour > now())
(5 rows)
Time: 0.789 msrds-9.6.5 root@db1=> explain select hour from performance order by hour desc NULLS LAST limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit (cost=0.42..0.68 rows=10 width=8)
-> Index Only Scan using hour_idx on performance (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)
Time: 0.526 msrds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 4.124 ms
rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc);
CREATE INDEX
Time: 69.220 ms
rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit (cost=0.42..0.68 rows=10 width=8)
-> Index Only Scan using hour_idx on performance (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)
Time: 0.725 msrds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 3.837 ms
rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour);
CREATE INDEX
Time: 94.815 ms
rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Limit (cost=0.42..0.68 rows=10 width=8)
-> Index Only Scan Backward using hour_idx on performance (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)
Time: 0.740 msContext
StackExchange Database Administrators Q#193985, answer score: 19
Revisions (0)
No revisions yet.