HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Selecting top 10 from indexed field of a big table takes too long

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fieldtoplongtooindexedbigselectingfromtakestable

Problem

I have a table with 165M records like this:

Performance
   id        integer
   installs  integer
   hour      timestamp without time zone


I also have an index on hour:

CREATE INDEX hour_idx
  ON performance
  USING btree
  (hour DESC NULLS LAST);


However, select top 10 records ordered by hour takes 6 minutes!

EXPLAIN ANALYZE  select hour from performance order by hour desc limit 10


Returns

Limit  (cost=7952135.23..7952135.25 rows=10 width=8) (actual time=376313.958..376313.964 rows=10 loops=1)
  ->  Sort  (cost=7952135.23..8368461.00 rows=166530310 width=8) (actual time=376313.957..376313.960 rows=10 loops=1)
        Sort Key: hour
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Seq Scan on performance  (cost=0.00..4353475.10 rows=166530310 width=8) (actual time=0.006..327149.828 rows=192330557 loops=1)
Planning time: 0.070 ms
Execution time: 376330.573 ms


Why is it taking so long? If there is an index on date field desc - shouldn't it be super quick to retrieve data?

Solution

In your sample code above, the index is explicitly created as NULLS LAST and the query is implicitly running NULLS FIRST (which is the default for ORDER BY .. DESC) so PostgreSQL would need to re-sort the data if it used the index. As a result the index would actually make the query many times slower than even the (already-slow) table scan.

rds-9.6.5 root@db1=> create table performance (id integer, installs integer, hour timestamp without time zone);
CREATE TABLE
Time: 28.100 ms

rds-9.6.5 root@db1=> with generator as (select generate_series(1,166530) i)
[more] - > insert into performance (
[more] ( >   select
[more] ( >     i id,
[more] ( >     (random()*1000)::integer installs,
[more] ( >     (now() - make_interval(secs => i))::timestamp installs
[more] ( >   from generator
[more] ( > );
INSERT 0 166530
Time: 244.872 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc nulls last);
CREATE INDEX
Time: 67.089 ms

rds-9.6.5 root@db1=> vacuum analyze performance;
VACUUM
Time: 43.552 ms


We can add a WHERE clause on the hour column so that using the index becomes a good idea - but notice how we still need to re-sort the data from the index.

rds-9.6.5 root@db1=> explain select hour from performance where hour>now() order by hour desc limit 10;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Limit  (cost=4.45..4.46 rows=1 width=8)
   ->  Sort  (cost=4.45..4.46 rows=1 width=8)
         Sort Key: hour DESC
         ->  Index Only Scan using hour_idx on performance  (cost=0.42..4.44 rows=1 width=8)
               Index Cond: (hour > now())
(5 rows)

Time: 0.789 ms


If we add an explicit NULLS LAST to your query then it will use the index as expected.

rds-9.6.5 root@db1=> explain select hour from performance order by hour desc NULLS LAST limit 10;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.526 ms


Alternatively, if we drop the (non-default) NULLS LAST from your index then the query will use it as expected without modification.

rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 4.124 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc);
CREATE INDEX
Time: 69.220 ms

rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.725 ms


Note that you can also drop the DESC from your index; PostgreSQL can scan indexes both forwards and backwards and on single-column indexes it's generally unnecessary to reverse them. You only need to be careful about having the right combination of order and nulls first/last.

rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 3.837 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour);
CREATE INDEX
Time: 94.815 ms

rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan Backward using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.740 ms

Code Snippets

rds-9.6.5 root@db1=> create table performance (id integer, installs integer, hour timestamp without time zone);
CREATE TABLE
Time: 28.100 ms

rds-9.6.5 root@db1=> with generator as (select generate_series(1,166530) i)
[more] - > insert into performance (
[more] ( >   select
[more] ( >     i id,
[more] ( >     (random()*1000)::integer installs,
[more] ( >     (now() - make_interval(secs => i))::timestamp installs
[more] ( >   from generator
[more] ( > );
INSERT 0 166530
Time: 244.872 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc nulls last);
CREATE INDEX
Time: 67.089 ms

rds-9.6.5 root@db1=> vacuum analyze performance;
VACUUM
Time: 43.552 ms
rds-9.6.5 root@db1=> explain select hour from performance where hour>now() order by hour desc limit 10;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Limit  (cost=4.45..4.46 rows=1 width=8)
   ->  Sort  (cost=4.45..4.46 rows=1 width=8)
         Sort Key: hour DESC
         ->  Index Only Scan using hour_idx on performance  (cost=0.42..4.44 rows=1 width=8)
               Index Cond: (hour > now())
(5 rows)

Time: 0.789 ms
rds-9.6.5 root@db1=> explain select hour from performance order by hour desc NULLS LAST limit 10;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.526 ms
rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 4.124 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc);
CREATE INDEX
Time: 69.220 ms

rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.725 ms
rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 3.837 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour);
CREATE INDEX
Time: 94.815 ms

rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan Backward using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.740 ms

Context

StackExchange Database Administrators Q#193985, answer score: 19

Revisions (0)

No revisions yet.