HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Poor performance on query with LIMIT when I add an ORDER BY?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
orderwithlimitquerypoorperformancewhenadd

Problem

Given the table:

Column
Type
Modifiers
Storage

id
bigint
not null default nextval('items_id_seq'::regclass)
plain

data
text
not null
extended

object_id
bigint
not null
plain

and indexes:

  • "items_pkey" PRIMARY KEY, btree (id)



  • "items_object_id_idx" btree (object_id)



When I execute:

SELECT *
FROM items
WHERE object_id = 123
LIMIT 1;


It returns 0 rows very quickly. However, when I execute this query with ORDER BY, it hangs for a very long time:

SELECT *
FROM items
WHERE object_id = 123
ORDER BY id DESC  -- I added the ORDER BY
LIMIT 1;


What explains this discrepancy?
Query Plans
Fast Query (without ORDER BY)

QUERY PLAN                                    
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.34 rows=1 width=63) (actual time=0.014..0.014 rows=0 loops=1)
   ->  Index Scan using items_object_id_operation_idx on items  (cost=0.56..2579.16 rows=929 width=63) (actual time=0.013..0.013 rows=0 loops=1)
         Index Cond: (object_id = 123::bigint)
 Total runtime: 0.029 ms


Slow query (with the ORDER BY)

QUERY PLAN                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..1269.14 rows=1 width=63) (actual time=873796.061..873796.061 rows=0 loops=1)
   ->  Index Scan Backward using items_pkey on items  (cost=0.44..1164670.11 rows=918 width=63) (actual time=873796.059..873796.059 rows=0 loops=1)
         Filter: (object_id = 123::bigint)
         Rows Removed by Filter: 27942522
 Total runtime: 873796.113 ms

Solution

Trying to explain why there is a difference in performance between the two queries.

This one: SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1 is satisfied by any one row with the matching object_id, so the index on object_id is a natural choice. The query requires minimal I/O: an index scan to find the first matching value, plus one heap read to fetch the entire row.

The alternative: SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1 requires all rows with the matching object_id to be sorted by another column, id, then the row with the maximum value of id to be returned. If you were to use the index on object_id, you would need to perform the following operations: scan the index to find every matching object_id; for every match go fetch the actual row; then sort all fetched rows by id and return the one with the largest id.

The alternative chosen by the optimizer, presumably based on the object_id histogram, is: scan the index on id backwards, in its entirety; for each value go fetch the row and check if the value of object_id matches; return the first matching row, which will have the maximum possible id value. This alternative avoids sorting the rows, so I guess the optimizer prefers it to using the index on object_id.

The presence of an index on (object_id asc, id desc) allows for yet another alternative: scan this new index for the first entry matching the supplied object_id value, which by definition will have the highest id value; go fetch one matching row and return. Obviously, this is the most efficient approach.

Context

StackExchange Database Administrators Q#110636, answer score: 22

Revisions (0)

No revisions yet.