HiveBrain v1.2.0

Postgres prefers slow Seq Scan over fast Index Scan

Submitted by: @import:stackexchange-dba

Problem

Tables

I have the following two tables.

books, which contains ~5 million rows:

Table "public.books"
 Column  |           Type           |                     Modifiers
---------+--------------------------+----------------------------------------------------
 id      | integer                  | not null default nextval('books_id_seq'::regclass)
 run__id | integer                  |
 time    | timestamp with time zone | not null
 venue   | character varying        | not null
 base    | character varying        | not null
 quote   | character varying        | not null
Indexes:
    "books_pkey" PRIMARY KEY, btree (id)
    "run_books_index" UNIQUE, btree (run__id, id)
Foreign-key constraints:
    "books_run__id_fkey" FOREIGN KEY (run__id) REFERENCES runs(id)
Referenced by:
    TABLE "orders" CONSTRAINT "orders_book__id_fkey" FOREIGN KEY (book__id) REFERENCES books(id)


and orders, which contains ~3 billion rows:

Table "public.orders"
  Column  |       Type       | Modifiers
----------+------------------+-----------
 book__id | integer          | not null
 is_buy   | boolean          | not null
 qty      | double precision | not null
 price    | double precision | not null
Indexes:
    "orders_pkey" PRIMARY KEY, btree (book__id, is_buy, price)
Foreign-key constraints:
    "orders_book__id_fkey" FOREIGN KEY (book__id) REFERENCES books(id)


Query

I want to run the following query:
```
SELECT * FROM books b JOIN orders o ON o.book__id = b.id WHERE b.run__id = 1;
```


Postgres chooses the following execution plan:

```
orderbooks=> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM books b JOIN orders o ON o.book__id = b.id WHERE b.run__id = 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2465.94..58122020.57 rows=762939 width=53) (actual time=1394110.723..2561879.775 rows=4521
```

Solution

There seem to be several things off:

- The estimates are wrong. Since you have already ANALYZEd both tables, retry with a higher default_statistics_target.

- Index scans seem to be estimated as too expensive, so perhaps your setting of random_page_cost is too high. Perhaps effective_cache_size is also too low.

- If you didn't select everything from books, a covering index might be an idea.
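The first two points can be sketched as follows. The concrete numbers below are illustrative assumptions, not values taken from the answer; pick them to match your hardware and data distribution:

```
-- 1. Better row estimates: raise the statistics target on the filter
--    column, then re-ANALYZE (default_statistics_target defaults to 100).
ALTER TABLE books ALTER COLUMN run__id SET STATISTICS 1000;
ANALYZE books;
ANALYZE orders;

-- 2. Make index scans look cheaper to the planner. random_page_cost
--    defaults to 4.0; values near 1.1 are common on SSDs or mostly-cached
--    workloads. effective_cache_size should reflect the OS cache
--    available to Postgres (the 16GB here is a placeholder).
SET random_page_cost = 1.1;
SET effective_cache_size = '16GB';

-- Re-check the plan with the new settings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM books b JOIN orders o ON o.book__id = b.id WHERE b.run__id = 1;
```

The SET statements are session-local; once a setting proves out, persist it via ALTER SYSTEM or postgresql.conf.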
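To verify that the planner is merely mispricing the index path (rather than being unable to use it), you can strongly discourage sequential scans for the current session and compare plans. If the forced index plan is actually faster, the cost settings above are indeed mis-tuned:

```
-- Diagnostic only (session-local): enable_seqscan = off makes seq scans
-- look prohibitively expensive, pushing the planner toward index paths.
SET enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM books b JOIN orders o ON o.book__id = b.id WHERE b.run__id = 1;

-- Restore the default afterwards.
RESET enable_seqscan;
```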

Context

StackExchange Database Administrators Q#283781, answer score: 4
