HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlMinor

Why does the query plan still sort a table despite having an index sorted on the columns?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
indexwhythedespitecolumnsqueryhavingplansortdoes

Problem

I'm using Postgres 9.1
I have two tables I am joining:

wikidb=> \d page
                         Table "public.page"
        Column         |     Type      |          Modifiers           
-----------------------+---------------+------------------------------
 page_id               | bigint        | not null
 page_namespace        | integer       | not null default 0
 page_title            | text          | not null default ''::text
 [...]
Indexes:
    [...]
    "page_page_namespace_page_title_idx" UNIQUE, btree (page_namespace, page_title)

wikidb=> \d pagelinks
                 Table "public.pagelinks"
      Column       |  Type   |         Modifiers          
-------------------+---------+----------------------------
 pl_from           | bigint  | not null default 0::bigint
 pl_namespace      | integer | not null default 0
 pl_title          | text    | not null default ''::text
 [...]
Indexes:
    [...]
    "pagelinks_pl_namespace_pl_title_pl_from_idx" btree (pl_namespace, pl_title, pl_from)


Notice how both have indices on the (namespace, title) columns. I am interested in finding out how many (pl_namespace,pl_title) pairs in the pagelinks table do not show up in the page table as (page_namespace, page_title).

If I use a join, then I get the following plan:

```
wikidb=> explain SELECT COUNT(*)
FROM pagelinks
LEFT OUTER JOIN page
ON page.page_namespace = pagelinks.pl_namespace AND
page.page_title = pagelinks.pl_title
WHERE page.page_namespace IS NULL AND page.page_title IS NULL;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=1310748.56..1310748.57 rows=1 width=0)
-> Merge Anti Join (cost=1189384.02..1310748.56 rows=1 width=0)
Merge Cond: ((pagelinks.pl_title = page.page_title) AND (pagelinks.pl_namespace = page.page_namespace))

Solution

PostgreSQL thinks the sort merge is faster. And in my hands on 9.1 it actually is faster than walking both indexes. You can try it for yourself by set enable_sort to off; and see what plan it gives, and how long that one takes to run.

Sorting is pretty efficient. Walking indexes is inefficient in 9.1 because it still has to visit the table to make sure the row is actually visible. Indexes don't store visibility information. 9.2 introduced index-only scans, which gets around part of that problem.

Also, the index leaf pages are not necessarily logical order, so walking the index in logical order can lead to a lot of random IO. (Of course, checking the table for visibility off of an index scan will as well.)

There have been a lot of improvements made since 9.1.

Context

StackExchange Database Administrators Q#115402, answer score: 5

Revisions (0)

No revisions yet.