HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

Avoiding a sort on an already clustered index for group by

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
clusteredindexgroupalreadyforsortavoiding

Problem

On a table T with two fields, pid and did, the following query results in a seq. scan followed by a sort on pid:

select count(did), pid 
from  T
group by pid


Here is the query plan:

GroupAggregate  (cost=21566127.88..22326004.09 rows=987621 width=8)
->  Sort  (cost=21566127.88..21816127.88 rows=100000000 width=8)
    Sort Key: pid
     ->  Seq Scan on tc  (cost=0.00..1442478.00 rows=100000000 width=8)


However, the table already has clustered index on pid.

Why doesn't Postgres simply scan the table and compute the group by? Why does it need to sort on pid again?

How can I force Postgres to use the clustered index for the group by?

Solution

As you have no WHERE clause, you (the query, that is) must scan the whole table to be able to compute the aggregate. Once it has all the data, it is much easier to sort them directly than to go to an index and match the rows and return them in order. This latter would involve further I/O which is slow compared to operation in the memory.

If you were on PostgreSQL 9.2 (or above) and you had an index on (pid, did) then the optimizer might have chosen an index scan instead.

Context

StackExchange Database Administrators Q#46997, answer score: 3

Revisions (0)

No revisions yet.