HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

ORDER BY count and Bitmap Heap Scan

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
orderscanheapbitmapandcount

Problem

I have a table with hits:

Column    |            Type             |                     Modifiers                     
--------------+-----------------------------+---------------------------------------------------
 id           | integer                     | not null default nextval('hits_id_seq'::regclass)
 link_id      | integer                     | 
 country_code | character varying(10)       | 
 created_at   | timestamp without time zone | 
Indexes:
 "hits_pkey" PRIMARY KEY, btree (id)
 "hits_created_at_link_id_idx" btree (created_at, link_id)
 "index_hits_on_country_code" btree (country_code)
 "index_hits_on_link_id" btree (link_id)


This could be some samples in there:

id    | link_id | country_code |        created_at         
----------+---------+--------------+---------------------------
 31084701 |      42 | IS           | 2015-04-24 11:01:11.30736
 31084705 |      42 | ES           | 2015-04-24 11:01:12.5688
 31084745 |      42 | LS           | 2015-04-24 11:01:25.1832
 31084943 |      42 | CX           | 2015-04-24 11:02:27.62448
 31085010 |      42 | KH           | 2015-04-24 11:02:48.7536
 31085053 |      42 | GL           | 2015-04-24 11:03:02.31408
 31085361 |      42 | KW           | 2015-04-24 11:04:39.44496
 31085590 |      42 | KY           | 2015-04-24 11:05:51.6624
 31085591 |      42 | BI           | 2015-04-24 11:05:51.97776
 31085675 |      42 | BS           | 2015-04-24 11:06:18.468


I don't understand why PostgreSQL handles these two queries differently (just the ORDER BY is rephrased!):

Version 1:

```
EXPLAIN ANALYSE
SELECT COUNT(*) AS count, link_id
FROM "hits"
WHERE ("hits"."created_at" BETWEEN '2015-08-31 00:00' AND '2015-09-30 00:00')
GROUP BY "hits"."link_id"
ORDER BY "hits"."count" DESC, "hits"."link_id" ASC
LIMIT 50;
QUERY PLAN
------------------------------------------------------------------------------------------

Solution

The reason is that Postgres allows "attribute notation" for a function with a single parameter. This can be used as substitute for generated columns. It's a legacy feature that's not used much any more. Details here:

  • Store common query as column?



If this was not allowed, your first query would just raise a syntax error, since there is no column "hits"."count".

One can reference input and output columns in ORDER BY (and GROUP BY, but not in WHERE or HAVING). While your second query references the output column count (the column alias for count(*)), the first query looks for an input column, since the column name is schema-qualified.

In absence of an actual column "hits"."count", Postgres interprets this as count(hits), which is the same as count(hits.*) like you can see in the EXPLAIN output:

Sort Key: (count(hits.*)) DESC, link_id


And count(*) is implemented differently from count(). The first only checks for the existence of a row (which is always at least slightly faster), while the second has to look at actual values - so the index-only scan we see for the other query is out.

This is one of the reasons why I keep advising against the use of basic function names as column name (or alias).

Code Snippets

Sort Key: (count(hits.*)) DESC, link_id

Context

StackExchange Database Administrators Q#129410, answer score: 3

Revisions (0)

No revisions yet.