patternsqlMinor
ORDER BY count and Bitmap Heap Scan
Viewed 0 times
orderscanheapbitmapandcount
Problem
I have a table with hits:
This could be some samples in there:
I don't understand why PostgreSQL handles these two queries differently (just the
Version 1:
```
EXPLAIN ANALYSE
SELECT COUNT(*) AS count, link_id
FROM "hits"
WHERE ("hits"."created_at" BETWEEN '2015-08-31 00:00' AND '2015-09-30 00:00')
GROUP BY "hits"."link_id"
ORDER BY "hits"."count" DESC, "hits"."link_id" ASC
LIMIT 50;
QUERY PLAN
------------------------------------------------------------------------------------------
Column | Type | Modifiers
--------------+-----------------------------+---------------------------------------------------
id | integer | not null default nextval('hits_id_seq'::regclass)
link_id | integer |
country_code | character varying(10) |
created_at | timestamp without time zone |
Indexes:
"hits_pkey" PRIMARY KEY, btree (id)
"hits_created_at_link_id_idx" btree (created_at, link_id)
"index_hits_on_country_code" btree (country_code)
"index_hits_on_link_id" btree (link_id)This could be some samples in there:
id | link_id | country_code | created_at
----------+---------+--------------+---------------------------
31084701 | 42 | IS | 2015-04-24 11:01:11.30736
31084705 | 42 | ES | 2015-04-24 11:01:12.5688
31084745 | 42 | LS | 2015-04-24 11:01:25.1832
31084943 | 42 | CX | 2015-04-24 11:02:27.62448
31085010 | 42 | KH | 2015-04-24 11:02:48.7536
31085053 | 42 | GL | 2015-04-24 11:03:02.31408
31085361 | 42 | KW | 2015-04-24 11:04:39.44496
31085590 | 42 | KY | 2015-04-24 11:05:51.6624
31085591 | 42 | BI | 2015-04-24 11:05:51.97776
31085675 | 42 | BS | 2015-04-24 11:06:18.468I don't understand why PostgreSQL handles these two queries differently (just the
ORDER BY is rephrased!):Version 1:
```
EXPLAIN ANALYSE
SELECT COUNT(*) AS count, link_id
FROM "hits"
WHERE ("hits"."created_at" BETWEEN '2015-08-31 00:00' AND '2015-09-30 00:00')
GROUP BY "hits"."link_id"
ORDER BY "hits"."count" DESC, "hits"."link_id" ASC
LIMIT 50;
QUERY PLAN
------------------------------------------------------------------------------------------
Solution
The reason is that Postgres allows "attribute notation" for a function with a single parameter. This can be used as substitute for generated columns. It's a legacy feature that's not used much any more. Details here:
If this was not allowed, your first query would just raise a syntax error, since there is no column
One can reference input and output columns in
In absence of an actual column
And
This is one of the reasons why I keep advising against the use of basic function names as column name (or alias).
- Store common query as column?
If this was not allowed, your first query would just raise a syntax error, since there is no column
"hits"."count".One can reference input and output columns in
ORDER BY (and GROUP BY, but not in WHERE or HAVING). While your second query references the output column count (the column alias for count(*)), the first query looks for an input column, since the column name is schema-qualified.In absence of an actual column
"hits"."count", Postgres interprets this as count(hits), which is the same as count(hits.*) like you can see in the EXPLAIN output:Sort Key: (count(hits.*)) DESC, link_idAnd
count(*) is implemented differently from count(). The first only checks for the existence of a row (which is always at least slightly faster), while the second has to look at actual values - so the index-only scan we see for the other query is out.This is one of the reasons why I keep advising against the use of basic function names as column name (or alias).
Code Snippets
Sort Key: (count(hits.*)) DESC, link_idContext
StackExchange Database Administrators Q#129410, answer score: 3
Revisions (0)
No revisions yet.