HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Pagination with PostgreSQL 9.3: counting number of pages

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlpaginationnumbercountingwithpages

Problem

I'm implementing pagination and sorting of rows in, let's say, products table, using multicolumn index on category, score and id.

-- index
create index category_score_id on products(category, score, id)
where active = true;

-- selecting first page
select * from products where
category = 1
and active = true
order by score desc, id desc
limit 30;

-- selecting following pages
select * from products where
category = 1
and active = true
and (score, id)

This works perfectly fine and as expected, using only index scan:

-- explain analyze of 2-nd query:
Limit  (cost=0.00..99.52 rows=30 width=2591) (actual time=0.090..0.179 rows=30 loops=1)
  ->  Index Scan Backward using category_score_id on products  (cost=0.00..76435.07 rows=23041 width=2591) (actual time=0.089..0.172 rows=30 loops=1)
        Index Cond: ((category = 1) AND (ROW(score, id) < ROW(893, 102458)))
Total runtime: 0.085 ms


What I am curious about is why the following query requires extra Bitmap Heap Scan:

select count(*) from products where
category = 1
and active = true;


-- explain analyze:
Aggregate (cost=49987.80..49987.81 rows=1 width=0) (actual time=104.847..104.847 rows=1 loops=1)
-> Bitmap Heap Scan on products (cost=538.85..49930.20 rows=23041 width=0) (actual time=8.919..98.952 rows=47937 loops=1)
Recheck Cond: ((category = 1) AND active)
Rows Removed by Index Recheck: 93006
-> Bitmap Index Scan on category_score_id (cost=0.00..533.09 rows=23041 width=0) (actual time=7.181..7.181 rows=47937 loops=1)
Index Cond: (category = 1)
Total runtime: 104.892 ms
`

Solution

Index definition

Your multicolumn index generally looks good. If all or most of your queries use order by score desc, id desc, it would be a bit more efficient to define it with matching sort order and a simpler condition:

CREATE INDEX prod_category_score_id ON products(category, score DESC, id DESC)
WHERE active;


  • WHERE active = TRUE is just a more complicated way of saying WHERE active.



  • While Index Scan Backward is practically as fast as a plain index scan, access is simpler with matching sort order in multicolumn index.



Why the extra Bitmap Heap Scan?

Due to the MVCC model of Postgres it needs to make sure that entries in the index are actually still alive in the table.

The planner method Index Scan works best for a small number of rows. Postgres takes each row pointer from the index and fetches the row from the heap (the table) sequentially, thereby also checking if the row is actually visible to the current transaction.

With a growing number of hits, this gets less effective. If multiple rows reside on the same data page, it's more efficient to collect tuples and visit the same page only once. That's what a Bitmap Index Scan does essentially (among other things).

When counting rows with count(*), Postgres wouldn't need data from the heap. But it still must verify that each tuple is visible and actually counts. The Wiki page about "Slow counting".

Postgres 9.2 introduced Index Only Scans. If the "visibility map" says all tuples of a page are visible to all current transactions, Postgres does not have to visit the heap to verify. That would be faster for a query such as yours, that only counts a small percentage of rows, most efficiently determined by using the index.

Obviously, you have a lot of concurrent write operations. Or your autovacuum settings are not aggressive enough. Else, you might see a faster Index Only Scan.

The Postgres Wiki about that.

Code Snippets

CREATE INDEX prod_category_score_id ON products(category, score DESC, id DESC)
WHERE active;

Context

StackExchange Database Administrators Q#73175, answer score: 3

Revisions (0)

No revisions yet.