HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Index scan on UNIQUE index for count(*)

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
indexuniquescanforcount

Problem

I've got a table t with ~23 million rows (4248 MB in size). There's a column row_id in it, with a not null constraint. And a unique index p1 on t(row_id).

When I do select count(*) from t to count all the rows in the table, the planner tells me:

Seq Scan on t  (cost=0.00..686191.06 rows=23176906 width=0)


I would have expected a fast Index Only Scan (index p1 occupies only 698 MB - 6x less).

If I do SET enable_seqscan = off, then the planner still insists on reading the table rows:

QUERY PLAN
  ->  Bitmap Heap Scan on t  (cost=210923.32..897114.38 rows=23176906 width=0)
        ->  Bitmap Index Scan on p1  (cost=0.00..205129.09 rows=23176906 width=0)


Why is the unique index ignored in this case? What's the catch?

I am using PostgreSQL 10.4

For a clean room test I did the following:

```
create table tmp
(
row_id varchar(15) unique not null,

);

insert into tmp (row_id, ) select row_id, from t;
commit;
analyze tmp;

set enable_seqscan = on;
explain (analyze, buffers) select count(*) from tmp;
QUERY PLAN
Aggregate (cost=744070.45..744070.46 rows=1 width=8) (actual time=5631.501..5631.502 rows=1 loops=1)
Buffers: shared hit=209109 read=245254
-> Seq Scan on tmp (cost=0.00..686128.96 rows=23176596 width=0) (actual time=0.014..3481.967 rows=23176906 loops=1)
Buffers: shared hit=209109 read=245254
Planning time: 0.064 ms
Execution time: 5631.531 ms

SET enable_seqscan = off;
explain (analyze, buffers) select count(*) from tmp;
QUERY PLAN
Aggregate (cost=980282.14..980282.15 rows=1 width=8) (actual time=16224.408..16224.408 rows=1 loops=1)
Buffers: shared hit=26285 read=542015
-> Bitmap Heap Scan on tmp (cost=236211.69..922340.65 rows=23176596 width=0) (actual time=10030.115..14157.288 rows=23176906 loops=1)
Heap Blocks: exact=454363
Buffers: shared hit=26285 read=542015
-> Bitmap Index Scan on tmp_row_id_key (cost=0.00..230417.54 rows=23176596 width=0) (actual time=9929.5

Solution

As to why you do not get an index scan with SET enable_seqscan = off, you should be getting an index-only scan. Your situation can not yet be recreated with the data you've provided. This certainly works with PostgreSQL 10.4. I can't speak to your own use case, and there are a lot of reasons why you may not get an index scan in the real world. Ultimately, debugging a question along those lines will result in an answer that is simply "planner estimates" but will require a lot more data about your environment, configuration, and the plans with and without SET enable_seqscan = off.

Sample Data

BEGIN;
  CREATE TABLE foo ( x int NOT NULL UNIQUE );
  INSERT INTO foo (x) SELECT generate_series(1,1e6);
COMMIT;

ANALYZE foo; -- don't forget to analyze


With seq_scan

test=# EXPLAIN SELECT count(*) FROM foo;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10633.55..10633.56 rows=1 width=8)
   ->  Gather  (cost=10633.33..10633.54 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=9633.33..9633.34 rows=1 width=8)
               ->  Parallel Seq Scan on foo  (cost=0.00..8591.67 rows=416667 width=0)
(5 rows)


Notice we're doing a "Parallel Seq Scan"

Without seq_scan

SET enable_seq_scan = off;

test=# EXPLAIN SELECT count(*) FROM foo;
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=26616.97..26616.98 rows=1 width=8)
   ->  Gather  (cost=26616.76..26616.97 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=25616.76..25616.77 rows=1 width=8)
               ->  Parallel Index Only Scan using foo_x_key on foo  (cost=0.42..24575.09 rows=416667 width=0)
(5 rows)


Notice we're doing a "Parallel Index Only Scan"

Code Snippets

BEGIN;
  CREATE TABLE foo ( x int NOT NULL UNIQUE );
  INSERT INTO foo (x) SELECT generate_series(1,1e6);
COMMIT;

ANALYZE foo; -- don't forget to analyze
test=# EXPLAIN SELECT count(*) FROM foo;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10633.55..10633.56 rows=1 width=8)
   ->  Gather  (cost=10633.33..10633.54 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=9633.33..9633.34 rows=1 width=8)
               ->  Parallel Seq Scan on foo  (cost=0.00..8591.67 rows=416667 width=0)
(5 rows)
SET enable_seq_scan = off;

test=# EXPLAIN SELECT count(*) FROM foo;
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=26616.97..26616.98 rows=1 width=8)
   ->  Gather  (cost=26616.76..26616.97 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=25616.76..25616.77 rows=1 width=8)
               ->  Parallel Index Only Scan using foo_x_key on foo  (cost=0.42..24575.09 rows=416667 width=0)
(5 rows)

Context

StackExchange Database Administrators Q#209331, answer score: 3

Revisions (0)

No revisions yet.