HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlMinor

MySQL vs PostgreSQL: Benchmarking COUNT(*) execution speed

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlmysqlspeedcountbenchmarkingexecution

Problem

I benchmark DBs to find out the best for my project and I found that count(*) is extremely slow in PostgeSQL. And I don't understand is it a normal behaviour of PostgeSQL or I do something wrong.

I have a table with ~200M records. MySQL table definition:

CREATE TABLE t1 (
  id int(11) NOT NULL AUTO_INCREMENT,
  t2_id int(11) NOT NULL,
....  
  PRIMARY KEY (id),
  KEY index_t2 (t2_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


Request (returns ~30M):

SELECT COUNT(*) FROM t1 WHERE t2_id = 7;


runs:

25,797ms MySQL (v5.7.11)

1,222,168ms PostgeSQL (v9.5)

Explain:

MySQL:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: index_t2
          key: index_t2
      key_len: 4
          ref: const
         rows: 59438630
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)


PostgreSQL

Aggregate  (cost=4469365.02..4469365.03 rows=1 width=0)
 ->  Bitmap Heap Scan on t1  (cost=715817.34..4382635.74 rows=34691712 width=0)
       Recheck Cond: (t2_id = 7)
       ->  Bitmap Index Scan on index_t2  (cost=0.00..707144.41 rows=34691712 width=0)
             Index Cond: (t2_id = 7)


Server: AWS RDS (db.r3.xlarge) vCPU:4 Memory:30Gb

Updated (2016-09-20):

```
> explain (analyze, buffers) SELECT COUNT(*) FROM t1 WHERE t2_id = 7;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4469365.02..4469365.03 rows=1 width=4) (actual time=1213456.539..1213456.539 rows=1 loops=1)
Buffers: shared read=2734808
-> Bitmap Heap Scan on t1 (cost=715817.34..4382635.74 rows=34691712 width=4) (actual time=64015.828..1205542.421 rows=31383566 loops=1)
Rechec

Solution

The way that both RDBMS do the count differs. In InnoDB we have the following behaviour by default:


To process a SELECT COUNT(*) FROM t statement, InnoDB scans an index of
the table, which takes some time if the index is not entirely in the
buffer pool.

For Postgres, you may want to try to see if an index-only scan (which is closer to the InnoDB behaviour) can help you on this. More info here. Due the amount of rows and the bad cardinality of that value (almost 15% of the table according stats), I can't warranty that it will work, but you can try:

SELECT COUNT(t2_id) FROM t1 WHERE t2_id = 7;

Code Snippets

SELECT COUNT(t2_id) FROM t1 WHERE t2_id = 7;

Context

StackExchange Database Administrators Q#149729, answer score: 5

Revisions (0)

No revisions yet.