principlesqlMinor
MySQL vs PostgreSQL: Benchmarking COUNT(*) execution speed
Viewed 0 times
postgresqlmysqlspeedcountbenchmarkingexecution
Problem
I benchmark DBs to find out the best for my project and I found that
I have a table with ~200M records. MySQL table definition:
Request (returns ~30M):
runs:
Explain:
MySQL:
PostgreSQL
Server: AWS RDS (db.r3.xlarge) vCPU:4 Memory:30Gb
Updated (2016-09-20):
```
> explain (analyze, buffers) SELECT COUNT(*) FROM t1 WHERE t2_id = 7;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4469365.02..4469365.03 rows=1 width=4) (actual time=1213456.539..1213456.539 rows=1 loops=1)
Buffers: shared read=2734808
-> Bitmap Heap Scan on t1 (cost=715817.34..4382635.74 rows=34691712 width=4) (actual time=64015.828..1205542.421 rows=31383566 loops=1)
Rechec
count(*) is extremely slow in PostgeSQL. And I don't understand is it a normal behaviour of PostgeSQL or I do something wrong.I have a table with ~200M records. MySQL table definition:
CREATE TABLE t1 (
id int(11) NOT NULL AUTO_INCREMENT,
t2_id int(11) NOT NULL,
....
PRIMARY KEY (id),
KEY index_t2 (t2_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;Request (returns ~30M):
SELECT COUNT(*) FROM t1 WHERE t2_id = 7;runs:
25,797ms MySQL (v5.7.11)1,222,168ms PostgeSQL (v9.5) Explain:
MySQL:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ref
possible_keys: index_t2
key: index_t2
key_len: 4
ref: const
rows: 59438630
filtered: 100.00
Extra: Using index
1 row in set, 1 warning (0.00 sec)PostgreSQL
Aggregate (cost=4469365.02..4469365.03 rows=1 width=0)
-> Bitmap Heap Scan on t1 (cost=715817.34..4382635.74 rows=34691712 width=0)
Recheck Cond: (t2_id = 7)
-> Bitmap Index Scan on index_t2 (cost=0.00..707144.41 rows=34691712 width=0)
Index Cond: (t2_id = 7)Server: AWS RDS (db.r3.xlarge) vCPU:4 Memory:30Gb
Updated (2016-09-20):
```
> explain (analyze, buffers) SELECT COUNT(*) FROM t1 WHERE t2_id = 7;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4469365.02..4469365.03 rows=1 width=4) (actual time=1213456.539..1213456.539 rows=1 loops=1)
Buffers: shared read=2734808
-> Bitmap Heap Scan on t1 (cost=715817.34..4382635.74 rows=34691712 width=4) (actual time=64015.828..1205542.421 rows=31383566 loops=1)
Rechec
Solution
The way that both RDBMS do the count differs. In InnoDB we have the following behaviour by default:
To process a SELECT COUNT(*) FROM t statement, InnoDB scans an index of
the table, which takes some time if the index is not entirely in the
buffer pool.
For Postgres, you may want to try to see if an index-only scan (which is closer to the InnoDB behaviour) can help you on this. More info here. Due the amount of rows and the bad cardinality of that value (almost 15% of the table according stats), I can't warranty that it will work, but you can try:
To process a SELECT COUNT(*) FROM t statement, InnoDB scans an index of
the table, which takes some time if the index is not entirely in the
buffer pool.
For Postgres, you may want to try to see if an index-only scan (which is closer to the InnoDB behaviour) can help you on this. More info here. Due the amount of rows and the bad cardinality of that value (almost 15% of the table according stats), I can't warranty that it will work, but you can try:
SELECT COUNT(t2_id) FROM t1 WHERE t2_id = 7;Code Snippets
SELECT COUNT(t2_id) FROM t1 WHERE t2_id = 7;Context
StackExchange Database Administrators Q#149729, answer score: 5
Revisions (0)
No revisions yet.