patternsqlMajor
For absolute performance, is SUM faster or COUNT?
Viewed 0 times
absolutefasterperformanceforsumcount
Problem
This relates to counting the number of records that match a certain condition, e.g.
I tend to prefer
However, this is just as valid
I would have thought COUNT is preferable for 2 reasons:
a simple
its expression to be a simple integer value.
Does anyone have specific facts about the difference on specific RDBMS?
invoice amount > $100.I tend to prefer
COUNT(CASE WHEN invoice_amount > 100 THEN 1 END)However, this is just as valid
SUM(CASE WHEN invoice_amount > 100 THEN 1 ELSE 0 END)I would have thought COUNT is preferable for 2 reasons:
- Conveys the intention, which is to
COUNT
COUNTprobably involves
a simple
i += 1 operation somewhere, whereas SUM cannot count onits expression to be a simple integer value.
Does anyone have specific facts about the difference on specific RDBMS?
Solution
You mostly answered the question yourself already. I have a few morsels to add:
In PostgreSQL (and other RDBMS that support the
Or use it in a
Or with a simple
Or various other expressions. Performance is almost identical.
Since Postgres 9.4 there's also the aggregate
It's faster than all of the above by around 5 - 10 %:
If the query is as simple as your test case, with only a single count and nothing else, you can rewrite:
... which is the true king of performance, even without index.
With an applicable index it can be faster by orders of magnitude, especially with index-only scans.
Benchmarks
Postgres 13
db<>fiddle here
Basically the same results as for Postgres 10 below.
(I also added a test without the new parallelism to the fiddle, to compare apples with apples.)
Postgres 10
I ran a new series of tests for Postgres 10, including the aggregate
Simple setup:
Actual times vary quite a bit due to background noise and specifics of the test bed. Showing typical best times from a bigger set of tests. These two cases should capture the essence:
Test 1 counting ~ 1 % of all rows
db<>fiddle here
Test 2 counting ~ 33 % of all rows
db<>fiddle here
The last test in each set used an index-only scan, which is why it helped for counting one third of all rows. Plain index or bitmap index scans cannot compete with a sequential scan when involving roughly 5 % or more of all rows.
Old test for Postgres 9.1
To verify I ran a quick test with
74208 of 184568 rows qualified with the condition
Hardly any real difference in performance.
In PostgreSQL (and other RDBMS that support the
boolean type) you can use the boolean result of the test directly. Cast it to integer and SUM():SUM((amount > 100)::int))Or use it in a
NULLIF() expression and COUNT():COUNT(NULLIF(amount > 100, FALSE))Or with a simple
OR NULL:COUNT(amount > 100 OR NULL)Or various other expressions. Performance is almost identical.
COUNT() is typically very slightly faster than SUM(). Unlike SUM() and like Paul already commented, COUNT() never returns NULL, which may be convenient. Related:- Query optimization or missing indexes?
Since Postgres 9.4 there's also the aggregate
FILTER clause. See:- Return counts for multiple ranges in a single SELECT statement
It's faster than all of the above by around 5 - 10 %:
COUNT(*) FILTER (WHERE amount > 100)If the query is as simple as your test case, with only a single count and nothing else, you can rewrite:
SELECT count(*) FROM tbl WHERE amount > 100;... which is the true king of performance, even without index.
With an applicable index it can be faster by orders of magnitude, especially with index-only scans.
Benchmarks
Postgres 13
db<>fiddle here
Basically the same results as for Postgres 10 below.
(I also added a test without the new parallelism to the fiddle, to compare apples with apples.)
Postgres 10
I ran a new series of tests for Postgres 10, including the aggregate
FILTER clause and demonstrating the role of an index for small and big counts.Simple setup:
CREATE TABLE tbl (
tbl_id int
, amount int NOT NULL
);
INSERT INTO tbl
SELECT g, (random() * 150)::int
FROM generate_series (1, 1000000) g;
-- only relevant for the last test
CREATE INDEX ON tbl (amount);Actual times vary quite a bit due to background noise and specifics of the test bed. Showing typical best times from a bigger set of tests. These two cases should capture the essence:
Test 1 counting ~ 1 % of all rows
SELECT COUNT(NULLIF(amount > 148, FALSE)) FROM tbl; -- 140 ms
SELECT SUM((amount > 148)::int) FROM tbl; -- 136 ms
SELECT SUM(CASE WHEN amount > 148 THEN 1 ELSE 0 END) FROM tbl; -- 133 ms
SELECT COUNT(CASE WHEN amount > 148 THEN 1 END) FROM tbl; -- 130 ms
SELECT COUNT((amount > 148) OR NULL) FROM tbl; -- 130 ms
SELECT COUNT(*) FILTER (WHERE amount > 148) FROM tbl; -- 118 ms -- !
SELECT count(*) FROM tbl WHERE amount > 148; -- without index -- 75 ms -- !!
SELECT count(*) FROM tbl WHERE amount > 148; -- with index -- 1.4 ms -- !!!db<>fiddle here
Test 2 counting ~ 33 % of all rows
SELECT COUNT(NULLIF(amount > 100, FALSE)) FROM tbl; -- 140 ms
SELECT SUM((amount > 100)::int) FROM tbl; -- 138 ms
SELECT SUM(CASE WHEN amount > 100 THEN 1 ELSE 0 END) FROM tbl; -- 139 ms
SELECT COUNT(CASE WHEN amount > 100 THEN 1 END) FROM tbl; -- 138 ms
SELECT COUNT(amount > 100 OR NULL) FROM tbl; -- 137 ms
SELECT COUNT(*) FILTER (WHERE amount > 100) FROM tbl; -- 132 ms -- !
SELECT count(*) FROM tbl WHERE amount > 100; -- without index -- 102 ms -- !!
SELECT count(*) FROM tbl WHERE amount > 100; -- with index -- 55 ms -- !!!db<>fiddle here
The last test in each set used an index-only scan, which is why it helped for counting one third of all rows. Plain index or bitmap index scans cannot compete with a sequential scan when involving roughly 5 % or more of all rows.
Old test for Postgres 9.1
To verify I ran a quick test with
EXPLAIN ANALYZE on a real life table in PostgreSQL 9.1.6.74208 of 184568 rows qualified with the condition
kat_id > 50. All queries return the same result. I ran each like 10 times in turns to exclude caching effects and appended the best result as note:SELECT SUM((kat_id > 50)::int) FROM log_kat; -- 438 ms
SELECT COUNT(NULLIF(kat_id > 50, FALSE)) FROM log_kat; -- 437 ms
SELECT COUNT(CASE WHEN kat_id > 50 THEN 1 END) FROM log_kat; -- 437 ms
SELECT COUNT((kat_id > 50) OR NULL) FROM log_kat; -- 436 ms
SELECT SUM(CASE WHEN kat_id > 50 THEN 1 ELSE 0 END) FROM log_kat; -- 432 msHardly any real difference in performance.
Code Snippets
SUM((amount > 100)::int))COUNT(NULLIF(amount > 100, FALSE))COUNT(amount > 100 OR NULL)COUNT(*) FILTER (WHERE amount > 100)SELECT count(*) FROM tbl WHERE amount > 100;Context
StackExchange Database Administrators Q#27558, answer score: 38
Revisions (0)
No revisions yet.