patternsqlMinor
Why is log(greatest()) so slow?
Viewed 0 times
greatestwhyslowlog
Problem
We have some complex query that was very slow. I managed to reduce the query to a simple reproduction. It seems that the combination of
Here is a full sql-fiddle example to run the queries - and you can also
So here is the slow query:
We just generate a series of 20k numbers and use
I thought that calculating the log may take long, but the following query is also fast (~5ms):
Just as a test I exchanged
The
Can anyone explain why the first query is so slow - and maybe anyone knows a workaround?
More details
slow platforms
I get similar results on all of these (first query is a magnitude slower):
count
When I change
greatest and log is the cause, but I don't understand why.Here is a full sql-fiddle example to run the queries - and you can also
View the execution Plans of the queries (press the link at the bottom of the query result on the sql-fiddle page)So here is the slow query:
select count(value)
from (
SELECT log(greatest(1e-9, x)) as value
from (select generate_series(1, 20000, 1) as x) as d
) t;We just generate a series of 20k numbers and use
log(greatest()). This query takes about 1.5 seconds.I thought that calculating the log may take long, but the following query is also fast (~5ms):
select count(value)
from (
SELECT log(x) as value
from (select generate_series(1, 20000, 1) as x) as d
) t;Just as a test I exchanged
greatest and log - this is also fast (~5ms):select count(value)
from (
SELECT greatest(1e-9, log(x)) as value
from (select generate_series(1, 20000, 1) as x) as d
) t;The
QUERY PLANS for all 3 queries are the same:Aggregate (cost=22.51..22.52 rows=1 width=8)
-> Result (cost=0.00..5.01 rows=1000 width=4)Can anyone explain why the first query is so slow - and maybe anyone knows a workaround?
More details
slow platforms
I get similar results on all of these (first query is a magnitude slower):
- SQL Fiddle uses pg 9.6
- my local PC with similar results: Win10 64bit, pg 11.5 running in Docker
- remote server: Ubuntu 18.04 64-bit running pg 11.5 in Docker
- rextester.com
- slow query ~ 3sec
- fast query ~ 0.5sec
count
When I change
count(value) to count(*) or count(1) (number one) the query is fast - but this does not help me because the production query does not even include a count
- anyway, I wonder why there is a difference in this case (there are no null-values in the
Solution
You're invoking two different log functions here:
Notice how the functions call differ in the EXPLAIN (ANALYZE, VERBOSE) below,
run with PostgreSQL 11.5 (Linux Ubuntu):
Slow version:
Fast version:
log(numeric,numeric) and log(double precision), and the first is much slower than the second.Notice how the functions call differ in the EXPLAIN (ANALYZE, VERBOSE) below,
run with PostgreSQL 11.5 (Linux Ubuntu):
Slow version:
explain (analyze, verbose) select count(value)
from (
SELECT log(greatest(1e-9, x)) as value
from (select generate_series(1, 20000, 1) as x) as d
) t;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Aggregate (cost=25.02..25.03 rows=1 width=8) (actual time=1174.349..1174.349 rows=1 loops=1)
Output: count(log('10'::numeric, GREATEST(0.000000001, ((generate_series(1, 20000, 1)))::numeric)))
-> ProjectSet (cost=0.00..5.02 rows=1000 width=4) (actual time=0.004..1.310 rows=20000 loops=1)
Output: generate_series(1, 20000, 1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Planning Time: 0.123 ms
Execution Time: 1174.385 msFast version:
explain (analyze, verbose) select count(value)
from (
SELECT log(greatest(1e-9::float, x)) as value
from (select generate_series(1, 20000, 1) as x) as d
) t;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Aggregate (cost=25.02..25.03 rows=1 width=8) (actual time=6.693..6.693 rows=1 loops=1)
Output: count(log(GREATEST('1e-09'::double precision, ((generate_series(1, 20000, 1)))::double precision)))
-> ProjectSet (cost=0.00..5.02 rows=1000 width=4) (actual time=0.004..2.561 rows=20000 loops=1)
Output: generate_series(1, 20000, 1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Planning Time: 0.096 ms
Execution Time: 6.731 msgreatest() is not responsible: considering the query with just log(x), if you cast x to numeric it will be as slow with or without greatest().Code Snippets
explain (analyze, verbose) select count(value)
from (
SELECT log(greatest(1e-9, x)) as value
from (select generate_series(1, 20000, 1) as x) as d
) t;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Aggregate (cost=25.02..25.03 rows=1 width=8) (actual time=1174.349..1174.349 rows=1 loops=1)
Output: count(log('10'::numeric, GREATEST(0.000000001, ((generate_series(1, 20000, 1)))::numeric)))
-> ProjectSet (cost=0.00..5.02 rows=1000 width=4) (actual time=0.004..1.310 rows=20000 loops=1)
Output: generate_series(1, 20000, 1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Planning Time: 0.123 ms
Execution Time: 1174.385 msexplain (analyze, verbose) select count(value)
from (
SELECT log(greatest(1e-9::float, x)) as value
from (select generate_series(1, 20000, 1) as x) as d
) t;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Aggregate (cost=25.02..25.03 rows=1 width=8) (actual time=6.693..6.693 rows=1 loops=1)
Output: count(log(GREATEST('1e-09'::double precision, ((generate_series(1, 20000, 1)))::double precision)))
-> ProjectSet (cost=0.00..5.02 rows=1000 width=4) (actual time=0.004..2.561 rows=20000 loops=1)
Output: generate_series(1, 20000, 1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Planning Time: 0.096 ms
Execution Time: 6.731 msContext
StackExchange Database Administrators Q#252046, answer score: 6
Revisions (0)
No revisions yet.