HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Why is log(greatest()) so slow?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
greatestwhyslowlog

Problem

We have some complex query that was very slow. I managed to reduce the query to a simple reproduction. It seems that the combination of greatest and log is the cause, but I don't understand why.

Here is a full sql-fiddle example to run the queries - and you can also View the execution Plans of the queries (press the link at the bottom of the query result on the sql-fiddle page)

So here is the slow query:

select count(value)
from (
         SELECT  log(greatest(1e-9, x)) as value
         from (select generate_series(1, 20000, 1) as x) as d
     ) t;


We just generate a series of 20k numbers and use log(greatest()). This query takes about 1.5 seconds.

I thought that calculating the log may take long, but the following query is also fast (~5ms):

select count(value)
from (
         SELECT  log(x) as value
         from (select generate_series(1, 20000, 1) as x) as d
     ) t;


Just as a test I exchanged greatest and log - this is also fast (~5ms):

select count(value)
from (
         SELECT  greatest(1e-9, log(x)) as value
         from (select generate_series(1, 20000, 1) as x) as d
     ) t;


The QUERY PLANS for all 3 queries are the same:

Aggregate (cost=22.51..22.52 rows=1 width=8)
-> Result (cost=0.00..5.01 rows=1000 width=4)


Can anyone explain why the first query is so slow - and maybe anyone knows a workaround?

More details

slow platforms

I get similar results on all of these (first query is a magnitude slower):

  • SQL Fiddle uses pg 9.6



  • my local PC with similar results: Win10 64bit, pg 11.5 running in Docker



  • remote server: Ubuntu 18.04 64-bit running pg 11.5 in Docker



  • rextester.com



  • slow query ~ 3sec



  • fast query ~ 0.5sec



count

When I change count(value) to count(*) or count(1) (number one) the query is fast

  • but this does not help me because the production query does not even include a count



  • anyway, I wonder why there is a difference in this case (there are no null-values in the

Solution

You're invoking two different log functions here: log(numeric,numeric) and log(double precision), and the first is much slower than the second.

Notice how the functions call differ in the EXPLAIN (ANALYZE, VERBOSE) below,
run with PostgreSQL 11.5 (Linux Ubuntu):

Slow version:

explain (analyze, verbose) select count(value)
from (
         SELECT  log(greatest(1e-9, x)) as value
         from (select generate_series(1, 20000, 1) as x) as d
     ) t;
                                              QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
 Aggregate  (cost=25.02..25.03 rows=1 width=8) (actual time=1174.349..1174.349 rows=1 loops=1)
   Output: count(log('10'::numeric, GREATEST(0.000000001, ((generate_series(1, 20000, 1)))::numeric)))
   ->  ProjectSet  (cost=0.00..5.02 rows=1000 width=4) (actual time=0.004..1.310 rows=20000 loops=1)
         Output: generate_series(1, 20000, 1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Planning Time: 0.123 ms
 Execution Time: 1174.385 ms


Fast version:

explain (analyze, verbose) select count(value)
from (
         SELECT  log(greatest(1e-9::float, x)) as value
         from (select generate_series(1, 20000, 1) as x) as d
     ) t;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=25.02..25.03 rows=1 width=8) (actual time=6.693..6.693 rows=1 loops=1)
   Output: count(log(GREATEST('1e-09'::double precision, ((generate_series(1, 20000, 1)))::double precision)))
   ->  ProjectSet  (cost=0.00..5.02 rows=1000 width=4) (actual time=0.004..2.561 rows=20000 loops=1)
         Output: generate_series(1, 20000, 1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Planning Time: 0.096 ms
 Execution Time: 6.731 ms


greatest() is not responsible: considering the query with just log(x), if you cast x to numeric it will be as slow with or without greatest().

Code Snippets

explain (analyze, verbose) select count(value)
from (
         SELECT  log(greatest(1e-9, x)) as value
         from (select generate_series(1, 20000, 1) as x) as d
     ) t;
                                              QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
 Aggregate  (cost=25.02..25.03 rows=1 width=8) (actual time=1174.349..1174.349 rows=1 loops=1)
   Output: count(log('10'::numeric, GREATEST(0.000000001, ((generate_series(1, 20000, 1)))::numeric)))
   ->  ProjectSet  (cost=0.00..5.02 rows=1000 width=4) (actual time=0.004..1.310 rows=20000 loops=1)
         Output: generate_series(1, 20000, 1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Planning Time: 0.123 ms
 Execution Time: 1174.385 ms
explain (analyze, verbose) select count(value)
from (
         SELECT  log(greatest(1e-9::float, x)) as value
         from (select generate_series(1, 20000, 1) as x) as d
     ) t;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=25.02..25.03 rows=1 width=8) (actual time=6.693..6.693 rows=1 loops=1)
   Output: count(log(GREATEST('1e-09'::double precision, ((generate_series(1, 20000, 1)))::double precision)))
   ->  ProjectSet  (cost=0.00..5.02 rows=1000 width=4) (actual time=0.004..2.561 rows=20000 loops=1)
         Output: generate_series(1, 20000, 1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Planning Time: 0.096 ms
 Execution Time: 6.731 ms

Context

StackExchange Database Administrators Q#252046, answer score: 6

Revisions (0)

No revisions yet.