HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlMinor

What is the difference between select from function and select function

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thewhatfunctiondifferencebetweenandselectfrom

Problem

I am doing a test with generate_series function on Postgres v.9.6 on Ubuntu 16.04. So, I get a curiosity about loops (loops=100000) when calling that function as below.

EXPLAIN ANALYZE
  SELECT count(*)
  FROM t5
  WHERE id in (SELECT a FROM generate_series(1,10));
Aggregate  (cost=627262.00..627262.01 rows=1 width=8) (actual time=60.534..60.534 rows=1 loops=1)
  ->  Seq Scan on t5  (cost=0.00..627137.00 rows=50000 width=0) (actual time=0.048..50.267 rows=100000 loops=1)
        Filter: (SubPlan 1)
        SubPlan 1
          ->  Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.000..0.000 rows=1 loops=100000)
Planning time: 0.161 ms
Execution time: 60.599 ms


However, If I change the query. The loop and result will change.

EXPLAIN ANALYZE
  SELECT count(*)
  FROM t5
  WHERE id in (SELECT generate_series(1,10) a);
Aggregate  (cost=1488.51..1488.52 rows=1 width=8) (actual time=0.050..0.050 rows=1 loops=1)
  ->  Nested Loop  (cost=17.80..1363.51 rows=50000 width=0) (actual time=0.025..0.046 rows=10 loops=1)
        ->  HashAggregate  (cost=17.51..19.51 rows=200 width=4) (actual time=0.012..0.014 rows=10 loops=1)
              Group Key: generate_series(1, 10)
              ->  Result  (cost=0.00..5.01 rows=1000 width=4) (actual time=0.004..0.007 rows=10 loops=1)
        ->  Index Only Scan using t5_pkey on t5  (cost=0.29..6.71 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=10)
              Index Cond: (id = (generate_series(1, 10)))
              Heap Fetches: 10
Planning time: 0.124 ms
Execution time: 0.086 ms


Here's script to create table

CREATE TABLE t5 ( id serial primary key, a int, b text);
INSERT INTO t5 (a,b) SELECT a, 'test_' || a FROM generate_series(1, 100000) a;

Solution

It's not the FROM that's causing the different query plans. It's the bad aliasing.

EXPLAIN ANALYZE
  SELECT count(*)
  FROM t5
  WHERE id in (SELECT generate_series(1,10));


And

EXPLAIN ANALYZE
  SELECT count(*)
  FROM t5
  WHERE id in (SELECT generate_series FROM generate_series(1,10));


will return the same plan. As will this,

EXPLAIN ANALYZE
  SELECT count(*)
  FROM t5
  WHERE id in (SELECT x FROM generate_series(1,10) AS gs(x));


However, this won't.

EXPLAIN ANALYZE
  SELECT count(*)
  FROM t5
  WHERE id in (SELECT a FROM generate_series(1,10));


Notice: you forget to alias the generate_series to a Now you're essentially running a correlated subquery. It's not just more work, but it's returning the entire table and not just 10 rows.

Code Snippets

EXPLAIN ANALYZE
  SELECT count(*)
  FROM t5
  WHERE id in (SELECT generate_series(1,10));
EXPLAIN ANALYZE
  SELECT count(*)
  FROM t5
  WHERE id in (SELECT generate_series FROM generate_series(1,10));
EXPLAIN ANALYZE
  SELECT count(*)
  FROM t5
  WHERE id in (SELECT x FROM generate_series(1,10) AS gs(x));
EXPLAIN ANALYZE
  SELECT count(*)
  FROM t5
  WHERE id in (SELECT a FROM generate_series(1,10));

Context

StackExchange Database Administrators Q#176519, answer score: 3

Revisions (0)

No revisions yet.