HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Why is array_agg() slower than the non-aggregate ARRAY() constructor?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whyarray_aggthenonarraythanslowerconstructoraggregate

Problem

I was just reviewing some old code written for pre-8.4 PostgreSQL, and I saw something really nifty. I remember having a custom function do some of this back in the day, but I forgot what pre-array_agg() looked like. For review, modern aggregation is written like this.

SELECT array_agg(x ORDER BY x DESC) FROM foobar;


However, once upon a time, it was written like this,

SELECT ARRAY(SELECT x FROM foobar ORDER BY x DESC);


So, I tried it with some test data..

CREATE TEMP TABLE foobar AS
SELECT * FROM generate_series(1,1e7)
  AS t(x);


The results were surprising.. The #OldSchoolCool way was massively faster: a 25% speedup. Moreover, simplifying it without the ORDER, showed the same slowness.

# EXPLAIN ANALYZE SELECT ARRAY(SELECT x FROM foobar);
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Result  (cost=104425.28..104425.29 rows=1 width=0) (actual time=1665.948..1665.949 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Seq Scan on foobar  (cost=0.00..104425.28 rows=6017728 width=32) (actual time=0.032..716.793 rows=10000000 loops=1)
 Planning time: 0.068 ms
 Execution time: 1671.482 ms
(5 rows)

test=# EXPLAIN ANALYZE SELECT array_agg(x) FROM foobar;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=119469.60..119469.61 rows=1 width=32) (actual time=2155.154..2155.154 rows=1 loops=1)
   ->  Seq Scan on foobar  (cost=0.00..104425.28 rows=6017728 width=32) (actual time=0.031..717.831 rows=10000000 loops=1)
 Planning time: 0.054 ms
 Execution time: 2174.753 ms
(4 rows)


So, what's going on here. Why is array_agg, a

Solution

There is nothing "old school" or "outdated" about an ARRAY constructor (That's what ARRAY(SELECT x FROM foobar) is). It's modern as ever. Use it for simple array aggregation.

The manual:


It is also possible to construct an array from the results of a
subquery. In this form, the array constructor is written with the key
word ARRAY followed by a parenthesized (not bracketed) subquery.

The aggregate function array_agg() is more versatile in that it can be integrated in a SELECT list with more columns, possibly more aggregations in the same SELECT, and arbitrary groups can be formed with GROUP BY. While an ARRAY constructor can only return a single array from a SELECT returning a single column.

I did not study the source code, but it would seem obvious that a much more versatile tool is also more expensive.

One notable difference: the ARRAY constructor returns an empty array ({}) if no rows qualify. array_agg() returns NULL for the same.

Context

StackExchange Database Administrators Q#159710, answer score: 29

Revisions (0)

No revisions yet.