HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Managing the order of aggregation in Postgres

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
themanagingorderpostgresaggregation

Problem

Suppose I have the following merging function that merges two jsonb values overwriting duplicate keys with the values from the second
create or replace function jsonb_concat(a jsonb, b jsonb) returns jsonb
as
'select $1 || $2'
language sql
immutable
parallel safe
;


Since I want to use that function in aggregates I need to define an aggregate function like so
create or replace aggregate jsonb_merge_agg(jsonb)
(
sfunc = jsonb_concat,
stype = jsonb,
initcond = '{}'
);


And suppose I have the following table

id (bigint)
username (text)
event (jsonb)

1
foo
{ "it": 1, "key": "bla" }

2
foo
{ "it": 2, "key" : "dah" }

3
bar
{}

4
zar
{}

When I want to aggregate on username column and merge the event column I would use the following query
select username, jsonb_merge_agg(event) from table group by username


And I expect to get the following results, where records with greater id overwrite keys value pairs of earlier records in the aggregate

username
event

foo
{ "it": 2, "key": "dah"}

bar
{}

zar
{}

The problem is that I am sometimes seeing that the foo aggregate contains combination { "it": 1, "key": "bla" } instead. I am aware that Postgres does not have natural row order in the table, so it might be different between transactions. How do I manage the order of the aggregation merge?

Solution

In the Aggregate functions page (chapter 9, section 20 in postgresql 12 documentation "Aggregate Functions" (1)) there's a paragraph specifically addressing this

The aggregate functions array_agg, json_agg, jsonb_agg, json_object_agg, jsonb_object_agg, string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7.

And in the section 4.2.7 (2) Value expressions/Aggregate expressions the syntax is explained as following

aggregate_name (expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]


as a result, to get consistent merge order the query must be modified to the following

select username, jsonb_merge_agg(event order by id)


1: https://www.postgresql.org/docs/12/functions-aggregate.html

2: https://www.postgresql.org/docs/12/sql-expressions.html#SYNTAX-AGGREGATES

Code Snippets

aggregate_name (expression [ , ... ] [ order_by_clause ] ) [ FILTER ( WHERE filter_clause ) ]
select username, jsonb_merge_agg(event order by id)

Context

StackExchange Database Administrators Q#334945, answer score: 2

Revisions (0)

No revisions yet.