patternsqlModerate
best way to avoid redundant aggregate functions and/or group by columns
Viewed 0 times
groupcolumnswayavoidredundantandfunctionsaggregatebest
Problem
Suppose I have two tables:
Foo:
Bar:
So a Foo has many Bars. I frequently find myself in situations where I need to compute an aggregate across the Bars for a given set of Foos but I also want some properties from Foo. The two most straightforward ways of doing this are ugly:
Method #1: Unnecessary Aggregate Functions
Method #2: Unnecessary Group By Column
This is not so bad when there's only one additional column from Foo besides "id", but if there are many columns that need to be included then the grouping becomes much less efficient. A query like this gets around both these issues, but seems unwieldy:
Is there a better way? Platform is Postgres if that matters.
Foo:
id
bazBar:
id
foo_id
boomSo a Foo has many Bars. I frequently find myself in situations where I need to compute an aggregate across the Bars for a given set of Foos but I also want some properties from Foo. The two most straightforward ways of doing this are ugly:
Method #1: Unnecessary Aggregate Functions
select
foo.id,
min(foo.baz) as baz,
min(bar.boom) as min_boom
from
foo
join
bar on foo.id = bar.foo_id
group by
foo.id;Method #2: Unnecessary Group By Column
select
foo.id,
foo.baz,
min(bar.boom) as min_boom
from
foo
join
bar on foo.id = bar.foo_id
group by
foo.id,
foo.baz;This is not so bad when there's only one additional column from Foo besides "id", but if there are many columns that need to be included then the grouping becomes much less efficient. A query like this gets around both these issues, but seems unwieldy:
select
foo.id,
foo.baz,
x.min_boom
from
foo
join
(select
foo_id,
min(boom) as min_boom
from
bar
group by
foo_id) x on x.foo_id = foo.id;Is there a better way? Platform is Postgres if that matters.
Solution
If id is defined as the primary key, you can omit grouping by all the foo columns you want for the output as long as you are grouping by the id. This special case of grouping is in accordance with the current SQL standard and has also been covered in the PostgreSQL manual, starting from version 9.1:
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
(Emphasis added.)
So, if foo.id is the PK, this query would be valid:
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
(Emphasis added.)
So, if foo.id is the PK, this query would be valid:
select
foo.id,
foo.baz,
foo.whatever,
min(bar.boom) as min_boom
from
foo
join
bar on foo.id = bar.foo_id
group by
foo.id;Code Snippets
select
foo.id,
foo.baz,
foo.whatever,
min(bar.boom) as min_boom
from
foo
join
bar on foo.id = bar.foo_id
group by
foo.id;Context
StackExchange Database Administrators Q#138695, answer score: 12
Revisions (0)
No revisions yet.