HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

In PostgreSQL, is there a type-safe first() aggregate function?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlfunctiontypefirstsafethereaggregate

Problem

I'm looking for a first() aggregate function.

Here I found something that almost works:

CREATE OR REPLACE FUNCTION public.first_agg (anyelement, anyelement)
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $
    SELECT $1;
$;
 
-- And then wrap an aggregate around it
CREATE AGGREGATE public.first (
    sfunc    = public.first_agg,
    basetype = anyelement,
    stype    = anyelement
);


The problem is that when a varchar(n) column passes through the first() function, it's converted into simple varchar (without size modifier). Trying to return the query in a function as RETURNS SETOF anyelement, I get the following error:
ERROR: structure of query does not match function result type Estado de
SQL:42804
Detalhe:Returned type character varying does not match expected type character varying(40) in column 2.
Contexto:PL/pgSQL function vsr_table_at_time(anyelement,timestamp without time zone) line 31 at RETURN QUERY


In the same wiki page there is a link to a C Version of the function that would replace the above. I don't know how to install it, but I wonder if this version could solve my problem.

Meanwhile, is there a way I can change the above function so it returns the exact same type of the input column?

Solution

DISTINCT ON()

Just as a side note, this is precisely what DISTINCT ON() does (not to be confused with DISTINCT)


SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example

So if you were to write,

SELECT myFirstAgg(z)
FROM foo
GROUP BY x,y;


It's effectively

SELECT DISTINCT ON(x,y) z
FROM foo;
-- ORDER BY z;


In that it takes the first z. There are two important differences,

-
You can also select other columns at no cost of further aggregation..

SELECT DISTINCT ON(x,y) z, k, r, t, v
FROM foo;
-- ORDER BY z, k, r, t, v;


-
Because there is no GROUP BY you can not use (real) aggregates with it.

CREATE TABLE foo AS
SELECT * FROM ( VALUES
  (1,2,3),
  (1,2,4),
  (1,2,5)
) AS t(x,y,z);

SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;

-- fails, as you should expect.
SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;

-- would not otherwise fail.
SELECT myFirstAgg(z), sum(z)
FROM foo
GROUP BY x,y;


Don't forget ORDER BY

Also, while I didn't bold it then I will now


Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example

Always use an ORDER BY with DISTINCT ON

Using an Ordered-Set Aggregate Function

I imagine a lot of people are looking for first_value, Ordered-Set Aggregate Functions. Just wanted to throw that out there. It would look like this, if the function existed:

SELECT a, b, first_value() WITHIN GROUP (ORDER BY z)    
FROM foo
GROUP BY a,b;


But, alas you can do this.

SELECT a, b, percentile_disc(0) WITHIN GROUP (ORDER BY z)   
FROM foo
GROUP BY a,b;

Code Snippets

SELECT myFirstAgg(z)
FROM foo
GROUP BY x,y;
SELECT DISTINCT ON(x,y) z
FROM foo;
-- ORDER BY z;
SELECT DISTINCT ON(x,y) z, k, r, t, v
FROM foo;
-- ORDER BY z, k, r, t, v;
CREATE TABLE foo AS
SELECT * FROM ( VALUES
  (1,2,3),
  (1,2,4),
  (1,2,5)
) AS t(x,y,z);

SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;

-- fails, as you should expect.
SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;

-- would not otherwise fail.
SELECT myFirstAgg(z), sum(z)
FROM foo
GROUP BY x,y;
SELECT a, b, first_value() WITHIN GROUP (ORDER BY z)    
FROM foo
GROUP BY a,b;

Context

StackExchange Database Administrators Q#63661, answer score: 45

Revisions (0)

No revisions yet.