patternsqlMinor
PostgreSQL equivalent to Oracle's ANY_VALUE(...) KEEP (DENSE_RANK FIRST/LAST ORDER BY ...)
Viewed 0 times
postgresqllastorderequivalentkeepdense_rankany_valuefirstoracle
Problem
There's a technique in Oracle SQL that can be used to simplify aggregation queries:
Aggregate on a particular column, but get information from a different column, using a simple calculated column in the SELECT list.
db<>fiddle
As shown above, the following column can bring in the city name, even though the city name isn't in the GROUP BY:
There are a number of ways to achieve that kind of thing using SQL. I'm looking for a solution in PostgreSQL that lets me do it in a calculated column -- all within a single SELECT query (no subqueries, joins, WITH, etc.).
Question: Is there equivalent functionality to Oracle's
Related:
Edit:
I changed
Ties can be broken by adding
Aggregate on a particular column, but get information from a different column, using a simple calculated column in the SELECT list.
--Oracle
--For a given country, what city has the highest population? (where the country has more than one city)
--Include the city name as a column.
select
country,
count(*),
max(population),
any_value(city) keep (dense_rank first order by population desc) -- 1db<>fiddle
As shown above, the following column can bring in the city name, even though the city name isn't in the GROUP BY:
any_value(city) keep (dense_rank first order by population desc)There are a number of ways to achieve that kind of thing using SQL. I'm looking for a solution in PostgreSQL that lets me do it in a calculated column -- all within a single SELECT query (no subqueries, joins, WITH, etc.).
Question: Is there equivalent functionality to Oracle's
ANY_VALUE(...) KEEP (DENSE_RANK FIRST/LAST ORDER BY ...) in PostgreSQL?Related:
- YouTube: The KEEP clause will KEEP your SQL queries SIMPLE (Oracle)
- Stack Overflow: Explanation of KEEP in Oracle FIRST/LAST
- db-oriented.com: ANY_VALUE and FIRST/LAST (KEEP)
- DBA Stack Exchange: How to request an enhancement to PostgreSQL
Edit:
I changed
MAX() to ANY_VALUE(), since I think ANY_VALUE() is easier to read.Ties can be broken by adding
, city desc to the order by, making it deterministic:any_value(city) keep (dense_rank first order by population desc, city desc)Solution
first_last_aggThere is the additional module first_last_agg to make this simple.
It's available from apt.postgresql.org (among others). Read instructions in the Postgres Wiki. Install it once per database with:
CREATE EXTENSION first_last_agg;It provides two aggregate functions:
first() and last().Most hosted services don't provide the module. If you can't install it, the next best option is to create aggregate functions yourself like demonstrated in the Postgres Wiki, and also in my fiddle below. Or here:
- Fetch a row that contains the set of last non-NULL values for each column
But the C implementation of the module first_last_agg is faster.
Then:
SELECT country
, count(*) AS ct_cities
, max(population) AS highest_population
, last(city ORDER BY population, city) AS biggest_city -- !
FROM cities
GROUP BY country
HAVING count(*) > 1;fiddle
Same as:
, first(city ORDER BY population DESC NULLS LAST, city DESC NULLS LAST) AS biggest_cityWhy
NULLS LAST? See:- Sort by column ASC, but NULL values first?
Either reports the city with the highest population and the name sorting alphabetically last - like your original.
Without additional module
If you cannot install additional modules. And you still insist on:
all within a single SELECT query (no subqueries, joins, WITH, etc.).
DISTINCT ON in combination with window functions does it, too:SELECT DISTINCT ON (country)
country
, count(*) OVER (PARTITION BY country) AS ct_cities
, population AS highest_population
, city AS biggest_city
FROM cities c
ORDER BY country, population DESC NULLS LAST, city DESC NULLS LAST;See:
- Calculating follower growth over time for each influencer
To also eliminate countries with only a single entry:
SELECT DISTINCT ON (country)
country
, count(*) OVER (PARTITION BY country) AS ct_cities
, population AS highest_population
, city AS biggest_city
FROM cities c
WHERE EXISTS (SELECT FROM cities c1 WHERE c1.country = c.country AND c1.ctid <> c.ctid)
ORDER BY country, population DESC NULLS LAST, city DESC NULLS LAST;Use your PK instead of
ctid if you have one. See:- Is the system column "ctid" legitimate for identifying rows to delete?
If a subquery is allowed, then rather:
SELECT *
FROM (
SELECT DISTINCT ON (country)
country
, count(*) OVER (PARTITION BY country) AS ct_cities
, population AS highest_population
, city AS biggest_city
FROM cities c
ORDER BY country, population DESC NULLS LAST, city DESC NULLS LAST
) sub
WHERE ct_cities > 1;(array_agg(city ORDER BY population DESC NULLS LAST))[1] typically performs poorly for more than a few rows per country. Aggregating big arrays, only to get the first element is expensive. See performance benchmark:- Select first row in each GROUP BY group?
Code Snippets
CREATE EXTENSION first_last_agg;SELECT country
, count(*) AS ct_cities
, max(population) AS highest_population
, last(city ORDER BY population, city) AS biggest_city -- !
FROM cities
GROUP BY country
HAVING count(*) > 1;, first(city ORDER BY population DESC NULLS LAST, city DESC NULLS LAST) AS biggest_citySELECT DISTINCT ON (country)
country
, count(*) OVER (PARTITION BY country) AS ct_cities
, population AS highest_population
, city AS biggest_city
FROM cities c
ORDER BY country, population DESC NULLS LAST, city DESC NULLS LAST;SELECT DISTINCT ON (country)
country
, count(*) OVER (PARTITION BY country) AS ct_cities
, population AS highest_population
, city AS biggest_city
FROM cities c
WHERE EXISTS (SELECT FROM cities c1 WHERE c1.country = c.country AND c1.ctid <> c.ctid)
ORDER BY country, population DESC NULLS LAST, city DESC NULLS LAST;Context
StackExchange Database Administrators Q#324426, answer score: 7
Revisions (0)
No revisions yet.