HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Group by range of years

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rangegroupyears

Problem

I've got a big table (~9M rows) and want to group the rows on a field containing the year. So far that's pretty easy:

// greatly simplified:
SELECT count(*), year FROM dataset GROUP BY year ORDER BY 2;


We defined some irregular time periods spanning multiple years:

2005


I've got no clue on how to group these results in the group by clause. I could make subquery's for every time period.

SELECT
  ( SELECT count(*) FROM dataset WHERE year <= 1945 AND ...... ) AS pre1945,
  ( ....) AS period2,
  ....
FROM dataset


But that feels not right and I'm wondering if it was possible to let Postgresql do it. Especially because the query is a strong simplification of the real query: it has multiple conditions, amongst them a ST_within clause spanning four tables. So choosing the subquery-approach results in a bloated query.

Is there a better way to create this result?

Solution

Use conditional counting:

select count(case when year <= 1945 then 1 end) as pre1945,
       count(case when year between 1946 and 1964 then 1 end) as period2,
       count(case when year between 1965 and 1974 then 1 end) as period3,
       ...
from ...
where ...;


This works because count() ignores null values and the case statement returns a null for values outside of the range it tests for (an else null is implicit).

With the upcoming 9.4 version you can re-write this as

select count(*) filter (where year <= 1945) as pre1945,
       count(*) filter (where year between 1946 and 1964) as period2,
       count(*) filter (where year between 1965 and 1974) as period3,
       ...
from ...
where ...;

Code Snippets

select count(case when year <= 1945 then 1 end) as pre1945,
       count(case when year between 1946 and 1964 then 1 end) as period2,
       count(case when year between 1965 and 1974 then 1 end) as period3,
       ...
from ...
where ...;
select count(*) filter (where year <= 1945) as pre1945,
       count(*) filter (where year between 1946 and 1964) as period2,
       count(*) filter (where year between 1965 and 1974) as period3,
       ...
from ...
where ...;

Context

StackExchange Database Administrators Q#86274, answer score: 8

Revisions (0)

No revisions yet.