HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Grouping categorical values in a sequence

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
categoricalvaluessequencegrouping

Problem

I have a table with categorical values in a sequence, as such:

CREATE TABLE cat (name, v1, v2)
AS VALUES
    ('John', 1::int, 3::int),
    ('John', 3, 4),
    ('John', 4, 9),
    ('Mike', 9, 11),
    ('Mike', 11, 17),
    ('John', 17, 24),
    ('John', 24, 25),
    ('Dave', 25, 29);


And I want to group them by the name column, getting the minimum value_1 and maximum value_2, however group them as the names appear in unbroken sequence, as such:

+-----------+---------+---------+
| name      | value_1 | value_2 |
+-----------+---------+---------+
| John      | 1       | 9       |
| Mike      | 9       | 17      |
| John      | 17      | 25      |
| Dave      | 25      | 29      |
+-----------+---------+---------+


However, doing normal GROUP BY selection returns all the entries for a particular name together, regardless of whether they appear together or not:

SELECT name, MIN(value_1), MAX(value_2)
FROM table
GROUP BY name;

+-----------+---------+---------+
| name      | value_1 | value_2 |
+-----------+---------+---------+
| John      | 1       | 25      |
| Mike      | 9       | 17      |
| Dave      | 25      | 29      |
+-----------+---------+---------+


Now, I could probably do this with plpgsql, using traditional programming logic with variables, loops and such, but I was wondering if there is a more SQL-flavoured approach to this.

Software: PostgreSQL 9.6.3

Solution

Using value_1 to order the row, you can make groups every time name changes.

Then use this groups to get max and min values.

select name, min(v1) v1, max(v2) v2
from (select name, v1, v2,
             sum(rst) over (order by v1) grp
      from (select name, v1, v2,
                   case when coalesce(lag(name) over (order by v1), '') <> name then 1 end rst
            from cat
            ) y
      ) z
group by name, grp
order by v1;


name | v1 | v2
:--- | -: | -:
John | 1 | 9
Mike | 9 | 17
John | 17 | 25
Dave | 25 | 29

dbfiddle here

Code Snippets

select name, min(v1) v1, max(v2) v2
from (select name, v1, v2,
             sum(rst) over (order by v1) grp
      from (select name, v1, v2,
                   case when coalesce(lag(name) over (order by v1), '') <> name then 1 end rst
            from cat
            ) y
      ) z
group by name, grp
order by v1;

Context

StackExchange Database Administrators Q#197111, answer score: 4

Revisions (0)

No revisions yet.