HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Efficiently select beginning and end of multiple contiguous ranges in Postgresql query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlefficientlycontiguousqueryrangesbeginningmultipleendandselect

Problem

I've got about a billion rows of data in a table with a name and an integer in the range 1-288. For a given name, every int is unique, and not every possible integer in the range is present--so there are gaps.

This query generates an example case:

--what I have:
SELECT *
FROM ( VALUES ('foo', 2),
              ('foo', 3),
              ('foo', 4),
              ('foo', 10),
              ('foo', 11),
              ('foo', 13),
              ('bar', 1),
              ('bar', 2),
              ('bar', 3)
     ) AS baz ("name", "int")


I'd like to generate a lookup table with a row for each name and sequence of contiguous integers. Each such row would contain:

name -- the value of the name column

start -- the first integer in the contiguous sequence

end -- the final value in the contiguous sequence

span -- end - start + 1

This query generates example output for the above example:

--what I need:
SELECT * 
FROM ( VALUES ('foo', 2, 4, 3),
              ('foo', 10, 11, 2),
              ('foo', 13, 13, 1),
              ('bar', 1, 3, 3)
     ) AS contiguous_ranges ("name", "start", "end", span)


Because I have so many rows, more efficient is better. That said, I only have to run this query once, so it isn't an absolute requirement.

Thanks in advance!

Edit:

I should add that PL/pgSQL solutions are welcome (please explain any Fancy Tricks--I'm still new to PL/pgSQL).

Solution

How about using with recursive

test view:

create view v as 
select *
from ( values ('foo', 2),
              ('foo', 3),
              ('foo', 4),
              ('foo', 10),
              ('foo', 11),
              ('foo', 13),
              ('bar', 1),
              ('bar', 2),
              ('bar', 3)
     ) as baz ("name", "int");


query:

with recursive t("name", "int") as ( select "name", "int", 1 as span from v
                                     union all
                                     select "name", v."int", t.span+1 as span
                                     from v join t using ("name")
                                     where v."int"=t."int"+1 )
select "name", "start", "start"+span-1 as "end", span
from( select "name", ("int"-span+1) as "start", max(span) as span
      from ( select "name", "int", max(span) as span 
             from t
             group by "name", "int" ) z
      group by "name", ("int"-span+1) ) z;


result:

name | start | end | span
------+-------+-----+------
 foo  |     2 |   4 |    3
 foo  |    13 |  13 |    1
 bar  |     1 |   3 |    3
 foo  |    10 |  11 |    2
(4 rows)


I'd be interested to know how that performs on your billion row table.

Code Snippets

create view v as 
select *
from ( values ('foo', 2),
              ('foo', 3),
              ('foo', 4),
              ('foo', 10),
              ('foo', 11),
              ('foo', 13),
              ('bar', 1),
              ('bar', 2),
              ('bar', 3)
     ) as baz ("name", "int");
with recursive t("name", "int") as ( select "name", "int", 1 as span from v
                                     union all
                                     select "name", v."int", t.span+1 as span
                                     from v join t using ("name")
                                     where v."int"=t."int"+1 )
select "name", "start", "start"+span-1 as "end", span
from( select "name", ("int"-span+1) as "start", max(span) as span
      from ( select "name", "int", max(span) as span 
             from t
             group by "name", "int" ) z
      group by "name", ("int"-span+1) ) z;
name | start | end | span
------+-------+-----+------
 foo  |     2 |   4 |    3
 foo  |    13 |  13 |    1
 bar  |     1 |   3 |    3
 foo  |    10 |  11 |    2
(4 rows)

Context

StackExchange Database Administrators Q#17045, answer score: 9

Revisions (0)

No revisions yet.