patternsqlMinor
Efficiently select beginning and end of multiple contiguous ranges in Postgresql query
Viewed 0 times
postgresqlefficientlycontiguousqueryrangesbeginningmultipleendandselect
Problem
I've got about a billion rows of data in a table with a name and an integer in the range 1-288. For a given name, every int is unique, and not every possible integer in the range is present--so there are gaps.
This query generates an example case:
I'd like to generate a lookup table with a row for each name and sequence of contiguous integers. Each such row would contain:
name -- the value of the name column
start -- the first integer in the contiguous sequence
end -- the final value in the contiguous sequence
span -- end - start + 1
This query generates example output for the above example:
Because I have so many rows, more efficient is better. That said, I only have to run this query once, so it isn't an absolute requirement.
Thanks in advance!
Edit:
I should add that PL/pgSQL solutions are welcome (please explain any Fancy Tricks--I'm still new to PL/pgSQL).
This query generates an example case:
--what I have:
SELECT *
FROM ( VALUES ('foo', 2),
('foo', 3),
('foo', 4),
('foo', 10),
('foo', 11),
('foo', 13),
('bar', 1),
('bar', 2),
('bar', 3)
) AS baz ("name", "int")I'd like to generate a lookup table with a row for each name and sequence of contiguous integers. Each such row would contain:
name -- the value of the name column
start -- the first integer in the contiguous sequence
end -- the final value in the contiguous sequence
span -- end - start + 1
This query generates example output for the above example:
--what I need:
SELECT *
FROM ( VALUES ('foo', 2, 4, 3),
('foo', 10, 11, 2),
('foo', 13, 13, 1),
('bar', 1, 3, 3)
) AS contiguous_ranges ("name", "start", "end", span)Because I have so many rows, more efficient is better. That said, I only have to run this query once, so it isn't an absolute requirement.
Thanks in advance!
Edit:
I should add that PL/pgSQL solutions are welcome (please explain any Fancy Tricks--I'm still new to PL/pgSQL).
Solution
How about using
test view:
query:
result:
I'd be interested to know how that performs on your billion row table.
with recursivetest view:
create view v as
select *
from ( values ('foo', 2),
('foo', 3),
('foo', 4),
('foo', 10),
('foo', 11),
('foo', 13),
('bar', 1),
('bar', 2),
('bar', 3)
) as baz ("name", "int");query:
with recursive t("name", "int") as ( select "name", "int", 1 as span from v
union all
select "name", v."int", t.span+1 as span
from v join t using ("name")
where v."int"=t."int"+1 )
select "name", "start", "start"+span-1 as "end", span
from( select "name", ("int"-span+1) as "start", max(span) as span
from ( select "name", "int", max(span) as span
from t
group by "name", "int" ) z
group by "name", ("int"-span+1) ) z;result:
name | start | end | span
------+-------+-----+------
foo | 2 | 4 | 3
foo | 13 | 13 | 1
bar | 1 | 3 | 3
foo | 10 | 11 | 2
(4 rows)I'd be interested to know how that performs on your billion row table.
Code Snippets
create view v as
select *
from ( values ('foo', 2),
('foo', 3),
('foo', 4),
('foo', 10),
('foo', 11),
('foo', 13),
('bar', 1),
('bar', 2),
('bar', 3)
) as baz ("name", "int");with recursive t("name", "int") as ( select "name", "int", 1 as span from v
union all
select "name", v."int", t.span+1 as span
from v join t using ("name")
where v."int"=t."int"+1 )
select "name", "start", "start"+span-1 as "end", span
from( select "name", ("int"-span+1) as "start", max(span) as span
from ( select "name", "int", max(span) as span
from t
group by "name", "int" ) z
group by "name", ("int"-span+1) ) z;name | start | end | span
------+-------+-----+------
foo | 2 | 4 | 3
foo | 13 | 13 | 1
bar | 1 | 3 | 3
foo | 10 | 11 | 2
(4 rows)Context
StackExchange Database Administrators Q#17045, answer score: 9
Revisions (0)
No revisions yet.