patternpythonMinor
Consolidate list of ranges that overlap
Viewed 0 times
consolidaterangesoverlapthatlist
Problem
I wanted to implemented an algorithm in Python 2.4 (so the odd construction for the conditional assignment) where given a list of ranges, the function returns another list with all ranges that overlap in the same entry.
I would appreciate other options (more elegant or efficient) than the one I took.
I would appreciate other options (more elegant or efficient) than the one I took.
ranges = [(1,4),(1,9),(3,7),(10,15),(1,2),(8,17),(16,20),(100,200)]
def remove_overlap(rs):
rs.sort()
def process (rs,deviation):
start = len(rs) - deviation - 1
if start rs[start-1][1]:
return process(rs,deviation+1)
else:
rs[start-1] = ((rs[start][0],rs[start-1][0])[rs[start-1][0] rs[start][1]])
del rs[start]
return process(rs,0)
return process(rs,0)
print remove_overlap(ranges)Solution
In addition to Winston Ewert's comments, I'd add:
-
There's no docstring. How are users to know how what this function does and how to call it?
-
This is an ideal function to give a couple of doctests to show how to use it and to test that it works.
-
The name
-
It's simpler and more flexible to implement the function as a generator, rather than repeatedly appending to a list.
-
Winston's implementation doesn't work if any of the ranges are negative.
So I would write:
Update
Winston Ewert notes in comments that it's not exactly obvious how this works in the case when
The explanation is that when
This is a common pattern when building one iterator from another: the outer iterator keeps reading elements from the inner iterator, relying on the inner iterator to raise
Supposing that you think this is a bit ugly, and you wanted to make the behaviour explicit, what would you write? Well, you'd end up writing something like this:
I hope you can see now why I didn't write it like that! I prefer to follow the maxim, "program as if you know the language."
Update 2
The idiom of deferring from one generator to another via
-
There's no docstring. How are users to know how what this function does and how to call it?
-
This is an ideal function to give a couple of doctests to show how to use it and to test that it works.
-
The name
remove_overlap could be improved. Remove overlap from what? And remove it how? And anyway, you don't just want to merge overlapping ranges (like 1–3 and 2–4), you want to merge adjacent ranges too (like 1–2 and 2–3). So I'd use merge_ranges.-
It's simpler and more flexible to implement the function as a generator, rather than repeatedly appending to a list.
-
Winston's implementation doesn't work if any of the ranges are negative.
So I would write:
def merge_ranges(ranges):
"""
Merge overlapping and adjacent ranges and yield the merged ranges
in order. The argument must be an iterable of pairs (start, stop).
>>> list(merge_ranges([(5,7), (3,5), (-1,3)]))
[(-1, 7)]
>>> list(merge_ranges([(5,6), (3,4), (1,2)]))
[(1, 2), (3, 4), (5, 6)]
>>> list(merge_ranges([]))
[]
"""
ranges = iter(sorted(ranges))
current_start, current_stop = next(ranges)
for start, stop in ranges:
if start > current_stop:
# Gap between segments: output current segment and start a new one.
yield current_start, current_stop
current_start, current_stop = start, stop
else:
# Segments adjacent or overlapping: merge.
current_stop = max(current_stop, stop)
yield current_start, current_stopUpdate
Winston Ewert notes in comments that it's not exactly obvious how this works in the case when
ranges is the empty list: in particular, the call next(ranges) looks suspicious.The explanation is that when
ranges is empty, next(ranges) raises the exception StopIteration. And that's exactly what we want, because we are writing a generator function and raising StopIteration is one of the ways that a generator can signal that it is finished.This is a common pattern when building one iterator from another: the outer iterator keeps reading elements from the inner iterator, relying on the inner iterator to raise
StopIteration when it is empty. Several of the recipes in the itertools documentation use this pattern, for example imap and islice.Supposing that you think this is a bit ugly, and you wanted to make the behaviour explicit, what would you write? Well, you'd end up writing something like this:
try:
current_start, current_stop = next(ranges)
except StopIteration: # ranges is empty
raise StopIteration # and so are weI hope you can see now why I didn't write it like that! I prefer to follow the maxim, "program as if you know the language."
Update 2
The idiom of deferring from one generator to another via
next will no longer work in Python 3.6 (see PEP 479), so for future compatibility the code needs to read:try:
current_start, current_stop = next(ranges)
except StopIteration:
returnCode Snippets
def merge_ranges(ranges):
"""
Merge overlapping and adjacent ranges and yield the merged ranges
in order. The argument must be an iterable of pairs (start, stop).
>>> list(merge_ranges([(5,7), (3,5), (-1,3)]))
[(-1, 7)]
>>> list(merge_ranges([(5,6), (3,4), (1,2)]))
[(1, 2), (3, 4), (5, 6)]
>>> list(merge_ranges([]))
[]
"""
ranges = iter(sorted(ranges))
current_start, current_stop = next(ranges)
for start, stop in ranges:
if start > current_stop:
# Gap between segments: output current segment and start a new one.
yield current_start, current_stop
current_start, current_stop = start, stop
else:
# Segments adjacent or overlapping: merge.
current_stop = max(current_stop, stop)
yield current_start, current_stoptry:
current_start, current_stop = next(ranges)
except StopIteration: # ranges is empty
raise StopIteration # and so are wetry:
current_start, current_stop = next(ranges)
except StopIteration:
returnContext
StackExchange Code Review Q#21307, answer score: 9
Revisions (0)
No revisions yet.