HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Consolidate list of ranges that overlap

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
consolidaterangesoverlapthatlist

Problem

I wanted to implemented an algorithm in Python 2.4 (so the odd construction for the conditional assignment) where given a list of ranges, the function returns another list with all ranges that overlap in the same entry.

I would appreciate other options (more elegant or efficient) than the one I took.

ranges = [(1,4),(1,9),(3,7),(10,15),(1,2),(8,17),(16,20),(100,200)]

def remove_overlap(rs):
        rs.sort()
        def process (rs,deviation):
                start = len(rs) - deviation - 1
                if start  rs[start-1][1]:
                        return process(rs,deviation+1)
                else:
                        rs[start-1] = ((rs[start][0],rs[start-1][0])[rs[start-1][0]  rs[start][1]])
                        del rs[start]
                        return process(rs,0)
        return process(rs,0)

print remove_overlap(ranges)

Solution

In addition to Winston Ewert's comments, I'd add:

-
There's no docstring. How are users to know how what this function does and how to call it?

-
This is an ideal function to give a couple of doctests to show how to use it and to test that it works.

-
The name remove_overlap could be improved. Remove overlap from what? And remove it how? And anyway, you don't just want to merge overlapping ranges (like 1–3 and 2–4), you want to merge adjacent ranges too (like 1–2 and 2–3). So I'd use merge_ranges.

-
It's simpler and more flexible to implement the function as a generator, rather than repeatedly appending to a list.

-
Winston's implementation doesn't work if any of the ranges are negative.

So I would write:

def merge_ranges(ranges):
    """
    Merge overlapping and adjacent ranges and yield the merged ranges
    in order. The argument must be an iterable of pairs (start, stop).

    >>> list(merge_ranges([(5,7), (3,5), (-1,3)]))
    [(-1, 7)]
    >>> list(merge_ranges([(5,6), (3,4), (1,2)]))
    [(1, 2), (3, 4), (5, 6)]
    >>> list(merge_ranges([]))
    []
    """
    ranges = iter(sorted(ranges))
    current_start, current_stop = next(ranges)
    for start, stop in ranges:
        if start > current_stop:
            # Gap between segments: output current segment and start a new one.
            yield current_start, current_stop
            current_start, current_stop = start, stop
        else:
            # Segments adjacent or overlapping: merge.
            current_stop = max(current_stop, stop)
    yield current_start, current_stop


Update

Winston Ewert notes in comments that it's not exactly obvious how this works in the case when ranges is the empty list: in particular, the call next(ranges) looks suspicious.

The explanation is that when ranges is empty, next(ranges) raises the exception StopIteration. And that's exactly what we want, because we are writing a generator function and raising StopIteration is one of the ways that a generator can signal that it is finished.

This is a common pattern when building one iterator from another: the outer iterator keeps reading elements from the inner iterator, relying on the inner iterator to raise StopIteration when it is empty. Several of the recipes in the itertools documentation use this pattern, for example imap and islice.

Supposing that you think this is a bit ugly, and you wanted to make the behaviour explicit, what would you write? Well, you'd end up writing something like this:

try:
    current_start, current_stop = next(ranges)
except StopIteration:     # ranges is empty
    raise StopIteration   # and so are we


I hope you can see now why I didn't write it like that! I prefer to follow the maxim, "program as if you know the language."

Update 2

The idiom of deferring from one generator to another via next will no longer work in Python 3.6 (see PEP 479), so for future compatibility the code needs to read:

try:
    current_start, current_stop = next(ranges)
except StopIteration:
    return

Code Snippets

def merge_ranges(ranges):
    """
    Merge overlapping and adjacent ranges and yield the merged ranges
    in order. The argument must be an iterable of pairs (start, stop).

    >>> list(merge_ranges([(5,7), (3,5), (-1,3)]))
    [(-1, 7)]
    >>> list(merge_ranges([(5,6), (3,4), (1,2)]))
    [(1, 2), (3, 4), (5, 6)]
    >>> list(merge_ranges([]))
    []
    """
    ranges = iter(sorted(ranges))
    current_start, current_stop = next(ranges)
    for start, stop in ranges:
        if start > current_stop:
            # Gap between segments: output current segment and start a new one.
            yield current_start, current_stop
            current_start, current_stop = start, stop
        else:
            # Segments adjacent or overlapping: merge.
            current_stop = max(current_stop, stop)
    yield current_start, current_stop
try:
    current_start, current_stop = next(ranges)
except StopIteration:     # ranges is empty
    raise StopIteration   # and so are we
try:
    current_start, current_stop = next(ranges)
except StopIteration:
    return

Context

StackExchange Code Review Q#21307, answer score: 9

Revisions (0)

No revisions yet.