HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMajor

Merging Overlapping Intervals

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
intervalsoverlappingmerging

Problem

A while back I was asked the following question in a phone interview, and to be honest it stumped me. After many long nights of rolling the problem around in my head I think I came up with a decent solution.

The Question:


Given a list of intervals like:


[(5, 7), (11, 116), (3, 6), (10, 12), (6, 12)]


Write a function that will merge overlapping intervals.

My solution:

def merge_intervals(intervals):
"""
A simple algorithm can be used:
1. Sort the intervals in increasing order
2. Push the first interval on the stack
3. Iterate through intervals and for each one compare current interval
with the top of the stack and:
A. If current interval does not overlap, push on to stack
B. If current interval does overlap, merge both intervals in to one
and push on to stack
4. At the end return stack
"""
si = sorted(intervals, key=lambda tup: tup[0])
merged = []

for tup in si:
if not merged:
merged.append(tup)
else:
b = merged.pop()
if b[1] >= tup[0]:
new_tup = tuple([b[0], tup[1]])
merged.append(new_tup)
else:
merged.append(b)
merged.append(tup)
return merged

if __name__ == '__main__':

l = [(5, 7), (11, 116), (3, 4), (10, 12), (6, 12)]
print("Original list of ranges: {}".format(l))
merged_list = merge_intervals(l)
print("List of ranges after merge_ranges: {}".format(merged_list))


My questions are:

  • Run time: Without counting sorting I believe this runs in \$O(n)\$. With counting sorting that would go up to \$O(n*\log(n))\$. Is this correct?



  • Stylistically does anything need to be improved?



  • Is there a more efficient way of solving this problem?

Solution

This code has a bug. Consider merge_intervals([(0, 3), (1, 2)]). This returns [(0, 2)] although it should be [(0, 3)]. This can be solved through

new_tup = (b[0], max(b[1], tup[1]))


Your code is nice, and implements the simplest thing that works. However, you might want to work on improving your variable names. Neither l, b, or si are very descriptive, and tup and new_tup unnecessarily refer to their type name rather than their meaning. In contrast, merged is a very clear name. A rewrite of the function body might be:

sorted_by_lower_bound = sorted(intervals, key=lambda tup: tup[0])
merged = []

for higher in sorted_by_lower_bound:
    if not merged:
        merged.append(higher)
    else:
        lower = merged[-1]
        # test for intersection between lower and higher:
        # we know via sorting that lower[0] <= higher[0]
        if higher[0] <= lower[1]:
            upper_bound = max(lower[1], higher[1])
            merged[-1] = (lower[0], upper_bound)  # replace by merged interval
        else:
            merged.append(higher)
return merged


Notice that I decided not to use an explicit tuple(…) constructor – that's mostly useful for converting to tuples from iterables. For tuple literals, you can use the (…, …) syntax.

I also put the comments inside the code, rather than into the docstring. Docstrings are good, but they are intended to explain how to use the function (e.g. what the expected type of parameters is, what the function returns, …). To document implementation details, use normal comments.

You have correctly identified the (average and worst-case) algorithmic complexity as \$O(n \log n)\$. You must consider the sorting. However, best-case complexity is \$O(n)\$, as the Tim Sort used by Python detects already-sorted input.

While your code provides a simple and robust solution, it would be certainly possible to reduce average algorithmic complexity. I expect a kind of self-merging interval tree to have average complexity closer to \$O(n \log m)\$ where \$m\$ is the number of intervals after merging.

Code Snippets

new_tup = (b[0], max(b[1], tup[1]))
sorted_by_lower_bound = sorted(intervals, key=lambda tup: tup[0])
merged = []

for higher in sorted_by_lower_bound:
    if not merged:
        merged.append(higher)
    else:
        lower = merged[-1]
        # test for intersection between lower and higher:
        # we know via sorting that lower[0] <= higher[0]
        if higher[0] <= lower[1]:
            upper_bound = max(lower[1], higher[1])
            merged[-1] = (lower[0], upper_bound)  # replace by merged interval
        else:
            merged.append(higher)
return merged

Context

StackExchange Code Review Q#69242, answer score: 23

Revisions (0)

No revisions yet.