patternpythonMajor
Merging Overlapping Intervals
Viewed 0 times
intervalsoverlappingmerging
Problem
A while back I was asked the following question in a phone interview, and to be honest it stumped me. After many long nights of rolling the problem around in my head I think I came up with a decent solution.
The Question:
Given a list of intervals like:
Write a function that will merge overlapping intervals.
My solution:
My questions are:
The Question:
Given a list of intervals like:
[(5, 7), (11, 116), (3, 6), (10, 12), (6, 12)]Write a function that will merge overlapping intervals.
My solution:
def merge_intervals(intervals):
"""
A simple algorithm can be used:
1. Sort the intervals in increasing order
2. Push the first interval on the stack
3. Iterate through intervals and for each one compare current interval
with the top of the stack and:
A. If current interval does not overlap, push on to stack
B. If current interval does overlap, merge both intervals in to one
and push on to stack
4. At the end return stack
"""
si = sorted(intervals, key=lambda tup: tup[0])
merged = []
for tup in si:
if not merged:
merged.append(tup)
else:
b = merged.pop()
if b[1] >= tup[0]:
new_tup = tuple([b[0], tup[1]])
merged.append(new_tup)
else:
merged.append(b)
merged.append(tup)
return merged
if __name__ == '__main__':
l = [(5, 7), (11, 116), (3, 4), (10, 12), (6, 12)]
print("Original list of ranges: {}".format(l))
merged_list = merge_intervals(l)
print("List of ranges after merge_ranges: {}".format(merged_list))
My questions are:
- Run time: Without counting sorting I believe this runs in \$O(n)\$. With counting sorting that would go up to \$O(n*\log(n))\$. Is this correct?
- Stylistically does anything need to be improved?
- Is there a more efficient way of solving this problem?
Solution
This code has a bug. Consider
Your code is nice, and implements the simplest thing that works. However, you might want to work on improving your variable names. Neither
Notice that I decided not to use an explicit
I also put the comments inside the code, rather than into the docstring. Docstrings are good, but they are intended to explain how to use the function (e.g. what the expected type of parameters is, what the function returns, …). To document implementation details, use normal comments.
You have correctly identified the (average and worst-case) algorithmic complexity as \$O(n \log n)\$. You must consider the sorting. However, best-case complexity is \$O(n)\$, as the Tim Sort used by Python detects already-sorted input.
While your code provides a simple and robust solution, it would be certainly possible to reduce average algorithmic complexity. I expect a kind of self-merging interval tree to have average complexity closer to \$O(n \log m)\$ where \$m\$ is the number of intervals after merging.
merge_intervals([(0, 3), (1, 2)]). This returns [(0, 2)] although it should be [(0, 3)]. This can be solved throughnew_tup = (b[0], max(b[1], tup[1]))Your code is nice, and implements the simplest thing that works. However, you might want to work on improving your variable names. Neither
l, b, or si are very descriptive, and tup and new_tup unnecessarily refer to their type name rather than their meaning. In contrast, merged is a very clear name. A rewrite of the function body might be:sorted_by_lower_bound = sorted(intervals, key=lambda tup: tup[0])
merged = []
for higher in sorted_by_lower_bound:
if not merged:
merged.append(higher)
else:
lower = merged[-1]
# test for intersection between lower and higher:
# we know via sorting that lower[0] <= higher[0]
if higher[0] <= lower[1]:
upper_bound = max(lower[1], higher[1])
merged[-1] = (lower[0], upper_bound) # replace by merged interval
else:
merged.append(higher)
return mergedNotice that I decided not to use an explicit
tuple(…) constructor – that's mostly useful for converting to tuples from iterables. For tuple literals, you can use the (…, …) syntax.I also put the comments inside the code, rather than into the docstring. Docstrings are good, but they are intended to explain how to use the function (e.g. what the expected type of parameters is, what the function returns, …). To document implementation details, use normal comments.
You have correctly identified the (average and worst-case) algorithmic complexity as \$O(n \log n)\$. You must consider the sorting. However, best-case complexity is \$O(n)\$, as the Tim Sort used by Python detects already-sorted input.
While your code provides a simple and robust solution, it would be certainly possible to reduce average algorithmic complexity. I expect a kind of self-merging interval tree to have average complexity closer to \$O(n \log m)\$ where \$m\$ is the number of intervals after merging.
Code Snippets
new_tup = (b[0], max(b[1], tup[1]))sorted_by_lower_bound = sorted(intervals, key=lambda tup: tup[0])
merged = []
for higher in sorted_by_lower_bound:
if not merged:
merged.append(higher)
else:
lower = merged[-1]
# test for intersection between lower and higher:
# we know via sorting that lower[0] <= higher[0]
if higher[0] <= lower[1]:
upper_bound = max(lower[1], higher[1])
merged[-1] = (lower[0], upper_bound) # replace by merged interval
else:
merged.append(higher)
return mergedContext
StackExchange Code Review Q#69242, answer score: 23
Revisions (0)
No revisions yet.