HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Given a set of intervals on the real line, find a minimum set of points that "cover" all the intervals

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
realthelineallpointsminimumintervalsthatfindcover

Problem

I've been trying to find an efficient way to solve the problem of finding a minimum (not minimal) set of time points that cover a given family of intervals on the real line, that is, for each interval $I$ there should be some time point $t$ from the chosen set such that $t \in I$. However, I am not sure if it's that straightforward. I tried modeling it as a minimum edge cover problem ("P" polynomial time complexity: intervals are vertices and intersection between intervals is an edge), but that doesn't work because for 1 time point in optimum solution there can be multiple edges.

I developed a greedy solution: have intervals sorted in increasing order of their end times. Then iterate over the intervals in the order. If a time point hasn't been inserted into the solution that covers the current interval, then insert it and remove all intervals that are covered by it from consideration. Continue till no intervals remaining.
Example:
(0, 10), (3, 12), (11, 15)

Add time 10 to the solution.
Remove (0, 10) and (3, 12) from consideration.
Add time 15 to solution.
Remove (11, 15) from consideration.
Final solution is of size 2. (time 10 and 15).

I can't prove it's optimum because it's not modeled as edge cover or vertex cover or known graph problem. I tried modeling it as "minimum clique cover problem" but not sure if it works.

  • How to model it properly to understand its complexity (P vs NP complexity)?



  • How to know if the above solution is optimum?

Solution

The problem is in $\mathbf{P}$.

In fact, there is even a (very) efficient algorithm for it:

Let $[a_1, b_1], \dots, [a_n, b_n]$ be the input intervals and assume $a_i \le b_i$ for all $i$. Additionally, let us assume (for now) that the intervals are given in sorted order as a list, sorted firstly respective to the $a_i$ and secondly to the $b_i$.

An optimal solution can be found as follows: Set $r$ to $a_1$. Then, starting with $i = 1$, check if $a_{i+1} \le b_i$; if that is the case, set $r$ to $a_{i+1}$ and increment $i$; otherwise, add $r$ to the solution and eliminate all intervals from $[a_1, b_1]$ up to $[a_i, b_i]$. Repeat until no more intervals are left.

Why is this an optimal solution? $[a_1, b_1]$ must be covered, so there must be some $r \in [a_1, b_1]$ in the solution. Then, either $r$ covers all $n$ intervals or there is $i$ such that $r b_1$ and there is no $r$ covering both $[a_1, b_1]$ and $[a_2, b_2]$. Apply induction.

The time complexity is still $O(n \log n)$, though, since you require sorting the end times first.

Context

StackExchange Computer Science Q#101909, answer score: 2

Revisions (0)

No revisions yet.