patternMinor
Efficiently partition tree into clusters of similar diameter
Viewed 0 times
efficientlypartitionintodiameterclusterssimilartree
Problem
I am looking for a way to split a tree into $k$ clusters so that the cluster with largest diameter is as small as possible. All edges have the same length. I'm hoping for an algorithm that can handle arbitrary trees (with arbitrary branching factor), not just binary trees.
Solution
Your problem is equivalent to the following:
Input: a tree $T$, an integer $r$ (the radius)
Find: a set $S$ of $k$ vertices, such that every vertex in the tree is at distance $\le r$ from one of the vertices in $S$; or failure if no such set exists
Why is this equivalent? Well, if you have such a set $S$, then you immediately can find $k$ clusters of diameter $\le 2r$ (for each $s \in S$, you get a cluster of nodes that are at distance $\le r$ from $s$). Conversely, if you have $k$ clusters of diameter $\le 2r$, then you can find such a set $S$: given a cluster, pick a pair of nodes $x,y$ in the cluster that maximizes $d(x,y)$; then find a shortest path from $x$ to $y$, and let $s$ be a node halfway along that shortest path; then one can prove every node in the cluster is at distance $\le r$ from $s$. So, if you can solve the above problem, then linear search or binary search on $r$ will find the smallest such diameter.
The best algorithm I can find for this problem is the trivial $O(n^{k+1})$ algorithm: enumerate all subsets of size $k$, and test each to see whether it satisfies all the requirements. The number of sets you must enumerate is $C(n,k) = O(n^k)$, and given a candidate set you can test whether it is valid in $O(n)$ time. Unfortunately, for modest or large values of $k$, this will be totally impractical.
As an optimization, you can let $S'$ be the set of vertices that are at height $\ge r$. (The height of a vertex $v$ is the length of the longest downward path from $v$ to a leaf; in other words, the height of a leaf is $0$, and the height of an internal node is one more than the height of its highest child.) Then, you only need to enumerate subsets of $S'$ of size $k$. In practice, this might provide a significant improvement, even though asymptotically it might not make much of a difference to the worst-case running time.
I don't know whether it is possible to do better, or whether a polynomial-time algorithm exists. I think the problem can be solved in polynomial time for binary trees using dynamic programming, but you want to handle arbitrary trees, and I don't know what happens in that case.
Input: a tree $T$, an integer $r$ (the radius)
Find: a set $S$ of $k$ vertices, such that every vertex in the tree is at distance $\le r$ from one of the vertices in $S$; or failure if no such set exists
Why is this equivalent? Well, if you have such a set $S$, then you immediately can find $k$ clusters of diameter $\le 2r$ (for each $s \in S$, you get a cluster of nodes that are at distance $\le r$ from $s$). Conversely, if you have $k$ clusters of diameter $\le 2r$, then you can find such a set $S$: given a cluster, pick a pair of nodes $x,y$ in the cluster that maximizes $d(x,y)$; then find a shortest path from $x$ to $y$, and let $s$ be a node halfway along that shortest path; then one can prove every node in the cluster is at distance $\le r$ from $s$. So, if you can solve the above problem, then linear search or binary search on $r$ will find the smallest such diameter.
The best algorithm I can find for this problem is the trivial $O(n^{k+1})$ algorithm: enumerate all subsets of size $k$, and test each to see whether it satisfies all the requirements. The number of sets you must enumerate is $C(n,k) = O(n^k)$, and given a candidate set you can test whether it is valid in $O(n)$ time. Unfortunately, for modest or large values of $k$, this will be totally impractical.
As an optimization, you can let $S'$ be the set of vertices that are at height $\ge r$. (The height of a vertex $v$ is the length of the longest downward path from $v$ to a leaf; in other words, the height of a leaf is $0$, and the height of an internal node is one more than the height of its highest child.) Then, you only need to enumerate subsets of $S'$ of size $k$. In practice, this might provide a significant improvement, even though asymptotically it might not make much of a difference to the worst-case running time.
I don't know whether it is possible to do better, or whether a polynomial-time algorithm exists. I think the problem can be solved in polynomial time for binary trees using dynamic programming, but you want to handle arbitrary trees, and I don't know what happens in that case.
Context
StackExchange Computer Science Q#45327, answer score: 3
Revisions (0)
No revisions yet.