gotchaMinor
Minimum Height Trees. Why does deleting leaves layer-by-layer give us the answer?
Viewed 0 times
whydeletinglayertheanswerheightminimumleavesgivetrees
Problem
Here is the problem I tried to solve. Finally, I gave up and found a solution which was correct. Here is it.
Here is the problem description:
For a undirected graph with tree characteristics, we can choose any
node as the root. The result graph is then a rooted tree. Among all
possible rooted trees, those with minimum height are called minimum
height trees (MHTs). Given such a graph, write a function to find all
the MHTs and return a list of their root labels.
The graph contains n nodes which are labeled from 0 to n - 1. You will
be given the number n and a list of undirected edges (each edge is a
pair of labels).
You can assume that no duplicate edges will appear in edges. Since all
edges are undirected, [0, 1] is the same as [1, 0] and thus will not
appear together in edges.
Example 1 :
Input: n = 4, edges = [[1, 0], [1, 2], [1, 3]]
Output: 1
Example 2 :
Input: n = 6, edges = [[0, 3], [1, 3], [2, 3], [4, 3], [5, 4]]
Output: 3, 4
Here is the solution I tried to understand:
The basic idea is "keep deleting leaves layer-by-layer, until reach
the root."
Specifically, first find all the leaves, then remove them. After
removing, some nodes will become new leaves. So we can continue remove
them. Eventually, there is only 1 or 2 nodes left. If there is only
one node left, it is the root. If there are 2 nodes, either of them
could be a possible root.
I'm not sure I understand it so I could explain it to somebody. Why does deleting leaves layer-by-layer give us the answer?
Here is the problem description:
For a undirected graph with tree characteristics, we can choose any
node as the root. The result graph is then a rooted tree. Among all
possible rooted trees, those with minimum height are called minimum
height trees (MHTs). Given such a graph, write a function to find all
the MHTs and return a list of their root labels.
The graph contains n nodes which are labeled from 0 to n - 1. You will
be given the number n and a list of undirected edges (each edge is a
pair of labels).
You can assume that no duplicate edges will appear in edges. Since all
edges are undirected, [0, 1] is the same as [1, 0] and thus will not
appear together in edges.
Example 1 :
Input: n = 4, edges = [[1, 0], [1, 2], [1, 3]]
0
|
1
/ \
2 3Output: 1
Example 2 :
Input: n = 6, edges = [[0, 3], [1, 3], [2, 3], [4, 3], [5, 4]]
0 1 2
\ | /
3
|
4
|
5Output: 3, 4
Here is the solution I tried to understand:
The basic idea is "keep deleting leaves layer-by-layer, until reach
the root."
Specifically, first find all the leaves, then remove them. After
removing, some nodes will become new leaves. So we can continue remove
them. Eventually, there is only 1 or 2 nodes left. If there is only
one node left, it is the root. If there are 2 nodes, either of them
could be a possible root.
I'm not sure I understand it so I could explain it to somebody. Why does deleting leaves layer-by-layer give us the answer?
Solution
Let $T$ be a tree, and let $T'$ be the tree formed from $T$ by removing all leaves (vertices of degree 1). Assume that $T$ contains at least three vertices, so that $T'$ is not empty. This implies that every leaf $f \in T$ has a (unique) neighbor $p(f) \in T$ which is not a leaf.
For $v \in T$, denote by $h_v(T)$ the height of $T$ when rooted at $v$.
I claim that the following hold:
For every vertex $v \in T'$, $$ h_v(T) = h_v(T') + 1. $$
For every leaf $f \in T \setminus T'$, $$ h_f(T) = h_{p(f)}(T) + 1. $$
For the first claim, consider $T$ rooted at a non-leaf $v$. The height of $T$ is the length of the maximal root-to-leaf path. Every such path gets shortened by one in $T'$, implying that $h_v(T) = h_v(T') + 1$.
For the second claim, let $x$ be a leaf farthest away from $p(f)$. We can assuming that $x \neq f$ since $p(f)$ has at least two neighbors, and $f$ is at minimal distance from $p(f)$. Clearly $d(f,x) = d(p(f),x) + 1$, and so $h_f(T) \geq h_{p(f)}(T) + 1$. The other direction follows from the triangle inequality, which implies that $d(f,y) \leq d(p(f),y) + 1$ for all nodes $y$.
The algorithm. If $T$ contains one or two vertices, then clearly all vertices are centers (roots of minimum height trees). Otherwise, by the second claim, no leaf can be a center, and so by the first claim, the centers of $T$ are the same as the centers of $T'$. This suggests the following algorithm for computing the centers of $T$:
The correctness of the algorithm follows the reasoning stated above.
For $v \in T$, denote by $h_v(T)$ the height of $T$ when rooted at $v$.
I claim that the following hold:
For every vertex $v \in T'$, $$ h_v(T) = h_v(T') + 1. $$
For every leaf $f \in T \setminus T'$, $$ h_f(T) = h_{p(f)}(T) + 1. $$
For the first claim, consider $T$ rooted at a non-leaf $v$. The height of $T$ is the length of the maximal root-to-leaf path. Every such path gets shortened by one in $T'$, implying that $h_v(T) = h_v(T') + 1$.
For the second claim, let $x$ be a leaf farthest away from $p(f)$. We can assuming that $x \neq f$ since $p(f)$ has at least two neighbors, and $f$ is at minimal distance from $p(f)$. Clearly $d(f,x) = d(p(f),x) + 1$, and so $h_f(T) \geq h_{p(f)}(T) + 1$. The other direction follows from the triangle inequality, which implies that $d(f,y) \leq d(p(f),y) + 1$ for all nodes $y$.
The algorithm. If $T$ contains one or two vertices, then clearly all vertices are centers (roots of minimum height trees). Otherwise, by the second claim, no leaf can be a center, and so by the first claim, the centers of $T$ are the same as the centers of $T'$. This suggests the following algorithm for computing the centers of $T$:
- While $T$ contains at least three vertices, remove all leaves of $T$.
- Once $T$ contains one or two vertices, return all vertices of $T$.
The correctness of the algorithm follows the reasoning stated above.
Context
StackExchange Computer Science Q#93466, answer score: 6
Revisions (0)
No revisions yet.