patternMinor
Under what conditions is K-means clustering transformation-invariant?
Viewed 0 times
whatinvariantmeansunderconditionsclusteringtransformation
Problem
Given a set of data points $X = \{x_1, x_2, \ldots, x_m\}$ where $x_i \in \mathbb{R}^d$ we run K-means on $X$ and obtain the clusters $c_1, c_2, \ldots, c_k$.
Now, if we create a new dataset $Y = \{y_1, y_2, \ldots, y_m\}$ where $y_i = Ax_i + b$ and $y_i \in \mathbb{R}^d$ and run K-means on $Y$ to get clusters $g_1, g_2, \ldots g_k$.
Under what conditions of $A$ and $b$ are we guaranteed to get the same clusters?
Let's assume that K-means is using the euclidean distance and has same initial conditions on both algorithms, that is, if the initial centers for X are $c^0_1, \ldots, c^0_k$ then the initial centers for Y are $g^0_1, \ldots, g^0_k$ where $g^0_i = Ac^0_i + b$.
So far I've thought that $A$ has to be full rank and $b$ can be any vector. However, I haven't been able to prove it.
Now, if we create a new dataset $Y = \{y_1, y_2, \ldots, y_m\}$ where $y_i = Ax_i + b$ and $y_i \in \mathbb{R}^d$ and run K-means on $Y$ to get clusters $g_1, g_2, \ldots g_k$.
Under what conditions of $A$ and $b$ are we guaranteed to get the same clusters?
Let's assume that K-means is using the euclidean distance and has same initial conditions on both algorithms, that is, if the initial centers for X are $c^0_1, \ldots, c^0_k$ then the initial centers for Y are $g^0_1, \ldots, g^0_k$ where $g^0_i = Ac^0_i + b$.
So far I've thought that $A$ has to be full rank and $b$ can be any vector. However, I haven't been able to prove it.
Solution
The answer depends on your K-means algorithm, but what follows should work for standard algorithms.
You will get the same result if your transformation $T$ satisfies two conditions:
You can check this by going over the algorithm, showing that it always makes the same choices.
You will get the same result if your transformation $T$ satisfies two conditions:
- It preserves distances: $d(z,w) = d(T(z),T(w))$, where $d$ is your metric, say $d(z,w) = \|z-w\|$.
- It preserves averages: if $\sum_i p_i z_i$ is a convex combination that $T(\sum_i p_i z_i) = \sum_i p_i T(z_i)$.
You can check this by going over the algorithm, showing that it always makes the same choices.
Context
StackExchange Computer Science Q#64843, answer score: 6
Revisions (0)
No revisions yet.