Closed form solution for optimization problem
Problem
Consider the problem of finding the real-valued matrix $C$ so that
$$\|S-AC\|_F^2\qquad(1)$$
is minimal ($S$ and $A$ are real-valued matrices and $\|\cdot\|_F$ denotes the Frobenius norm). This problem has the closed-form solution $C=A^+S$, where $A^+$ is the pseudo-inverse of $A$.
I need to control the magnitude of the elements in $C$; that is, an extra regularization term $\varepsilon\|C\|_F^2$ must be added to (1). Is there a closed-form solution for minimizing
$$\|S-AC\|_F^2 + \varepsilon\|C\|_F^2\,?\qquad(2)$$
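For concreteness, a minimal NumPy sketch of the unregularized solution I already have (the matrix sizes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))   # made-up sizes
S = rng.standard_normal((50, 7))

C = np.linalg.pinv(A) @ S           # minimizer of ||S - AC||_F^2

# Equivalent least-squares solve, handling all columns of S at once.
C_ls = np.linalg.lstsq(A, S, rcond=None)[0]
print(np.allclose(C, C_ls))         # True
```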
Solution
There is a closed-form solution to your problem (below I write $\lambda$ for your $\varepsilon$), but first it helps to know two facts used to derive the minimizer. The first fact is that we can rewrite the squared Frobenius norm as a trace operation. Namely,
$$\|C\|^2_F = \text{Tr}\left[CC'\right],$$
where $C'$ denotes the transpose of $C$. The second fact is the derivative of the squared Frobenius norm of the residual with respect to $C$, which is given by $$\frac{\partial}{\partial C} \text{Tr}\left[(S - AC)(S - AC)'\right] = -2A' (S - AC).$$
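If you want to sanity-check that derivative numerically, here is a minimal sketch using central finite differences (all dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
S = rng.standard_normal((6, 3))
C = rng.standard_normal((4, 3))

f = lambda C: np.sum((S - A @ C) ** 2)   # ||S - AC||_F^2
grad = -2 * A.T @ (S - A @ C)            # claimed derivative

# Central finite differences, one entry of C at a time.
eps = 1e-6
num = np.zeros_like(C)
for i in range(C.shape[0]):
    for j in range(C.shape[1]):
        E = np.zeros_like(C)
        E[i, j] = eps
        num[i, j] = (f(C + E) - f(C - E)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-5))  # True
```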
Luckily for us, the derivative of a sum equals the sum of the derivatives (i.e., differentiation is linear), so we can take the derivative of the objective with respect to $C$ term by term. We end up with $$\frac{\partial}{\partial C} \left(\|S - AC\|^2_F + \lambda\|C\|^2_F\right) = -2A' (S - AC) + 2\lambda C.$$ Setting this equal to zero and solving for $C$, we have $$\begin{align*} -2A' (S - AC) + 2\lambda C &= 0 \Leftrightarrow \\ -2A'S + 2A'AC + 2\lambda C &=0 \Leftrightarrow \\ -2A'S + 2(A'A + \lambda I)C & = 0 \Leftrightarrow \\ C &= (A'A + \lambda I)^{-1}A'S,
\end{align*}$$ where $I$ is the identity matrix. Note that $A'A + \lambda I$ is always invertible for $\lambda > 0$, because $A'A$ is positive semidefinite; so unlike the unregularized problem, the solution is unique even when $A$ does not have full column rank.
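A minimal NumPy sketch of this closed form (the sizes and $\lambda$ are made up; solving the linear system is numerically preferable to forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))   # made-up sizes
S = rng.standard_normal((50, 7))
lam = 0.1                           # made-up regularization strength

# C = (A'A + lam*I)^{-1} A'S, computed as a linear solve.
C = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ S)

# The gradient of the objective should vanish at the minimizer.
grad = -2 * A.T @ (S - A @ C) + 2 * lam * C
print(np.allclose(grad, 0))         # True
```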
An eigendecomposition of this result gives more insight into the behavior. Put simply, you add $\lambda$ to each eigenvalue of the $A'A$ matrix, which shrinks your final estimates. This is the form of regularization known as ridge regression (Tikhonov regularization), except here we are applying it to multivariate (multiple-output) regression instead of ordinary single-output linear regression.
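To see the shrinkage concretely: under the SVD $A = U\Sigma V'$, the solution becomes $C = V(\Sigma^2 + \lambda I)^{-1}\Sigma\, U'S$, so each singular value $\sigma$ contributes a filter factor $\sigma/(\sigma^2 + \lambda)$ instead of the $1/\sigma$ of the pseudo-inverse. A sketch verifying this equivalence (same made-up data as above):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))   # made-up sizes
S = rng.standard_normal((50, 7))
lam = 0.1

# Direct closed form.
C = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ S)

# Same solution via the SVD: each singular value sigma is filtered
# by sigma / (sigma**2 + lam), which shrinks the estimate toward 0.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
C_svd = Vt.T @ ((sigma / (sigma**2 + lam))[:, None] * (U.T @ S))
print(np.allclose(C, C_svd))        # True
```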
Context
StackExchange Computer Science Q#60938, answer score: 5