Is there a metric for the similarity of two image filters?
Problem
Definitions
An image filter is an array $m \in \mathbb{R}^{k_1 \times k_2 \times k_3}$ which gets applied to an image $I \in \mathbb{R}^{l_1 \times l_2 \times l_3}$ as a discrete convolution
$$I'(n_1, n_2, n_3) = \sum_{i=0}^{k_1-1} \sum_{j=0}^{k_2-1} \sum_{k=0}^{k_3-1} I\left[n_1 - i + \left\lfloor \frac{k_1}{2} \right\rfloor, n_2 - j + \left\lfloor \frac{k_2}{2} \right\rfloor, n_3 - k + \left\lfloor \frac{k_3}{2} \right\rfloor\right] \cdot m[i, j, k]$$
There are some well-known filters, such as Laplace filters and Prewitt filters (see my interactive example).
For example, for an RGB image, $k_3 = 3$, and $k_1, k_2$ are the width and height of the filter.
Question
Is there a metric to compare the similarity of image filters?
Context
Convolutional Neural Networks (CNNs) learn image filters. As they are randomly initialized, the filters they learn are different each time you train them. I am interested in quantifying those differences.
I could, of course, use any metric for elements of $\mathbb{R}^{k_1 \times k_2 \times k_3}$. However, consider the filters
$$
\begin{align}
m_1 &= \begin{pmatrix}-1&0&1\\-1&0&1\\-1&0&1\end{pmatrix}\\
m_2 &= \begin{pmatrix}1&0&-1\\1&0&-1\\1&0&-1\end{pmatrix}\\
m_3 &= \begin{pmatrix}-0.9&0.1&1\\-0.9&0.1&1\\-0.9&0.1&1\end{pmatrix}\\
\end{align}
$$
Applied to the same test image (figures omitted here), $m_1$ and $m_2$ produce outputs in which you can see a difference, but a much smaller one than for the result of $m_3$.
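This is probably not captured by most metrics on $\mathbb{R}^{k_1 \times k_2 \times k_3}$: in the Euclidean metric, for instance, $m_1$ is much closer to $m_3$ than to $m_2$. Another idea was to apply the metrics to the processed images on a given dataset, but this would make the results depend on the dataset and be computationally very intensive.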
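To make the mismatch concrete, here is a small sketch that computes the plain Euclidean distance between the three filters above. The use of NumPy and the variable names `d12` and `d13` are my own choices for illustration. The last line prints the coefficient sums, which is one possible explanation for the visual difference: a filter whose coefficients sum to zero gives no response on constant image regions, whereas $m_3$ does not sum to zero.

```python
import numpy as np

# The three 3x3 filters from the question.
m1 = np.array([[-1.0, 0.0, 1.0]] * 3)
m2 = -m1
m3 = np.array([[-0.9, 0.1, 1.0]] * 3)

# Euclidean (Frobenius) distance between the filters.
d12 = np.linalg.norm(m1 - m2)  # sqrt(6 * 2^2)   = sqrt(24)
d13 = np.linalg.norm(m1 - m3)  # sqrt(6 * 0.1^2) = sqrt(0.06)

print(f"d(m1, m2) = {d12:.3f}")
print(f"d(m1, m3) = {d13:.3f}")

# Coefficient sums: m1 and m2 sum to zero, m3 does not.
print(m1.sum(), m2.sum(), round(m3.sum(), 3))
```

So the Euclidean metric ranks $m_3$ as far more similar to $m_1$ than $m_2$ is, which is the opposite of the visual impression.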
(In case you want to try image filters yourself with Python: https://gist.github.com/MartinThoma/f51a1044c4abc6c7b81915ef96b7cfbd)
Solution
The ‘k-translation correlation’ is probably a good candidate for what you are looking for. It is the maximum correlation between two filters $\mathbf{W_i}$ and $\mathbf{W_j}$ obtained by translating one of them by up to $k$ steps along each spatial dimension:
$$\rho_k(\mathbf{W_i},\mathbf{W_j})=\max_{(x,y)\in \{-k,\ldots,k\}^2\setminus\{(0,0)\}} \frac{\langle\mathbf{W_i}, T(\mathbf{W_j}, x,y)\rangle_f}{\left \| \mathbf{W_i}\right \|_2 \left \| \mathbf{W_j}\right \|_2}\,,$$
where $T(\cdot, x,y)$ translates its first operand by $(x,y)$ and $\langle\cdot,\cdot\rangle_f$ denotes the flattened inner product: both filters (the second one translated) are reshaped to column vectors before the inner product is taken. For more details, refer to Doubly Convolutional Neural Networks (Zhai, Cheng, Lu, Zhang, in Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016)).
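A minimal sketch of this formula in NumPy follows. It is not the paper's code. The helper names `translate` and `k_translation_correlation` are hypothetical, and two details are assumptions on my part: entries shifted in from outside the filter are filled with zeros, and $x$ is mapped to columns and $y$ to rows.

```python
import numpy as np

def translate(w, dx, dy):
    """Shift filter w by dx columns and dy rows, keeping its shape.

    Assumption: entries shifted in from outside are zero-filled.
    Works for 2-D filters and for H x W x C filters (channels untouched).
    """
    h, wd = w.shape[:2]
    ay, ax = abs(dy), abs(dx)
    pad = [(ay, ay), (ax, ax)] + [(0, 0)] * (w.ndim - 2)
    padded = np.pad(w, pad, mode="constant")
    # Cropping at offset (ay - dy, ax - dx) yields out[r, c] = w[r - dy, c - dx].
    return padded[ay - dy: ay - dy + h, ax - dx: ax - dx + wd]

def k_translation_correlation(wi, wj, k):
    """Maximum normalized inner product of wi with wj translated by
    (x, y) in {-k, ..., k}^2, excluding the zero translation."""
    denom = np.linalg.norm(wi.ravel()) * np.linalg.norm(wj.ravel())
    best = -np.inf
    for dx in range(-k, k + 1):
        for dy in range(-k, k + 1):
            if dx == 0 and dy == 0:
                continue  # the formula excludes (0, 0)
            corr = np.sum(wi * translate(wj, dx, dy)) / denom
            best = max(best, corr)
    return best

# Filters m1 and m2 from the question.
m1 = np.array([[-1.0, 0.0, 1.0]] * 3)
m2 = -m1

print(k_translation_correlation(m1, m1, 1))
print(k_translation_correlation(m1, m2, 1))
```

Because the zero translation is excluded, a filter compared with itself does not score 1 under this definition. Whether that behaviour suits the comparison of independently trained filters is worth checking on your own filters.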
Context
StackExchange Computer Science Q#65828, answer score: 3