
Measuring difference between two sets of neural network weights?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·



Problem

Suppose that we take a neural network of a given topology, and run it through two training processes, obtaining two different sets of converged weights at the end of the training.

What is a good way to measure the difference between the two sets of weights?

If this were a curve fit where the parameters were largely orthogonal, we could simply treat the parameters as coordinates in a Cartesian space and compute the length of the vector difference between the two sets.

However, the weights in a neural network are often not at all orthogonal. There are trivial redundancies, in that we can reorder nodes within a layer, or multiply the input weights to a given node by a constant factor and the output weights by its inverse. Even if we remove these by putting the weights in some canonical form (normalizing and sorting them, for instance), I can't see any reason to assume that we won't have plenty of remaining non-orthogonality. As a result, two sets of neural-network weights that have a large vector-difference length may be essentially equivalent in practice.
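To make these trivial redundancies concrete, here is a minimal sketch using only the Python standard library. It assumes a tiny one-hidden-layer network with ReLU activations and no biases (my choice for illustration; the rescaling trick is exact only for positively homogeneous activations such as ReLU and for positive factors). The names `forward`, `flat_distance`, `perm`, and `scale` are hypothetical.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def forward(W1, W2, x):
    """One-hidden-layer ReLU network without biases: W2 * relu(W1 * x)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def flat_distance(A1, A2, B1, B2):
    """Euclidean distance between the two flattened weight vectors."""
    a = [w for M in (A1, A2) for row in M for w in row]
    b = [w for M in (B1, B2) for row in M for w in row]
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2
W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
W2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_out)]

# Build a second weight set by reordering the hidden units and rescaling
# each unit's input weights by a positive factor (output weights by its inverse).
perm = [2, 0, 3, 1]
scale = [0.5, 2.0, 4.0, 0.25]
V1 = [[scale[k] * w for w in W1[p]] for k, p in enumerate(perm)]
V2 = [[row[p] / scale[k] for k, p in enumerate(perm)] for row in W2]

# Compare the outputs on random inputs, and the weights as raw vectors.
xs = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(100)]
max_output_gap = max(
    abs(a - b)
    for x in xs
    for a, b in zip(forward(W1, W2, x), forward(V1, V2, x))
)
weight_gap = flat_distance(W1, W2, V1, V2)
print("largest output difference:", max_output_gap)
print("weight-vector distance:   ", weight_gap)
```

The two networks agree up to floating-point rounding on every input, while the distance between their weight vectors is far from zero.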

An alternate conceptual approach would be to numerically integrate the difference between the output values over the entire input space, but for typical neural networks this space has high enough dimension for such an integration to be intractable. So we'd need to do some sort of sampling-based approach, which introduces more handwaving about whether it's a meaningful measure.

Is there any existing consensus on what's a good measure? Or, for that matter, any existing practice for what other researchers have used?

Solution

One way to compare two neural networks is to compare how similar their predictions are, on typical instances.

Ideally, we'd like to compute the expected value of this similarity, taken over the distribution on instances. However, as you say, the input space is high-dimensional, so the integral is hard to compute. Also, the distribution on instances typically isn't known in any analytical or useful form.

Therefore, the standard trick is to estimate this expected value, using sampling. We draw a sample of instances, compute the similarity of the predictions from the two networks on each instance, and take the average of those similarity values. If the sample is large enough, this should be a good approximation to the expected value (to the integral). So, you'll set aside a separate validation set, and then compute the average similarity between the two networks' predictions on this validation set.

I can formalize this more precisely. Let $f,g$ be the two classifiers. Suppose there are $k$ classes. Assume you are using a softmax layer at the output of the neural network. Then the output of a classifier $f$ on instance $x$ is a $k$-vector $f(x)$ whose $i$th component can be interpreted as the probability that $x$ has class $i$ (as predicted by the classifier). So, we want some way to measure the similarity of the two $k$-vectors $f(x),g(x)$, i.e., the similarity of those two probability distributions on the $k$ classes. You can use any distance metric or similarity measure on probability distributions.

Let $d(\cdot,\cdot)$ represent a distance measure on two distributions (or, more generally, a dissimilarity measure -- it need not be a metric). Ideally, we'd like to compute $\mathbb{E}[d(f(X),g(X))]$ where the expectation is taken with respect to a random variable $X$ that is distributed according to the input distribution on instances. To approximate this, we compute

$$D = {1 \over n} \sum_{j=1}^n d(f(x_j),g(x_j)),$$

where $x_1,\dots,x_n$ is a validation set that is held out and set aside for this purpose. The validation set should be separate from the training set.
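As a rough sketch of how $D$ could be computed, the snippet below averages a dissimilarity $d$ over a held-out sample. Everything here is an assumption for illustration: `f` and `g` are toy stand-ins for two trained classifiers with $k=3$ classes, `held_out` is a made-up validation set, and total variation distance is just one possible choice of $d$.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def total_variation(u, v):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(p - q) for p, q in zip(u, v))

def average_dissimilarity(f, g, samples, d):
    """Sample estimate D = (1/n) * sum_j d(f(x_j), g(x_j))."""
    if not samples:
        raise ValueError("need at least one held-out sample")
    return sum(d(f(x), g(x)) for x in samples) / len(samples)

# Hypothetical stand-ins for two trained networks (k = 3 classes).
def f(x):
    return softmax([x[0], x[1], 0.0])

def g(x):
    return softmax([x[0] + 0.1, x[1], 0.0])

# Hypothetical held-out validation instances, not used for training.
held_out = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.2]]

D = average_dissimilarity(f, g, held_out, total_variation)
print("estimated D:", D)
```

In practice `f` and `g` would wrap the two trained networks' forward passes, and `held_out` would contain enough instances for the average to be a stable estimate.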

You can now instantiate $d$ with any distance measure on distributions of your choice. Some reasonable choices include: the KL divergence, or the Jensen-Shannon divergence (a symmetrized version of the KL divergence); total variation distance; earth mover's distance; or a very simple measure where $d(u,v) = 1$ if $\arg\max_i u_i \ne \arg\max_i v_i$, else $d(u,v)=0$ (i.e., check whether both networks predict the same class as the most likely class). With this last choice, $D$ is simply the fraction of validation instances on which the two networks disagree.
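For reference, here is one way some of these choices of $d$ might be written in plain Python. Treat it as a sketch: the function names are mine, the `eps` smoothing in the KL divergence is an assumption to avoid dividing by zero, and earth mover's distance is left out because it additionally needs a ground distance between classes.

```python
import math

def kl_divergence(u, v, eps=1e-12):
    """KL(u || v) in nats; eps guards against division by zero when v_i = 0."""
    return sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(u, v) if p > 0)

def js_divergence(u, v):
    """Jensen-Shannon divergence: average KL of u and v to their midpoint."""
    m = [(p + q) / 2 for p, q in zip(u, v)]
    return 0.5 * kl_divergence(u, m) + 0.5 * kl_divergence(v, m)

def total_variation(u, v):
    """Half the L1 distance between the two distributions."""
    return 0.5 * sum(abs(p - q) for p, q in zip(u, v))

def argmax(u):
    """Index of the largest component (first one on ties)."""
    return max(range(len(u)), key=lambda i: u[i])

def argmax_disagreement(u, v):
    """1 if the most likely classes differ, else 0."""
    return 1.0 if argmax(u) != argmax(v) else 0.0

u = [0.7, 0.2, 0.1]
v = [0.1, 0.2, 0.7]
for name, d in [("KL", kl_divergence), ("JS", js_divergence),
                ("TV", total_variation), ("argmax", argmax_disagreement)]:
    print(name, d(u, v))
```

Note that the KL divergence is asymmetric and can become very large when one network assigns near-zero probability to a class the other considers likely, whereas the Jensen-Shannon divergence and total variation distance are symmetric and bounded.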

This approach is suitable if you want to measure how similar the predictions of the two neural networks are, without regard to their internal weights -- i.e., treating them as a black box. I don't know how to compare how similar their internal structure is; there are probably many non-trivial transformations to networks that compute essentially the same thing, but with very different-looking weights.

Context

StackExchange Computer Science Q#74488, answer score: 4
