HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Find the smallest summed distances by uniquely pairing elements of one set to elements of another set

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
pairingsmallesttheelementssummedoneuniquelyanotherfinddistances

Problem

As input I have two sets of points in RN, typically for large N, for example N=40. Supose both sets have m elements:

S = s1 ... sm

T = t1 ... tm

Semantically both sets are equal, but due to noise (of whatever kind) on the R^N points, elements which should be semantically the same still have a distance larger than 0.

What I want to find is m tuples (si, tj) such that the sum of distance(si, tj) is minimized and such that sk and tk exactly occur once in the set of tuples for k=1...m. Basically (i,j) have to be chosen as towers on a chessboard which can not hit each other while minimizing the summed distances.

In other words, I want to find an onto one-one map between S and T which 'is sort of an identity map, but robust to noise'. We assume that the distance measure is a good indication of how similar elements are.

Basically I need to find a permutation of 1...N, and hence I think this problem is either NP-hard or NP-complete, since it 'feels' quite similar to the TSP; however I have not been able to rewrite the TSP problem to a subset of my problem here.

Is this problem realistically solvable for large N? Is there a name for this problem? What would be a feasible solution? Is there another criteria which might be better than the summed distances?

I thought of a greedy approach, let D be a matrix of the distances, dij = distance(si, tj).

T = {}
while D is not empty:
    (i,j) = argmin-(i,j) dij
    append (i,j) to T
    set row i and column j to infinity.


This does not result in the optimal solution, but finds a solution. Would this be my best bet? Should I use simulated annealing, or is it overkill?

PS: from my point of view, this is just a minor problem in a bigger ML problem, however, I am very much interested in the CS background of it.

Solution

This is the problem of finding the maximum matching in a weighted bipartite graph. There are efficient algorithms that solve this problem in polynomial time.

Basically, you have a complete bipartite graph with $2m$ vertices. $S$ forms one set of vertices; $T$ forms the other set of vertices. For each $i,j$, create an edge from $s_i$ to $t_j$ whose weight is equal to $-d(s_i,t_j)$, the negation of the distance between $s_i$ and $t_j$.

Now, you want to find a matching whose weight is as large as possible. This corresponds to a set of pairs $(s_i,t_j)$ whose total distance is as small as possible. From there, you can use existing algorithms on this graph to find the maximum-weight matching.

Context

StackExchange Computer Science Q#28141, answer score: 6

Revisions (0)

No revisions yet.