snippetMinor
How to find the most unique vectors in a set?
Viewed 0 times
uniquethevectorshowsetfindmost
Problem
A question bridging math and computer science, I have on the order of 10000 vectors each of a equal but high dimension, say 6 or 7 dimensions. I want to find a given number of 'unique' vectors in the set, the number generally equal to the dimension of the vectors. In math terms I believe I want to find a subset of vectors that has the biggest angle from its neighbours in that sub set. Vector length is also a factor, i.e. a unique but nearly zero length vector is less preferred than one less unique in angle but has a finite length.
An example:
1: [1 2 4 1 5]
2: [1 1 2 1 3]
3: [1 2 4 1 4]
...
9999: [2 1 1 2 2]
10000: [4 5 4 2 1]
vectors 1 and 3 are quite similar, 2 less so, 10000 is most different. I think the dot product and higher order 'angle' is the best measure of uniqueness, though there may be better.
Visually, collapsing the dimension down to 2, the vectors below in black represent the full set, the vectors in red are the subset of seven most unique.
Concept update: 03/05/2018
I have recently stumbled upon the concept of the Latin hypercube, and the higher dimensional mathematics behind it that attempt to find an even spread of higher-dimensional vectors. I don't know exactly how this could be applied, but it has been the closest concept so far. Can anyone see how it might be applicable to the problem?
An example:
1: [1 2 4 1 5]
2: [1 1 2 1 3]
3: [1 2 4 1 4]
...
9999: [2 1 1 2 2]
10000: [4 5 4 2 1]
vectors 1 and 3 are quite similar, 2 less so, 10000 is most different. I think the dot product and higher order 'angle' is the best measure of uniqueness, though there may be better.
Visually, collapsing the dimension down to 2, the vectors below in black represent the full set, the vectors in red are the subset of seven most unique.
Concept update: 03/05/2018
I have recently stumbled upon the concept of the Latin hypercube, and the higher dimensional mathematics behind it that attempt to find an even spread of higher-dimensional vectors. I don't know exactly how this could be applied, but it has been the closest concept so far. Can anyone see how it might be applicable to the problem?
Solution
I suspect your problem might be NP-hard (by reduction from Clique), so it might be hard to find an efficient algorithm for it.
One heuristic is to use the farthest-point first method, where at each stage you pick the point whose angle to the others already picked is as large as possible. In other words, suppose we have already picked a set $S$ of vectors. Now we find the vector $x$ that maximizes $\min \{\text{angle}(x,s) : s \in S\}$, add it to $S$, and repeat. This most likely won't give the optimal set of vectors, but it might give something that is somewhat close to optimal.
Another possible approach is to use binary search and some kind of clique-finding algorithm. Let $t$ be a threshold. Form an undirected graph $G$ where each vector is a vertex in the graph; and you have an edge $(u,v)$ if the angle between the two vectors $u,v$ is at least $t$. Then if you can find a clique of size $k$ (where $k$ is the desired size of your subset), then you've found a subset of vectors that are all at angle $\ge t$ from each other. Now repeat this, using binary search on $t$ to find the largest $t$ for which you can find a $k$-clique. In this way you might be able to take advantage of methods in the literature for finding $k$-cliques. (For instance, one simple preprocessing step you can do to reduce the size of the graph is: delete any vertex whose degree is less than $k$, and repeat until convergence.)
One heuristic is to use the farthest-point first method, where at each stage you pick the point whose angle to the others already picked is as large as possible. In other words, suppose we have already picked a set $S$ of vectors. Now we find the vector $x$ that maximizes $\min \{\text{angle}(x,s) : s \in S\}$, add it to $S$, and repeat. This most likely won't give the optimal set of vectors, but it might give something that is somewhat close to optimal.
Another possible approach is to use binary search and some kind of clique-finding algorithm. Let $t$ be a threshold. Form an undirected graph $G$ where each vector is a vertex in the graph; and you have an edge $(u,v)$ if the angle between the two vectors $u,v$ is at least $t$. Then if you can find a clique of size $k$ (where $k$ is the desired size of your subset), then you've found a subset of vectors that are all at angle $\ge t$ from each other. Now repeat this, using binary search on $t$ to find the largest $t$ for which you can find a $k$-clique. In this way you might be able to take advantage of methods in the literature for finding $k$-cliques. (For instance, one simple preprocessing step you can do to reduce the size of the graph is: delete any vertex whose degree is less than $k$, and repeat until convergence.)
Context
StackExchange Computer Science Q#76891, answer score: 3
Revisions (0)
No revisions yet.