debugMinor
Cannot find paper: All k nearest neighbors search in N*log(N) using distance indices for log(N) support points
Viewed 0 times
cannotindicespapersearchalllognearestpointsdistanceusing
Problem
I remember reading a paper about finding k nearest neighbors for all N multi-dimensional objects in the set.
I've tries to find it again many times, but have failed so far.
The algorithm was as follows:
So, why do I need the paper again? First of all, several people have asked me for it and I was surprised I cannot find it again.
More importantly, it did not work well for me. This might be caused by a bad choice of the support objects. I remember nothing about the selection process.
I've tries to find it again many times, but have failed so far.
The algorithm was as follows:
- Each object is assigned a sorted queue of nears neighbors (maximum size is k). Let's assume that $k
- Each time you calculate $distance(A, B)$ try to add A into B's neighbor list and vise versa.
- Choose $log(N)$ support objects (say, first $log(N)$ objects out of $N$). (Or maybe that was $sqrt(N)$)
- For each support, build the distance-based index containing all N objects
- After the index construction, every object has a candidate list of k nearest neighbors. All real nearest neighbors are not further than the furthest neighbor in the candidate list - candidate neighbor radius.
- For each point we use triangle inequality and current neighbor radius to constrain the distances between a candidate point and the support points: $|dist(candidate_i, support_j) - dist(point, support_j)|
- We calculate the distances for those candidates and update the nearest-neighbor lists for both the current point and the candidate points.
- As we process more points, the neighbor lists become "tighter" - the radius, the distance to the furthest neighbor decreases with each update of the list.
- Tighter constraints mean that the candidate lists become smaller and smaller after each processed point and leading to total run time of $N*log(N)$.
So, why do I need the paper again? First of all, several people have asked me for it and I was surprised I cannot find it again.
More importantly, it did not work well for me. This might be caused by a bad choice of the support objects. I remember nothing about the selection process.
Solution
Found the paper finally. Not sure I've read exactly the same paper (maybe I read some SO or SE answer or some other paper that inspired/was inspired by this paper). Nethertheless, the algorithm is exactly the same.
A Fast Algorithm for the All k Nearest Neighbors Problem in General Metric Spaces by Edgar Chávez and Karina Figueroa and Gonzalo Navarro. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.9299 http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=A580FE7BF3251BC23DB793E3B1DFDF64?doi=10.1.1.21.9299&rep=rep1&type=pdf
A Fast Algorithm for the All k Nearest Neighbors Problem in General Metric Spaces by Edgar Chávez and Karina Figueroa and Gonzalo Navarro. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.9299 http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=A580FE7BF3251BC23DB793E3B1DFDF64?doi=10.1.1.21.9299&rep=rep1&type=pdf
Context
StackExchange Computer Science Q#83021, answer score: 2
Revisions (0)
No revisions yet.