patternpythonMinor
Find the nearest point of a given set of points
Viewed 0 times
thepointnearestpointsfindgivenset
Problem
Suppose there are a set of given points (represented by
My current solution is straightforward: just find min among all distances. The issue of my implementation is, if we want to calculate nearest point among the given set of points for another point B, I need to calculate distance again.
My question is, suppose the given set of points are fixed, is there any way to optimize (e.g. pre-process), so that search nearest point is much faster?
x and y two dimensional coordinates), and for any given point A, I want to find the nearest distance point among the given set of point.My current solution is straightforward: just find min among all distances. The issue of my implementation is, if we want to calculate nearest point among the given set of points for another point B, I need to calculate distance again.
My question is, suppose the given set of points are fixed, is there any way to optimize (e.g. pre-process), so that search nearest point is much faster?
import sys
import random
def distance(p1, p2):
return (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2
def search_point(points, target_point):
result = sys.maxint
nearest_point = -1
for p in points:
d = distance(p, target_point)
if d < result:
result = d
nearest_point = p
return nearest_point
if __name__ == "__main__":
points = []
for i in range(10):
points.append((random.randint(0,20),random.randint(0,20)))
target_point = (random.randint(0,20), random.randint(0,20))
print 'result', search_point(points, target_point)
print 'target_point', target_point
print 'raw points', points
print 'distances', [distance(p, target_point) for p in points]Solution
It makes no sense to save the distances to disk (disk i/o is slower than the computation), but it might make sense to "memoize" your function:
Note that this might actually be slower than your original function because dictionary lookup may be more expensive than 2 multiplications (which dwarf two subtractions and an addition).
Edit: envelopes
If your pool of points is huge, and you have a stream of point that have to be matched against the pool (i.e., find the pool element which is the closest to input point), you can use lattices/envelopes: suppose your pool points have coordinates from 0 to 1. You can split them into 100 boxes by splitting each coordinate into 10. E.g., the box number 35 will have
Specifically, you need to find the closest point in the box where the point landed, and then compare the distance to that closest point to the distance to the boundary of the box.
If the closest point is closer than the boundary, you are done.
Otherwise you need to check those neighboring boxes that are closer than the closest point (up to 11! if the target point and its closest point are almost in the opposite diagonal nodes). However, this should not happen if the number of points in the box is "large".
A good rule of thumb is that each box should contain as many points as there are boxes, e.g., if you have 10,000 points, there should be 100 boxes.
PS. The further development of this approach results in K-d tree.
distance2_cache = {}
def distance2(p1,p2):
"Compute the distance squared, using cache."
try:
return distance2_cache[(p1,p2)]
except KeyError:
distance2_cache[(p1,p2)] = d2 = (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2
return d2Note that this might actually be slower than your original function because dictionary lookup may be more expensive than 2 multiplications (which dwarf two subtractions and an addition).
Edit: envelopes
If your pool of points is huge, and you have a stream of point that have to be matched against the pool (i.e., find the pool element which is the closest to input point), you can use lattices/envelopes: suppose your pool points have coordinates from 0 to 1. You can split them into 100 boxes by splitting each coordinate into 10. E.g., the box number 35 will have
0.2<=x<0.3 and 0.4<=y<0.5. Then for each new point you only need to check "very few" boxes (instead of all 100). Specifically, you need to find the closest point in the box where the point landed, and then compare the distance to that closest point to the distance to the boundary of the box.
If the closest point is closer than the boundary, you are done.
Otherwise you need to check those neighboring boxes that are closer than the closest point (up to 11! if the target point and its closest point are almost in the opposite diagonal nodes). However, this should not happen if the number of points in the box is "large".
A good rule of thumb is that each box should contain as many points as there are boxes, e.g., if you have 10,000 points, there should be 100 boxes.
PS. The further development of this approach results in K-d tree.
Code Snippets
distance2_cache = {}
def distance2(p1,p2):
"Compute the distance squared, using cache."
try:
return distance2_cache[(p1,p2)]
except KeyError:
distance2_cache[(p1,p2)] = d2 = (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2
return d2Context
StackExchange Code Review Q#155802, answer score: 6
Revisions (0)
No revisions yet.