patternpythonMinor
Calculating the distance between one point, and many others
Viewed 0 times
thepointothersdistancecalculatingbetweenonemanyand
Problem
In my program, I have entities that I call "blobs", because they have a blobby shape. Blobs are polygons. If I have two blobs, then their information array would look like:
Each entry along axis 0 represents one blob. Each entry along axis 2 represents the x,y coordinates of the vertex of a blob. In the above case, there are 2 blobs, each with three vertices.
There could be other information in each
We can assume that all blobs will always have the same number of vertices.
I sometimes need to find which blobs are within a certain distance from another blob.
Here are the two functions I use:
What improvements could I make?
>>> blobs
np.array([ [ [x1, y1], [x2, y2], [x3, y3] ], [ [x1, y1], [x2, y2], [x3, y3] ] ])Each entry along axis 0 represents one blob. Each entry along axis 2 represents the x,y coordinates of the vertex of a blob. In the above case, there are 2 blobs, each with three vertices.
There could be other information in each
We can assume that all blobs will always have the same number of vertices.
I sometimes need to find which blobs are within a certain distance from another blob.
Here are the two functions I use:
def returnBlobIndicesOfRelevantBlobs(minimumDistanceFromFocusVertex, blobs, focusVertexBlobIndex, focusVertexIndex):
focusVertexCoords = blobs[focusVertexBlobIndex, focusVertexIndex, :2]
# get the coordinate of the zeroth vertex for all blobs
vertexCoordsForAllRelevantBlobs = blobs[:, 0, :2]
distanceVectors = (vertexCoordsForAllRelevantBlobs - focusVertexCoords)
distances = np.sqrt(np.einsum('ij,ij->i', distanceVectors, distanceVectors))
mask = (distances i', distanceVectors, distanceVectors))
mask = (distances <= minimumDistanceFromFocusVertex)
return relevantBlobVertexIndices[mask]What improvements could I make?
Solution
Your existing algorithm is a bit perplexing to me. In particular, the
For efficiency, you can eliminate the use the
Your use of
If you're using Python 2.7 (as I am) you should use
Finally, while there may be good reason for your application, it's not at all clear to me why one would create an array of indices as the desired final answer. If that's really what you wanted, then that's fine, but if you find that you're having to do some additional transformation before using it, consider ways in which that this code could deliver the results is a more directly usable form.
As a final check, and speed test you can use this:
As a bonus, you can use this to calculate an approximate value of \$\pi\$ which serves as a useful sanity check. That works because the code above places 3 random points per \$n\$ blobs in a square that measures \$2r\$ on each side for an area of \$4r^2\$. The code then counts the number of points that are within \$r\$ units of the center, which of course describes an inscribed circle with area \$\pi r^2\$. Since all of the points are placed with uniform distribution, if \$a\$ is the number of points within the circle, we know that
$$\displaystyle \frac{a}{3n} \approx \frac{\pi r^2}{4 r^2} $$
or in other words, that the odds that each point lands in the circle within the square is proportional to the areas of the circle and square. So,
$$\displaystyle \frac{4a}{3n} \approx \pi$$
When I run the program on my machine with the values shown above, I get:
I'm sure there are ways to speed up these calculations, which I may have time to pursue later.
returnBlobIndicesOfRelevantBlobs doesn't really do much except cause the answer to occasionally be wrong. For example here's what I'm using to drive your code:from timeit import default_timer as timer
# your code goes here
blobs = np.array([
[ [0,0], [0,0], [0,0] ],
[ [12,2], [9,0], [0,0] ],
[ [-1,1], [0,0], [0,3] ]
])
start = timer()
r = 2
ans = returnBlobVertexIndicesOfRelevantBlobs(r, blobs, 0, 0)
elapsed_time = (timer() - start)
print "Found {0} in {1} s.".format(len(ans), elapsed_time)For efficiency, you can eliminate the use the
np.sqrt by simply using the square of minimumDistanceFromFocusVertex (my those are long names!) which you would only need to calculate once.Your use of
4*minimumDistanceFromFocusVertex seems a bit arbitrary and probably incorrect. For this particular example, for instance, it causes the second blob (with first coordinate [12,2]) to be removed from further consideration even though its third coordinate is within the target range.If you're using Python 2.7 (as I am) you should use
xrange instead of range. The difference is that range actually allocates memory and populates an array while range does not. (In Python 3.3, range now does what xrange did in Python 2.7, and xrange is deprecated.)Finally, while there may be good reason for your application, it's not at all clear to me why one would create an array of indices as the desired final answer. If that's really what you wanted, then that's fine, but if you find that you're having to do some additional transformation before using it, consider ways in which that this code could deliver the results is a more directly usable form.
As a final check, and speed test you can use this:
# first create a pseudo-random population of blobs
np.random.seed(7)
r=10000
n=4000000
blobs = np.random.random_integers(-r, r, (n,3,2))
# set the first vertex at the origin
blobs[0,0] = [0,0]
start = timer()
ans = returnBlobVertexIndicesOfRelevantBlobs(r, blobs, 0, 0)
elapsed_time = (timer() - start)
print "Found {0} in {1} s.".format(len(ans), elapsed_time)
print "The value of pi is approximately {}".format((4.0*len(ans))/(3.0*n))As a bonus, you can use this to calculate an approximate value of \$\pi\$ which serves as a useful sanity check. That works because the code above places 3 random points per \$n\$ blobs in a square that measures \$2r\$ on each side for an area of \$4r^2\$. The code then counts the number of points that are within \$r\$ units of the center, which of course describes an inscribed circle with area \$\pi r^2\$. Since all of the points are placed with uniform distribution, if \$a\$ is the number of points within the circle, we know that
$$\displaystyle \frac{a}{3n} \approx \frac{\pi r^2}{4 r^2} $$
or in other words, that the odds that each point lands in the circle within the square is proportional to the areas of the circle and square. So,
$$\displaystyle \frac{4a}{3n} \approx \pi$$
When I run the program on my machine with the values shown above, I get:
Found 9424443 in 41.8755390644 s.
The value of pi is approximately 3.141481I'm sure there are ways to speed up these calculations, which I may have time to pursue later.
Code Snippets
from timeit import default_timer as timer
# your code goes here
blobs = np.array([
[ [0,0], [0,0], [0,0] ],
[ [12,2], [9,0], [0,0] ],
[ [-1,1], [0,0], [0,3] ]
])
start = timer()
r = 2
ans = returnBlobVertexIndicesOfRelevantBlobs(r, blobs, 0, 0)
elapsed_time = (timer() - start)
print "Found {0} in {1} s.".format(len(ans), elapsed_time)# first create a pseudo-random population of blobs
np.random.seed(7)
r=10000
n=4000000
blobs = np.random.random_integers(-r, r, (n,3,2))
# set the first vertex at the origin
blobs[0,0] = [0,0]
start = timer()
ans = returnBlobVertexIndicesOfRelevantBlobs(r, blobs, 0, 0)
elapsed_time = (timer() - start)
print "Found {0} in {1} s.".format(len(ans), elapsed_time)
print "The value of pi is approximately {}".format((4.0*len(ans))/(3.0*n))Found 9424443 in 41.8755390644 s.
The value of pi is approximately 3.141481Context
StackExchange Code Review Q#54510, answer score: 2
Revisions (0)
No revisions yet.