HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Calculating the distance between one point, and many others

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thepointothersdistancecalculatingbetweenonemanyand

Problem

In my program, I have entities that I call "blobs", because they have a blobby shape. Blobs are polygons. If I have two blobs, then their information array would look like:

>>> blobs
np.array([ [ [x1, y1], [x2, y2], [x3, y3] ], [ [x1, y1], [x2, y2], [x3, y3] ] ])


Each entry along axis 0 represents one blob. Each entry along axis 2 represents the x,y coordinates of the vertex of a blob. In the above case, there are 2 blobs, each with three vertices.

There could be other information in each

We can assume that all blobs will always have the same number of vertices.

I sometimes need to find which blobs are within a certain distance from another blob.

Here are the two functions I use:

def returnBlobIndicesOfRelevantBlobs(minimumDistanceFromFocusVertex, blobs, focusVertexBlobIndex, focusVertexIndex):
    focusVertexCoords = blobs[focusVertexBlobIndex, focusVertexIndex, :2]

    # get the coordinate of the zeroth vertex for all blobs
    vertexCoordsForAllRelevantBlobs = blobs[:, 0, :2]

    distanceVectors = (vertexCoordsForAllRelevantBlobs - focusVertexCoords)
    distances = np.sqrt(np.einsum('ij,ij->i', distanceVectors, distanceVectors))

    mask = (distances i', distanceVectors, distanceVectors))

    mask = (distances <= minimumDistanceFromFocusVertex)
    return relevantBlobVertexIndices[mask]


What improvements could I make?

Solution

Your existing algorithm is a bit perplexing to me. In particular, the returnBlobIndicesOfRelevantBlobs doesn't really do much except cause the answer to occasionally be wrong. For example here's what I'm using to drive your code:

from timeit import default_timer as timer
# your code goes here

blobs = np.array([ 
    [ [0,0],  [0,0], [0,0] ], 
    [ [12,2], [9,0], [0,0] ], 
    [ [-1,1], [0,0], [0,3] ] 
])

start = timer()
r = 2
ans = returnBlobVertexIndicesOfRelevantBlobs(r, blobs, 0, 0)
elapsed_time = (timer() - start)
print "Found {0} in {1} s.".format(len(ans), elapsed_time)


For efficiency, you can eliminate the use the np.sqrt by simply using the square of minimumDistanceFromFocusVertex (my those are long names!) which you would only need to calculate once.

Your use of 4*minimumDistanceFromFocusVertex seems a bit arbitrary and probably incorrect. For this particular example, for instance, it causes the second blob (with first coordinate [12,2]) to be removed from further consideration even though its third coordinate is within the target range.

If you're using Python 2.7 (as I am) you should use xrange instead of range. The difference is that range actually allocates memory and populates an array while range does not. (In Python 3.3, range now does what xrange did in Python 2.7, and xrange is deprecated.)

Finally, while there may be good reason for your application, it's not at all clear to me why one would create an array of indices as the desired final answer. If that's really what you wanted, then that's fine, but if you find that you're having to do some additional transformation before using it, consider ways in which that this code could deliver the results is a more directly usable form.

As a final check, and speed test you can use this:

# first create a pseudo-random population of blobs
np.random.seed(7)
r=10000
n=4000000
blobs = np.random.random_integers(-r, r, (n,3,2))
# set the first vertex at the origin
blobs[0,0] = [0,0]

start = timer()
ans = returnBlobVertexIndicesOfRelevantBlobs(r, blobs, 0, 0)
elapsed_time = (timer() - start)
print "Found {0} in {1} s.".format(len(ans), elapsed_time)
print "The value of pi is approximately {}".format((4.0*len(ans))/(3.0*n))


As a bonus, you can use this to calculate an approximate value of \$\pi\$ which serves as a useful sanity check. That works because the code above places 3 random points per \$n\$ blobs in a square that measures \$2r\$ on each side for an area of \$4r^2\$. The code then counts the number of points that are within \$r\$ units of the center, which of course describes an inscribed circle with area \$\pi r^2\$. Since all of the points are placed with uniform distribution, if \$a\$ is the number of points within the circle, we know that
$$\displaystyle \frac{a}{3n} \approx \frac{\pi r^2}{4 r^2} $$
or in other words, that the odds that each point lands in the circle within the square is proportional to the areas of the circle and square. So,
$$\displaystyle \frac{4a}{3n} \approx \pi$$

When I run the program on my machine with the values shown above, I get:

Found 9424443 in 41.8755390644 s.
The value of pi is approximately 3.141481


I'm sure there are ways to speed up these calculations, which I may have time to pursue later.

Code Snippets

from timeit import default_timer as timer
# your code goes here

blobs = np.array([ 
    [ [0,0],  [0,0], [0,0] ], 
    [ [12,2], [9,0], [0,0] ], 
    [ [-1,1], [0,0], [0,3] ] 
])

start = timer()
r = 2
ans = returnBlobVertexIndicesOfRelevantBlobs(r, blobs, 0, 0)
elapsed_time = (timer() - start)
print "Found {0} in {1} s.".format(len(ans), elapsed_time)
# first create a pseudo-random population of blobs
np.random.seed(7)
r=10000
n=4000000
blobs = np.random.random_integers(-r, r, (n,3,2))
# set the first vertex at the origin
blobs[0,0] = [0,0]

start = timer()
ans = returnBlobVertexIndicesOfRelevantBlobs(r, blobs, 0, 0)
elapsed_time = (timer() - start)
print "Found {0} in {1} s.".format(len(ans), elapsed_time)
print "The value of pi is approximately {}".format((4.0*len(ans))/(3.0*n))
Found 9424443 in 41.8755390644 s.
The value of pi is approximately 3.141481

Context

StackExchange Code Review Q#54510, answer score: 2

Revisions (0)

No revisions yet.