HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Compute spherical distance matrix from list of geographical coordinates

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
coordinatescomputedistancelistsphericalfrommatrixgeographical

Problem

I've got a list of geographic coordinates ([lat, long]) and want to compute the corresponding matrix of distances. Distance shall be spherical distance in km.

def distance_on_unit_sphere(coord1, coord2):
    lat1 = coord1[0]
    long1 = coord1[1]
    lat2 = coord2[0]
    long2 = coord2[1]
    # Convert latitude and longitude to spherical coordinates in radians.
    degrees_to_radians = math.pi/180.0
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - lat2)*degrees_to_radians
    theta1 = long1*degrees_to_radians
    theta2 = long2*degrees_to_radians
    # Compute spherical distance from spherical coordinates.
    cos = (math.sin(phi1)*math.sin(phi2)*math.cos(theta1 - theta2) + 
           math.cos(phi1)*math.cos(phi2))
    arc = math.acos( cos )
    # Multiply with radius of the earth in km (GRS 80-Ellipsoid)
    distance = arc * 6371.007176 
    return distance

import math
from collections import OrderedDict

    coordinates_dict = {u'YALE UNIVERSITY': [41.3111, -72.9267],
                        u'YONSEI UNIVERSITY': [37.5664, 126.939],
                        u'YORK UNIVERSITY': [43.7731, -79.5036],
                        u'YUAN ZE UNIVERSITY': [24.9697, 121.267],
                        u'ZHEJIANG UNIVERSITY': [30.2636, 120.121],
                        u'ZHONGNAN UNIVERSITY OF ECONOMICS AND LAW': [30.4752, 114.394]}

    distance_dict = {}

    for university in coordinates_dict.keys():
        distance_dict[university] = OrderedDict()
        coord1 = coordinates_dict[university]
        for other_university in coordinates_dict.keys():
            coord2 = coordinates_dict[other_university]
            distance = distance_on_unit_sphere(coord1, coord2)
            distance_dict[university][other_university] = distance


While distance_on_unit_sphere() is certainly already optimal (I got it from johndcook.com), I think the process to compute the matrix could be sped up. Right now I use a nested loop, I believe that's quite inefficient,

Solution

With numpy and its meshgrid functions, you don't have to do any looping at all, although under the hood looping similar to yours is going on (but in C, and thus much faster).

Here's a numpy version of your function:

import numpy as np

def distance_on_sphere_numpy(coordinate_array):
    """
    Compute a distance matrix of the coordinates using a spherical metric.
    :param coordinate_array: numpy.ndarray with shape (n,2); latitude is in 1st col, longitude in 2nd.
    :returns distance_mat: numpy.ndarray with shape (n, n) containing distance in km between coords.
    """
    # Radius of the earth in km (GRS 80-Ellipsoid)
    EARTH_RADIUS = 6371.007176 

    # Unpacking coordinates
    latitudes = coordinate_array[:, 0]
    longitudes = coordinate_array[:, 1]

    # Convert latitude and longitude to spherical coordinates in radians.
    degrees_to_radians = np.pi/180.0
    phi_values = (90.0 - latitudes)*degrees_to_radians
    theta_values = longitudes*degrees_to_radians

    # Expand phi_values and theta_values into grids
    theta_1, theta_2 = np.meshgrid(theta_values, theta_values)
    theta_diff_mat = theta_1 - theta_2

    phi_1, phi_2 = np.meshgrid(phi_values, phi_values)

    # Compute spherical distance from spherical coordinates
    angle = (np.sin(phi_1) * np.sin(phi_2) * np.cos(theta_diff_mat) + 
           np.cos(phi_1) * np.cos(phi_2))
    arc = np.arccos(angle)

    # Multiply by earth's radius to obtain distance in km
    return arc * EARTH_RADIUS


In addition to changing things to be compatible with numpy, there were several other stylistic and efficiency notes:

  • I renamed the function to eliminate the unit word from its name, which was misleading: you are multiplying the Earth's radius, so the result is not distances on a unit sphere but rather on an Earth-sized sphere.



  • I added docstrings to explain the function.



  • Rather than have the "magic number" 6371.007176 in the bottom of the code, I moved it to the top, defined it as a variable with an all-caps name. This make it more apparent that the function relies on this value and is consistent with recommended Python PEP8 style best practices. The units (km) in particular may not be apparent to users of the code.



-
It is obviously inefficient to compute every distance twice, as my version currently does. However, your version also double-counted, which would be difficult to avoid without storing a copy of the keys in your dictionary in a list or tuple, so you could enumerate() them and then make the second loop go only from the index of the outer loop on up.

-
For easy array-based trigonometry, I switched the calls to math.cos() and math.acos() to np.cos() and np.arccos(). After that the only import I needed from math was pi, but as Jaime pointed out that is available from numpy, so I just used that version and eliminated all imports of anything from math.

This function is usable like this:

coordinates_dict = {u'YALE UNIVERSITY': [41.3111, -72.9267],
                    u'YONSEI UNIVERSITY': [37.5664, 126.939],
                    u'YORK UNIVERSITY': [43.7731, -79.5036],
                    u'YUAN ZE UNIVERSITY': [24.9697, 121.267],
                    u'ZHEJIANG UNIVERSITY': [30.2636, 120.121],
                    u'ZHONGNAN UNIVERSITY OF ECONOMICS AND LAW': [30.4752, 114.394]}
​
coordinates_array = np.array([(val[0], val[1]) for key, val in coordinates_dict.iteritems()])
​
distance_on_sphere_numpy(coordinates_array)
Out[2]:
array([[  1.34258937e-04,   1.20828584e+04,   1.16387852e+04,
          1.15437577e+04,   1.05861981e+04,   6.04128344e+02],
       [  1.20828584e+04,   0.00000000e+00,   9.11965962e+02,
          5.99372703e+02,   1.50000489e+03,   1.25010601e+04],
       [  1.16387852e+04,   9.11965962e+02,   0.00000000e+00,
          5.49877343e+02,   1.39741343e+03,   1.19974669e+04],
       [  1.15437577e+04,   5.99372703e+02,   5.49877343e+02,
          0.00000000e+00,   1.02655483e+03,   1.19442021e+04],
       [  1.05861981e+04,   1.50000489e+03,   1.39741343e+03,
          1.02655483e+03,   0.00000000e+00,   1.10150349e+04],
       [  6.04128344e+02,   1.25010601e+04,   1.19974669e+04,
          1.19442021e+04,   1.10150349e+04,   0.00000000e+00]])


By adding your code into its own function, I was able to compare timing between the numpy version and your version:

```
import math
from collections import OrderedDict
import numpy as np

def distance_on_unit_sphere(coord1, coord2):
lat1 = coord1[0]
long1 = coord1[1]
lat2 = coord2[0]
long2 = coord2[1]
# Convert latitude and longitude to spherical coordinates in radians.
degrees_to_radians = math.pi/180.0
phi1 = (90.0 - lat1)*degrees_to_radians
phi2 = (90.0 - lat2)*degrees_to_radians
theta1 = long1*degrees_to_radians
theta2 = long2*degrees_to_radians
# Compute spherical distance from spherical coordinates.
cos = (math.sin(phi1)math.sin(phi2)math.cos(theta1 - theta2) +

Code Snippets

import numpy as np

def distance_on_sphere_numpy(coordinate_array):
    """
    Compute a distance matrix of the coordinates using a spherical metric.
    :param coordinate_array: numpy.ndarray with shape (n,2); latitude is in 1st col, longitude in 2nd.
    :returns distance_mat: numpy.ndarray with shape (n, n) containing distance in km between coords.
    """
    # Radius of the earth in km (GRS 80-Ellipsoid)
    EARTH_RADIUS = 6371.007176 

    # Unpacking coordinates
    latitudes = coordinate_array[:, 0]
    longitudes = coordinate_array[:, 1]

    # Convert latitude and longitude to spherical coordinates in radians.
    degrees_to_radians = np.pi/180.0
    phi_values = (90.0 - latitudes)*degrees_to_radians
    theta_values = longitudes*degrees_to_radians

    # Expand phi_values and theta_values into grids
    theta_1, theta_2 = np.meshgrid(theta_values, theta_values)
    theta_diff_mat = theta_1 - theta_2

    phi_1, phi_2 = np.meshgrid(phi_values, phi_values)

    # Compute spherical distance from spherical coordinates
    angle = (np.sin(phi_1) * np.sin(phi_2) * np.cos(theta_diff_mat) + 
           np.cos(phi_1) * np.cos(phi_2))
    arc = np.arccos(angle)

    # Multiply by earth's radius to obtain distance in km
    return arc * EARTH_RADIUS
coordinates_dict = {u'YALE UNIVERSITY': [41.3111, -72.9267],
                    u'YONSEI UNIVERSITY': [37.5664, 126.939],
                    u'YORK UNIVERSITY': [43.7731, -79.5036],
                    u'YUAN ZE UNIVERSITY': [24.9697, 121.267],
                    u'ZHEJIANG UNIVERSITY': [30.2636, 120.121],
                    u'ZHONGNAN UNIVERSITY OF ECONOMICS AND LAW': [30.4752, 114.394]}
​
coordinates_array = np.array([(val[0], val[1]) for key, val in coordinates_dict.iteritems()])
​
distance_on_sphere_numpy(coordinates_array)
Out[2]:
array([[  1.34258937e-04,   1.20828584e+04,   1.16387852e+04,
          1.15437577e+04,   1.05861981e+04,   6.04128344e+02],
       [  1.20828584e+04,   0.00000000e+00,   9.11965962e+02,
          5.99372703e+02,   1.50000489e+03,   1.25010601e+04],
       [  1.16387852e+04,   9.11965962e+02,   0.00000000e+00,
          5.49877343e+02,   1.39741343e+03,   1.19974669e+04],
       [  1.15437577e+04,   5.99372703e+02,   5.49877343e+02,
          0.00000000e+00,   1.02655483e+03,   1.19442021e+04],
       [  1.05861981e+04,   1.50000489e+03,   1.39741343e+03,
          1.02655483e+03,   0.00000000e+00,   1.10150349e+04],
       [  6.04128344e+02,   1.25010601e+04,   1.19974669e+04,
          1.19442021e+04,   1.10150349e+04,   0.00000000e+00]])
import math
from collections import OrderedDict
import numpy as np

def distance_on_unit_sphere(coord1, coord2):
    lat1 = coord1[0]
    long1 = coord1[1]
    lat2 = coord2[0]
    long2 = coord2[1]
    # Convert latitude and longitude to spherical coordinates in radians.
    degrees_to_radians = math.pi/180.0
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - lat2)*degrees_to_radians
    theta1 = long1*degrees_to_radians
    theta2 = long2*degrees_to_radians
    # Compute spherical distance from spherical coordinates.
    cos = (math.sin(phi1)*math.sin(phi2)*math.cos(theta1 - theta2) + 
           math.cos(phi1)*math.cos(phi2))
    arc = math.acos( cos )
    # Multiply with radius of the earth in km (GRS 80-Ellipsoid)
    distance = arc * 6371.007176 
    return distance

def compute_distance_mat(coordinates_dict):
    distance_dict = {}
    for university in coordinates_dict.keys():
        distance_dict[university] = OrderedDict()
        coord1 = coordinates_dict[university]
        for other_university in coordinates_dict.keys():
            coord2 = coordinates_dict[other_university]
            distance = distance_on_unit_sphere(coord1, coord2)
            distance_dict[university][other_university] = distance
    return distance_dict

coordinates_dict = {u'YALE UNIVERSITY': [41.3111, -72.9267],
                    u'YONSEI UNIVERSITY': [37.5664, 126.939],
                    u'YORK UNIVERSITY': [43.7731, -79.5036],
                    u'YUAN ZE UNIVERSITY': [24.9697, 121.267],
                    u'ZHEJIANG UNIVERSITY': [30.2636, 120.121],
                    u'ZHONGNAN UNIVERSITY OF ECONOMICS AND LAW': [30.4752, 114.394]}
import timeit
%timeit compute_distance_mat(coordinates_dict)
​
%timeit distance_on_sphere_numpy(coordinates_array)
10000 loops, best of 3: 125 µs per loop
10000 loops, best of 3: 40.8 µs per loop

Context

StackExchange Code Review Q#98275, answer score: 5

Revisions (0)

No revisions yet.