patternpythonModerate
Possible optimizations for calculating squared euclidean distance
Viewed 0 times
euclideandistancecalculatingpossiblesquaredforoptimizations
Problem
I need to do a few hundred million euclidean distance calculations every day in a Python project.
Here is what I started out with:
This is quite fast and I already dropped the sqrt calculation since I need to rank items only (nearest-neighbor search). It is still the bottleneck of the script though. Therefore I have written a C extension, which calculates the distance. The calculation is always done with 128-dimensional vectors.
This is the relevant part of the extension, the complete code for the extension is here.
Now this gives a nice speedup in comparison to the numpy version.
But is there any way to speed this up further (this is my first C extension ever so I assume there is)? With the number of times this function is used every day, every microsecond would actually provide a benefit.
Some of you might suggest porting this completely from Python to another language, unfortunately this is a larger project and not an option.
Here is what I started out with:
#!/usr/bin/python
import numpy as np
def euclidean_dist_square(x, y):
diff = np.array(x) - np.array(y)
return np.dot(diff, diff)This is quite fast and I already dropped the sqrt calculation since I need to rank items only (nearest-neighbor search). It is still the bottleneck of the script though. Therefore I have written a C extension, which calculates the distance. The calculation is always done with 128-dimensional vectors.
#include "euclidean.h"
#include
double euclidean(double x[128], double y[128])
{
double Sum;
for(int i=0;i<128;i++)
{
Sum = Sum + pow((x[i]-y[i]),2.0);
}
return Sum;
}This is the relevant part of the extension, the complete code for the extension is here.
Now this gives a nice speedup in comparison to the numpy version.
But is there any way to speed this up further (this is my first C extension ever so I assume there is)? With the number of times this function is used every day, every microsecond would actually provide a benefit.
Some of you might suggest porting this completely from Python to another language, unfortunately this is a larger project and not an option.
Solution
You can decrease Python's overhead by parsing the
Try changing these portions of your code:
to this (untested) code:
Updates
Without any changes, your code takes ~340ns on my computer. Changing the argument processing and how the return value is created decreases the running time to ~295ns. Replacing
Next, I simplified the conversion of the numpy arrays by checking that the objects passed are numpy arrays. This avoids creating new objects. I also assumed that the data type is a "float64". Extra checks could be added if needed. With these changes, the running time is ~220ns. Code is below:
If I remove the call to
args tuple and creating the return value manually.Try changing these portions of your code:
/* Parse the input tuple */
if (!PyArg_ParseTuple(args, "OO", &x_obj, &y_obj))
return NULL;
>
/* Build the output tuple */
PyObject *ret = Py_BuildValue("d", value);
return ret;to this (untested) code:
/* Parse the input tuple manually */
if (PyTuple_GET_SIZE(args) != 2) {
PyErr_SetString(PyExc_TypeError, "requires 2 arguments");
return NULL;
}
x_obj = PyTuple_GET_ITEM(args, 0);
y_obj = PyTuple_GET_ITEM(args, 1);
>
/* Return double as Python float */
return PyFloat_FromDouble(value);Updates
Without any changes, your code takes ~340ns on my computer. Changing the argument processing and how the return value is created decreases the running time to ~295ns. Replacing
pow() with multiplication didn't change the performance with my compiler (GCC 4.8.2). Changing the exponent to a non-integral value significantly increased the running time so I assume constant integer exponents are optimized by the compiler.Next, I simplified the conversion of the numpy arrays by checking that the objects passed are numpy arrays. This avoids creating new objects. I also assumed that the data type is a "float64". Extra checks could be added if needed. With these changes, the running time is ~220ns. Code is below:
static PyObject *euclidean_euclidean(PyObject *self, PyObject *args)
{
PyObject *x_obj, *y_obj;
if (PyTuple_GET_SIZE(args) != 2) {
PyErr_SetString(PyExc_TypeError, "requires 2 arguments");
return NULL;
}
x_obj = PyTuple_GET_ITEM(args, 0);
y_obj = PyTuple_GET_ITEM(args, 1);
if (!PyArray_Check(x_obj) || !PyArray_Check(y_obj)) {
PyErr_SetString(PyExc_TypeError, "requires array arguments");
return NULL;
}
/* Get pointers to the data as C-types. */
double *x = (double*)PyArray_DATA(x_obj);
double *y = (double*)PyArray_DATA(y_obj);
double value = euclidean(x, y);
if (value < 0.0) {
PyErr_SetString(PyExc_RuntimeError,
"Euclidean returned an impossible value.");
return NULL;
}
/* Build the output tuple */
return PyFloat_FromDouble(value);
}If I remove the call to
euclidean(), the running time is ~75ns. If I remove all the the argument parsing and just return the value 0.0, the running time is ~72ns. I did a few more tests to confirm running times and Python's overhead is consistently ~75ns and the euclidean() function has running time of ~150ns.Code Snippets
/* Parse the input tuple */
if (!PyArg_ParseTuple(args, "OO", &x_obj, &y_obj))
return NULL;
<< code snipped >>
/* Build the output tuple */
PyObject *ret = Py_BuildValue("d", value);
return ret;/* Parse the input tuple manually */
if (PyTuple_GET_SIZE(args) != 2) {
PyErr_SetString(PyExc_TypeError, "requires 2 arguments");
return NULL;
}
x_obj = PyTuple_GET_ITEM(args, 0);
y_obj = PyTuple_GET_ITEM(args, 1);
<< code snipped >>
/* Return double as Python float */
return PyFloat_FromDouble(value);static PyObject *euclidean_euclidean(PyObject *self, PyObject *args)
{
PyObject *x_obj, *y_obj;
if (PyTuple_GET_SIZE(args) != 2) {
PyErr_SetString(PyExc_TypeError, "requires 2 arguments");
return NULL;
}
x_obj = PyTuple_GET_ITEM(args, 0);
y_obj = PyTuple_GET_ITEM(args, 1);
if (!PyArray_Check(x_obj) || !PyArray_Check(y_obj)) {
PyErr_SetString(PyExc_TypeError, "requires array arguments");
return NULL;
}
/* Get pointers to the data as C-types. */
double *x = (double*)PyArray_DATA(x_obj);
double *y = (double*)PyArray_DATA(y_obj);
double value = euclidean(x, y);
if (value < 0.0) {
PyErr_SetString(PyExc_RuntimeError,
"Euclidean returned an impossible value.");
return NULL;
}
/* Build the output tuple */
return PyFloat_FromDouble(value);
}Context
StackExchange Code Review Q#52218, answer score: 14
Revisions (0)
No revisions yet.