patternpythonMajor
Fastest way to iterate over Numpy array
Viewed 0 times
numpyarraywayiteratefastestover
Problem
I wrote a function to calculate the gamma coefficient of a clustering. The bottleneck is the comparison of values from
I thought I could use a pointer to the array data and indeed the code runs in only half of the time, but
Edit1: Passing just the pointers, instead of the arrays to the time-critical function (>99% of time is spent there) made a ~ 10% speed-up. I guess some things just cannot be made faster
```
@cython.profile(False)
@cython.boundscheck(False)
@cython.nonecheck(False)
cdef unsigned int countlower(double v1, double v2, int n1, int n2):
''' Function output corresponds to np.bincount(v1 < v2)[1]'''
''' The upper is not correct. It rather corresponds to
sum([np.bincount(v1[i] < v2)[1] for i in range(len(v1))])'''
cdef unsigned i
dist_withing to dist_between. To speed this up, I tried to adapt and compile it using Cython (I dealt with C only few times). But I don't know, how to rapidly iterate over numpy arrays or if its possible at all to do it faster thanfor i in range(len(arr)):
arr[i]I thought I could use a pointer to the array data and indeed the code runs in only half of the time, but
pointer1[i] and pointer2[j] in cdef unsigned int countlower won't give me the expected values from the arrays. So, how to properly and speedy iterate over an array? And where else can be made improvements, even if in this case it would not make such a difference concerning runtime-speed?# cython: profile=True
import cython
import numpy as np
cimport numpy as np
from scipy.spatial.distance import squareform
DTYPE = np.float
DTYPEint = np.int
ctypedef np.float_t DTYPE_t
ctypedef np.int_t DTYPEint_t
@cython.profile(False)
cdef unsigned int countlower(np.ndarray[DTYPE_t, ndim=1] vec1,
np.ndarray[DTYPE_t, ndim=1] vec2,
int n1, int n2):
# Function output corresponds to np.bincount(v1 vec1.data
cdef unsigned int* pointer2 = vec2.data
for i in range(n1):
for j in range(n2):
if pointer1[i] s_plus - s_minus) / n_ if n_ != 0 else 0Edit1: Passing just the pointers, instead of the arrays to the time-critical function (>99% of time is spent there) made a ~ 10% speed-up. I guess some things just cannot be made faster
```
@cython.profile(False)
@cython.boundscheck(False)
@cython.nonecheck(False)
cdef unsigned int countlower(double v1, double v2, int n1, int n2):
''' Function output corresponds to np.bincount(v1 < v2)[1]'''
''' The upper is not correct. It rather corresponds to
sum([np.bincount(v1[i] < v2)[1] for i in range(len(v1))])'''
cdef unsigned i
Solution
- Introduction
This question is difficult because:
-
It's not clear what the function
countlower does. It's always a good idea to write a docstring for a function, specifying what it does, what arguments it takes, and what it returns. (And test cases are always appreciated.)-
It's not clear what the role of the arguments
n1 and n2 is. The code in the post only ever passes len(v1) for n1 and len(v2) for n2. So is that a requirement? Or is it sometimes possible to pass in other values?I am going to assume in what follows that:
-
the specification of the
countlower function is Return the number of pairs i, j such that v1[i]
-
n1 is always len(v1) and n2 is always len(v2);
-
the Cython details are not essential to the problem, and that it's OK to work in plain Python.
Here's my rewrite of the countlower function. Note the docstring, the doctest, and the simple implementation, which loops over the sequence elements rather than their indices:
def countlower1(v, w):
"""Return the number of pairs i, j such that v[i] >> countlower1(list(range(0, 200, 2)), list(range(40, 140)))
4500
"""
return sum(x < y for x in v for y in w)
And here's a 1000-element test case, which I'll use in the rest of this answer to compare the performance of various implementations of this function:
>>> v = np.array(list(range(0, 2000, 2)))
>>> w = np.array(list(range(400, 1400)))
>>> from timeit import timeit
>>> timeit(lambda:countlower1(v, w), number=1)
8.449613849865273
- Vectorize
The whole reason for using NumPy is that it enables you to vectorize operations on arrays of fixed-size numeric data types. If you can successfully vectorize an operation, then it executes mostly in C, avoiding the substantial overhead of the Python interpreter.
Whenever you find yourself iterating over the elements of an array, then you're not getting any benefit from NumPy, and this is a sign that it's time to rethink your approach.
So let's vectorize the countlower function. This is easy using a sparse numpy.meshgrid:
import numpy as np
def countlower2(v, w):
"""Return the number of pairs i, j such that v[i] >> countlower2(np.arange(0, 2000, 2), np.arange(400, 1400))
450000
"""
grid = np.meshgrid(v, w, sparse=True)
return np.sum(grid[0] < grid[1])
Let's see how fast that is on the 1000-element test case:
>>> timeit(lambda:countlower2(v, w), number=1)
0.005706002004444599
That's about 1500 times faster than countlower1.
- Improve the algorithm
The vectorized countlower2 still takes \$O(n^2)\$ time on arrays of length \$O(n)\$, because it has to compare every pair of elements. Is it possible to do better than that?
Suppose that I start by sorting the first array v. Then consider an element y from the second array w, and find the point where y would fit into the sorted first array, that is, find i such that v[i - 1] -
"What is meshgrid?" Please read the documentation for
numpy.meshgrid.I use
meshgrid to create a NumPy array grid containing all pairs of elements x, y where x is an element of v and y is an element of w. Then I apply the `Code Snippets
def countlower1(v, w):
"""Return the number of pairs i, j such that v[i] < w[j].
>>> countlower1(list(range(0, 200, 2)), list(range(40, 140)))
4500
"""
return sum(x < y for x in v for y in w)>>> v = np.array(list(range(0, 2000, 2)))
>>> w = np.array(list(range(400, 1400)))
>>> from timeit import timeit
>>> timeit(lambda:countlower1(v, w), number=1)
8.449613849865273import numpy as np
def countlower2(v, w):
"""Return the number of pairs i, j such that v[i] < w[j].
>>> countlower2(np.arange(0, 2000, 2), np.arange(400, 1400))
450000
"""
grid = np.meshgrid(v, w, sparse=True)
return np.sum(grid[0] < grid[1])>>> timeit(lambda:countlower2(v, w), number=1)
0.005706002004444599from bisect import bisect_left
def countlower3(v, w):
"""Return the number of pairs i, j such that v[i] < w[j].
>>> countlower3(list(range(0, 2000, 2)), list(range(400, 1400)))
450000
"""
v = sorted(v)
return sum(bisect_left(v, y) for y in w)Context
StackExchange Code Review Q#38580, answer score: 29
Revisions (0)
No revisions yet.