patternpythonMajor

Fastest way to iterate over Numpy array

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

numpyarraywayiteratefastestover

Problem

I wrote a function to calculate the gamma coefficient of a clustering. The bottleneck is the comparison of values from dist_withing to dist_between. To speed this up, I tried to adapt and compile it using Cython (I dealt with C only few times). But I don't know, how to rapidly iterate over numpy arrays or if its possible at all to do it faster than

for i in range(len(arr)):
    arr[i]

I thought I could use a pointer to the array data and indeed the code runs in only half of the time, but pointer1[i] and pointer2[j] in cdef unsigned int countlower won't give me the expected values from the arrays. So, how to properly and speedy iterate over an array? And where else can be made improvements, even if in this case it would not make such a difference concerning runtime-speed?

# cython: profile=True
import cython

import numpy as np
cimport numpy as np

from scipy.spatial.distance import squareform

DTYPE = np.float
DTYPEint = np.int

ctypedef np.float_t DTYPE_t
ctypedef np.int_t DTYPEint_t

@cython.profile(False)
cdef unsigned int countlower(np.ndarray[DTYPE_t, ndim=1] vec1,
                             np.ndarray[DTYPE_t, ndim=1] vec2,
                             int n1, int n2):
    # Function output corresponds to np.bincount(v1  vec1.data
    cdef unsigned int* pointer2 =  vec2.data

    for i in range(n1):
        for j in range(n2):
            if pointer1[i] s_plus - s_minus) / n_ if n_ != 0 else 0

Edit1: Passing just the pointers, instead of the arrays to the time-critical function (>99% of time is spent there) made a ~ 10% speed-up. I guess some things just cannot be made faster

```
@cython.profile(False)
@cython.boundscheck(False)
@cython.nonecheck(False)
cdef unsigned int countlower(double v1, double v2, int n1, int n2):
''' Function output corresponds to np.bincount(v1 < v2)[1]'''
''' The upper is not correct. It rather corresponds to
sum([np.bincount(v1[i] < v2)[1] for i in range(len(v1))])'''
cdef unsigned i

Solution

Introduction

This question is difficult because:

-
It's not clear what the function countlower does. It's always a good idea to write a docstring for a function, specifying what it does, what arguments it takes, and what it returns. (And test cases are always appreciated.)

-
It's not clear what the role of the arguments n1 and n2 is. The code in the post only ever passes len(v1) for n1 and len(v2) for n2. So is that a requirement? Or is it sometimes possible to pass in other values?

I am going to assume in what follows that:

-
the specification of the countlower function is

Return the number of pairs i, j such that v1[i] 

-

n1 is always len(v1) and n2 is always len(v2)

;

- 
the Cython details are not essential to the problem, and that it's OK to work in plain Python.

Here's my rewrite of the

countlower

 function. Note the docstring, the doctest, and the simple implementation, which loops over the sequence elements rather than their indices:

def countlower1(v, w):
    """Return the number of pairs i, j such that v[i] >> countlower1(list(range(0, 200, 2)), list(range(40, 140)))
    4500

    """
    return sum(x < y for x in v for y in w)


And here's a 1000-element test case, which I'll use in the rest of this answer to compare the performance of various implementations of this function:

>>> v = np.array(list(range(0, 2000, 2)))
>>> w = np.array(list(range(400, 1400)))
>>> from timeit import timeit
>>> timeit(lambda:countlower1(v, w), number=1)
8.449613849865273


Vectorize



The whole reason for using NumPy is that it enables you to vectorize operations on arrays of fixed-size numeric data types. If you can successfully vectorize an operation, then it executes mostly in C, avoiding the substantial overhead of the Python interpreter.

Whenever you find yourself iterating over the elements of an array, then you're not getting any benefit from NumPy, and this is a sign that it's time to rethink your approach.

So let's vectorize the

countlower function. This is easy using a sparse numpy.meshgrid

:

import numpy as np

def countlower2(v, w):
    """Return the number of pairs i, j such that v[i] >> countlower2(np.arange(0, 2000, 2), np.arange(400, 1400))
    450000

    """
    grid = np.meshgrid(v, w, sparse=True)
    return np.sum(grid[0] < grid[1])


Let's see how fast that is on the 1000-element test case:

>>> timeit(lambda:countlower2(v, w), number=1)
0.005706002004444599


That's about 1500 times faster than

countlower1

.

Improve the algorithm



The vectorized

countlower2

 still takes \$O(n^2)\$ time on arrays of length \$O(n)\$, because it has to compare every pair of elements. Is it possible to do better than that?

Suppose that I start by sorting the first array

v. Then consider an element y from the second array w, and find the point where y would fit into the sorted first array, that is, find i such that v[i - 1]

-
"What is meshgrid?" Please read the documentation for numpy.meshgrid.

I use meshgrid to create a NumPy array grid containing all pairs of elements x, y where x is an element of v and y is an element of w. Then I apply the `

Code Snippets

def countlower1(v, w):
    """Return the number of pairs i, j such that v[i] < w[j].

    >>> countlower1(list(range(0, 200, 2)), list(range(40, 140)))
    4500

    """
    return sum(x < y for x in v for y in w)

>>> v = np.array(list(range(0, 2000, 2)))
>>> w = np.array(list(range(400, 1400)))
>>> from timeit import timeit
>>> timeit(lambda:countlower1(v, w), number=1)
8.449613849865273

import numpy as np

def countlower2(v, w):
    """Return the number of pairs i, j such that v[i] < w[j].

    >>> countlower2(np.arange(0, 2000, 2), np.arange(400, 1400))
    450000

    """
    grid = np.meshgrid(v, w, sparse=True)
    return np.sum(grid[0] < grid[1])

>>> timeit(lambda:countlower2(v, w), number=1)
0.005706002004444599

from bisect import bisect_left

def countlower3(v, w):
    """Return the number of pairs i, j such that v[i] < w[j].

    >>> countlower3(list(range(0, 2000, 2)), list(range(400, 1400)))
    450000

    """
    v = sorted(v)
    return sum(bisect_left(v, y) for y in w)

Context

StackExchange Code Review Q#38580, answer score: 29

Revisions (0)

No revisions yet.