
Efficient element-wise function computation in Python


Problem

I have the following optimization problem:


Given two NumPy arrays X, Y and a function K, I would like to compute, as fast as possible, the Gram matrix gram_matrix whose (i, j)-th element is K(X[i], Y[j]).

Here is an implementation using nested for loops, generally the slowest approach to this kind of problem:

import numpy as np

def proxy_kernel(X, Y, K):
    """Gram matrix via nested loops: gram_matrix[i, j] = K(X[i], Y[j])."""
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i, j] = K(x, y)
    return gram_matrix
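As a quick sanity check (the values here are chosen purely for illustration), with K = operator.mul the nested-loop version just builds an outer product, with rows following X and columns following Y:

```python
import operator
import numpy as np

def proxy_kernel(X, Y, K):
    # Nested-loop version from above: gram_matrix[i, j] = K(X[i], Y[j]).
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i, j] = K(x, y)
    return gram_matrix

# With K = operator.mul this is a small multiplication table.
G = proxy_kernel(np.arange(1, 4), np.arange(1, 5), operator.mul)
print(G.shape)  # (3, 4): one row per element of X
```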

Solution

You can avoid the nested loops by using numpy.meshgrid to build a grid of (x, y) pairs and numpy.vectorize to apply the function to every entry of that grid:

def tabulate(x, y, f):
    """Return a table of f(x, y)."""
    return np.vectorize(f)(*np.meshgrid(x, y, sparse=True))


For example:

>>> import operator
>>> tabulate(np.arange(1, 5), np.arange(1, 4), operator.mul)
array([[ 1,  2,  3,  4],
       [ 2,  4,  6,  8],
       [ 3,  6,  9, 12]])
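One caveat worth spelling out (my observation, not stated in the original answer): because numpy.meshgrid pairs its first argument with columns, tabulate returns the transpose of proxy_kernel's layout. With X == Y and a symmetric kernel this is invisible, but in general you may need a .T. A quick check:

```python
import operator
import numpy as np

def proxy_kernel(X, Y, K):
    # Nested-loop reference: gram_matrix[i, j] = K(X[i], Y[j]).
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i, j] = K(x, y)
    return gram_matrix

def tabulate(x, y, f):
    """Return a table of f(x, y)."""
    return np.vectorize(f)(*np.meshgrid(x, y, sparse=True))

X = np.arange(1, 5)
Y = np.arange(1, 4)
loops = proxy_kernel(X, Y, operator.mul)  # shape (4, 3): rows follow X
table = tabulate(X, Y, operator.mul)      # shape (3, 4): rows follow Y
assert np.array_equal(table.T, loops)     # same values, transposed layout
```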


This is about twice as fast as proxy_kernel:

>>> from timeit import timeit
>>> X = Y = np.arange(2000)
>>> timeit(lambda:proxy_kernel(X, Y, operator.mul), number=1)
2.174600816098973
>>> timeit(lambda:tabulate(X, Y, operator.mul), number=1)
0.9889162541367114


but it still has to evaluate the slow Python code in f for each entry in the table. To really speed things up you need to use whole-array operations throughout. Depending on how you've written f, it may be possible to do this directly. For example, if you have something like:

def f(x, y):
    return np.exp(-0.5 * np.abs(x - y))


then this is already suitable for applying to whole arrays:

>>> f(*np.meshgrid(np.arange(4), np.arange(4), sparse=True))
array([[ 1.        ,  0.60653066,  0.36787944,  0.22313016],
       [ 0.60653066,  1.        ,  0.60653066,  0.36787944],
       [ 0.36787944,  0.60653066,  1.        ,  0.60653066],
       [ 0.22313016,  0.36787944,  0.60653066,  1.        ]])


and this is about five times as fast as using numpy.vectorize:

>>> timeit(lambda:f(*np.meshgrid(X, Y, sparse=True)), number=1)
0.1973482659086585
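Going one step further (an extension on my part, not part of the original answer): for 1-D inputs you can skip meshgrid entirely and rely on NumPy broadcasting, since a sparse meshgrid simply produces arrays of shape (1, len(X)) and (len(Y), 1), which is exactly what explicit broadcasting axes give you:

```python
import numpy as np

def f(x, y):
    return np.exp(-0.5 * np.abs(x - y))

X = Y = np.arange(2000)

# Sparse meshgrid yields shapes (1, len(X)) and (len(Y), 1) ...
via_meshgrid = f(*np.meshgrid(X, Y, sparse=True))

# ... which explicit broadcasting axes reproduce directly.
direct = f(X[None, :], Y[:, None])

assert np.array_equal(via_meshgrid, direct)
```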


Context

StackExchange Code Review Q#90005, answer score: 4
