patternpythonMinor

Improve performance of Gaussian kernel function evaluation

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

evaluationfunctionimproveperformancekernelgaussian

Problem

I need to improve the performance of a function that calculates the integral of a two-dimensional kernel density estimate (obtained using the function stats.gaussian_kde) where the domain of integration is all points that evaluate below a given value.

I've found that a single line takes up pretty much 100% of the time used by the code:

insample = kernel(sample) < iso

(see the code below). This line stores in the list insample all the values in sample that evaluate in kernel to less than the float iso.

I made this question (slightly modified) 6 months ago over at Stack Overflow and the best answer involved parallelization. I'm hoping to avoid parallelization (gives way too much trouble as can be seen in the question linked above), but I'm open to any improvement in performance achieved by means of virtually any other way (including Cython and any package available).

Here's the minimum working example (MWE):

```
import numpy as np
from scipy import stats

# Define KDE integration function.
def kde_integration(m1, m2):

# Perform a kernel density estimate (KDE) on the data.
values = np.vstack([m1, m2])
kernel = stats.gaussian_kde(values, bw_method=None)

# This list will be returned at the end of this function.
out_list = []

# Iterate through all floats in m1, m2 lists and calculate for each one the
# integral of the KDE for the domain of points located below the KDE
# value of said float eveluated in the KDE.
for indx, m1_p in enumerate(m1):

# Compute the point below which to integrate.
iso = kernel((m1_p, m2[indx]))

# Sample KDE distribution
sample = kernel.resample(size=100)

# THIS TAKES ALMOST 100% OF THE COMPUTATION TIME.
# Filter the sample.
insample = kernel(sample) 0. else 0.000001

# Append integral value for this point to list that will return.
out_list.append(round(integral, 2))

return out_list

# Generate some random

Solution

I am never forget the day I first meet the great Lobachevsky.
In one word he told me secret of success in NumPy:
Vectorize!

Vectorize!

In every loop inspect your i's

Remember why the good Lord made your eyes!

So don't shade your eyes,

But vectorize, vectorize, vectorize—

Only be sure always to call it please 'refactoring'.

def kde_integration(m1, m2, sample_size=100, epsilon=0.000001):
    values = np.vstack((m1, m2))
    kernel = stats.gaussian_kde(values, bw_method=None)
    iso = kernel(values)
    sample = kernel.resample(size=sample_size * len(m1))
    insample = kernel(sample).reshape(sample_size, -1) < iso.reshape(1, -1)
    return np.maximum(insample.mean(axis=0), epsilon)

Update

I find that my kde_integration is about ten times as fast as yours with sample_size=100:

>>> m1, m2 = measure(100)
>>> from timeit import timeit
>>> timeit(lambda:kde_integration1(m1, m2), number=1) # yours
0.7005664870084729
>>> timeit(lambda:kde_integration2(m1, m2), number=1) # mine (above)
0.07272820601065177

But the advantage disappears with larger sample sizes:

>>> timeit(lambda:kde_integration1(m1, m2, sample_size=1024), number=1)
1.1872510590037564
>>> timeit(lambda:kde_integration2(m1, m2, sample_size=1024), number=1)
1.2788789629994426

What this suggests is that there is a "sweet spot" for sample_size * len(m1) (perhaps related to the computer's cache size) and so you'll get the best results by processing the samples in chunks of that length.

You don't say exactly how you're testing it, but different results are surely to be expected, since scipy.stats.gaussian_kde.resample "Randomly samples a dataset from the estimated pdf".

Code Snippets

def kde_integration(m1, m2, sample_size=100, epsilon=0.000001):
    values = np.vstack((m1, m2))
    kernel = stats.gaussian_kde(values, bw_method=None)
    iso = kernel(values)
    sample = kernel.resample(size=sample_size * len(m1))
    insample = kernel(sample).reshape(sample_size, -1) < iso.reshape(1, -1)
    return np.maximum(insample.mean(axis=0), epsilon)

Context

StackExchange Code Review Q#42876, answer score: 6

Revisions (0)

No revisions yet.