patternpythonMinor
Improve performance of Gaussian kernel function evaluation
Viewed 0 times
evaluationfunctionimproveperformancekernelgaussian
Problem
I need to improve the performance of a function that calculates the integral of a two-dimensional kernel density estimate (obtained using the function stats.gaussian_kde) where the domain of integration is all points that evaluate below a given value.
I've found that a single line takes up pretty much 100% of the time used by the code:
(see the code below). This line stores in the list
I made this question (slightly modified) 6 months ago over at Stack Overflow and the best answer involved parallelization. I'm hoping to avoid parallelization (gives way too much trouble as can be seen in the question linked above), but I'm open to any improvement in performance achieved by means of virtually any other way (including Cython and any package available).
Here's the minimum working example (MWE):
```
import numpy as np
from scipy import stats
# Define KDE integration function.
def kde_integration(m1, m2):
# Perform a kernel density estimate (KDE) on the data.
values = np.vstack([m1, m2])
kernel = stats.gaussian_kde(values, bw_method=None)
# This list will be returned at the end of this function.
out_list = []
# Iterate through all floats in m1, m2 lists and calculate for each one the
# integral of the KDE for the domain of points located below the KDE
# value of said float eveluated in the KDE.
for indx, m1_p in enumerate(m1):
# Compute the point below which to integrate.
iso = kernel((m1_p, m2[indx]))
# Sample KDE distribution
sample = kernel.resample(size=100)
# THIS TAKES ALMOST 100% OF THE COMPUTATION TIME.
# Filter the sample.
insample = kernel(sample) 0. else 0.000001
# Append integral value for this point to list that will return.
out_list.append(round(integral, 2))
return out_list
# Generate some random
I've found that a single line takes up pretty much 100% of the time used by the code:
insample = kernel(sample) < iso(see the code below). This line stores in the list
insample all the values in sample that evaluate in kernel to less than the float iso.I made this question (slightly modified) 6 months ago over at Stack Overflow and the best answer involved parallelization. I'm hoping to avoid parallelization (gives way too much trouble as can be seen in the question linked above), but I'm open to any improvement in performance achieved by means of virtually any other way (including Cython and any package available).
Here's the minimum working example (MWE):
```
import numpy as np
from scipy import stats
# Define KDE integration function.
def kde_integration(m1, m2):
# Perform a kernel density estimate (KDE) on the data.
values = np.vstack([m1, m2])
kernel = stats.gaussian_kde(values, bw_method=None)
# This list will be returned at the end of this function.
out_list = []
# Iterate through all floats in m1, m2 lists and calculate for each one the
# integral of the KDE for the domain of points located below the KDE
# value of said float eveluated in the KDE.
for indx, m1_p in enumerate(m1):
# Compute the point below which to integrate.
iso = kernel((m1_p, m2[indx]))
# Sample KDE distribution
sample = kernel.resample(size=100)
# THIS TAKES ALMOST 100% OF THE COMPUTATION TIME.
# Filter the sample.
insample = kernel(sample) 0. else 0.000001
# Append integral value for this point to list that will return.
out_list.append(round(integral, 2))
return out_list
# Generate some random
Solution
I am never forget the day I first meet the great Lobachevsky.
In one word he told me secret of success in NumPy:
Vectorize!
Vectorize!
In every loop inspect your
Remember why the good Lord made your eyes!
So don't shade your eyes,
But vectorize, vectorize, vectorize—
Only be sure always to call it please 'refactoring'.
Update
I find that my
But the advantage disappears with larger sample sizes:
What this suggests is that there is a "sweet spot" for
You don't say exactly how you're testing it, but different results are surely to be expected, since
In one word he told me secret of success in NumPy:
Vectorize!
Vectorize!
In every loop inspect your
i'sRemember why the good Lord made your eyes!
So don't shade your eyes,
But vectorize, vectorize, vectorize—
Only be sure always to call it please 'refactoring'.
def kde_integration(m1, m2, sample_size=100, epsilon=0.000001):
values = np.vstack((m1, m2))
kernel = stats.gaussian_kde(values, bw_method=None)
iso = kernel(values)
sample = kernel.resample(size=sample_size * len(m1))
insample = kernel(sample).reshape(sample_size, -1) < iso.reshape(1, -1)
return np.maximum(insample.mean(axis=0), epsilon)Update
I find that my
kde_integration is about ten times as fast as yours with sample_size=100:>>> m1, m2 = measure(100)
>>> from timeit import timeit
>>> timeit(lambda:kde_integration1(m1, m2), number=1) # yours
0.7005664870084729
>>> timeit(lambda:kde_integration2(m1, m2), number=1) # mine (above)
0.07272820601065177
But the advantage disappears with larger sample sizes:
>>> timeit(lambda:kde_integration1(m1, m2, sample_size=1024), number=1)
1.1872510590037564
>>> timeit(lambda:kde_integration2(m1, m2, sample_size=1024), number=1)
1.2788789629994426
What this suggests is that there is a "sweet spot" for
sample_size * len(m1) (perhaps related to the computer's cache size) and so you'll get the best results by processing the samples in chunks of that length.You don't say exactly how you're testing it, but different results are surely to be expected, since
scipy.stats.gaussian_kde.resample "Randomly samples a dataset from the estimated pdf".Code Snippets
def kde_integration(m1, m2, sample_size=100, epsilon=0.000001):
values = np.vstack((m1, m2))
kernel = stats.gaussian_kde(values, bw_method=None)
iso = kernel(values)
sample = kernel.resample(size=sample_size * len(m1))
insample = kernel(sample).reshape(sample_size, -1) < iso.reshape(1, -1)
return np.maximum(insample.mean(axis=0), epsilon)Context
StackExchange Code Review Q#42876, answer score: 6
Revisions (0)
No revisions yet.