
Using the SSIM method for large images

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·


Problem

I'm trying to implement the SSIM method. It has already been implemented in Python (source code), but my goal is to implement it using only Python and NumPy.

I also want to use this method on big images (1024x1024 and above), but filter2 is very slow (approx. 62 sec. for 1024x1024). cProfile shows that _methods.py:16(_sum), fromnumeric.py:1422(sum), and method 'reduce' of 'numpy.ufunc' objects account for most of the run time.

```
import numpy
from numpy.lib.stride_tricks import as_strided

def filter2(window, x):
    range1 = x.shape[0] - window.shape[0] + 1
    range2 = x.shape[1] - window.shape[1] + 1
    res = numpy.zeros((range1, range2), dtype=numpy.double)
    x1 = as_strided(x,((x.shape[0] - 10)/1 ,(x.shape[1] - 10)/1 ,11,11), (x.strides[0]*1,x.strides[1]*1,x.strides[0],x.strides[1])) * window
    for i in xrange(range1):
        for j in xrange(range2):
            res[i,j] = x1[i,j].sum()
    return res

def ssim(img1, img2):
    window = numpy.array([\
        [0.0000, 0.0000, 0.0000, 0.0001, 0.0002, 0.0003, 0.0002, 0.0001, 0.0000, 0.0000, 0.0000],\
        [0.0000, 0.0001, 0.0003, 0.0008, 0.0016, 0.0020, 0.0016, 0.0008, 0.0003, 0.0001, 0.0000],\
        [0.0000, 0.0003, 0.0013, 0.0039, 0.0077, 0.0096, 0.0077, 0.0039, 0.0013, 0.0003, 0.0000],\
        [0.0001, 0.0008, 0.0039, 0.0120, 0.0233, 0.0291, 0.0233, 0.0120, 0.0039, 0.0008, 0.0001],\
        [0.0002, 0.0016, 0.0077, 0.0233, 0.0454, 0.0567, 0.0454, 0.0233, 0.0077, 0.0016, 0.0002],\
        [0.0003, 0.0020, 0.0096, 0.0291, 0.0567, 0.0708, 0.0567, 0.0291, 0.0096, 0.0020, 0.0003],\
        [0.0002, 0.0016, 0.0077, 0.0233, 0.0454, 0.0567, 0.0454, 0.0233, 0.0077, 0.0016, 0.0002],\
        [0.0001, 0.0008, 0.0039, 0.0120, 0.0233, 0.0291, 0.0233, 0.0120, 0.0039, 0.0008, 0.0001],\
        [0.0000, 0.0003, 0.0013, 0.0039, 0.0077, 0.0096, 0.0077, 0.0039, 0.0013, 0.0003, 0.0000],\
        [0.0000, 0.0001, 0.0003, 0.0008, 0.0016, 0.0020, 0.0016, 0.0008, 0.0003, 0.0001, 0.0000],\
        [0.0000, 0.0000, 0.0000, 0.0001, 0.0002, 0
```

Solution

In your strided filter2, x1 has shape (1014, 1014, 11, 11). You are iterating over the first two dimensions in order to sum over the last two. Let sum do all the work for you: res = x1.sum((2,3))

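from numpy.lib.stride_tricks import as_strided

def filter2(window, x):
    # One output value per position where the window fits entirely inside x.
    range1 = x.shape[0] - window.shape[0] + 1
    range2 = x.shape[1] - window.shape[1] + 1
    # View x as (range1, range2, window rows, window cols), then weight each window.
    x1 = as_strided(x, (range1, range2) + window.shape, x.strides + x.strides) * window
    # Sum over the two window axes in a single call instead of a Python loop.
    res = x1.sum((2,3))
    return res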

In my tests this gives a 6x speed improvement.
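A quick way to check that the vectorized sum really computes the same thing as the double loop is to run both on a small random image. The sketch below is only illustrative: the names `filter2_loop` and `filter2_vectorized` are hypothetical, and it assumes NumPy 1.20 or later so that `sliding_window_view` can stand in for the hand-written `as_strided` call.

```python
import numpy
from numpy.lib.stride_tricks import sliding_window_view

def filter2_loop(window, x):
    """Reference version: explicit Python loops over every output pixel."""
    rows = x.shape[0] - window.shape[0] + 1
    cols = x.shape[1] - window.shape[1] + 1
    res = numpy.zeros((rows, cols), dtype=numpy.double)
    for i in range(rows):
        for j in range(cols):
            patch = x[i:i + window.shape[0], j:j + window.shape[1]]
            res[i, j] = (patch * window).sum()
    return res

def filter2_vectorized(window, x):
    """Same computation, with the per-window sum done by one NumPy call."""
    # patches has shape (rows, cols, window rows, window cols); it is a view of x.
    patches = sliding_window_view(x, window.shape)
    return (patches * window).sum(axis=(2, 3))

rng = numpy.random.default_rng(0)
image = rng.random((64, 64))    # small test image, not the 1024x1024 case
window = rng.random((11, 11))
window /= window.sum()          # normalize like a smoothing kernel

assert numpy.allclose(filter2_loop(window, image), filter2_vectorized(window, image))
```

Timing both functions on your own images (for example with `timeit`) is the most reliable way to see what speedup you actually get.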

With NumPy, Python-level iteration, especially nested loops over large dimensions like 1014, is a speed killer. You want to vectorize this kind of thing as much as possible.

Traditionally Matlab had the same speed problems, but newer versions recognize and compile loops like yours. That's why your NumPy version is so much slower.
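Vectorizing can be taken one step further. Multiplying the strided view by the window materializes a full (1014, 1014, 11, 11) array of doubles for a 1024x1024 image, which is roughly 1 GB. As a possible alternative (not part of the original answer), `numpy.einsum` can multiply and sum in one pass, which should avoid that temporary. The function name `filter2_einsum` is hypothetical, and the sketch again assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy
from numpy.lib.stride_tricks import sliding_window_view

def filter2_einsum(window, x):
    """Weighted window sum without building the 4-D product array."""
    # patches is a read-only view of x with shape (rows, cols, window rows, window cols).
    patches = sliding_window_view(x, window.shape)
    # For each (i, j), multiply the patch by the window and sum over k and l.
    return numpy.einsum('ijkl,kl->ij', patches, window)

rng = numpy.random.default_rng(0)
image = rng.random((64, 64))
window = rng.random((11, 11))

# Compare against the multiply-then-sum formulation on a small image.
expected = (sliding_window_view(image, window.shape) * window).sum(axis=(2, 3))
assert numpy.allclose(filter2_einsum(window, image), expected)
```

Whether this is faster than `x1.sum((2,3))` depends on the NumPy build and the image size, so it is worth measuring both on the real data.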

Code Snippets

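from numpy.lib.stride_tricks import as_strided

def filter2(window, x):
    # One output value per position where the window fits entirely inside x.
    range1 = x.shape[0] - window.shape[0] + 1
    range2 = x.shape[1] - window.shape[1] + 1
    # View x as (range1, range2, window rows, window cols), then weight each window.
    x1 = as_strided(x, (range1, range2) + window.shape, x.strides + x.strides) * window
    # Sum over the two window axes in a single call instead of a Python loop.
    res = x1.sum((2,3))
    return res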

Context

StackExchange Code Review Q#31089, answer score: 2
