patternpythonModerate

Python / Numpy running 15x slower than MATLAB - am I using Numpy effeciently?

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

15xnumpymatlabthanslowerrunningpythonusingeffeciently

Problem

Here is some code I wrote in Python / Numpy that I pretty much directly translated from MATLAB code. When I run the code in MATLAB on my machine, it takes roughly 17 seconds. When I run the code in Python / Numpy on my machine, it takes roughly 233 seconds. Am I not using Numpy effectively? Please look over my Python code to see if I'm using Numpy in a non effective manner. All this code is doing is fitting the parameter D (diffusion coefficient) in the heat equation to some synthetically generated data using MCMC method.

import numpy as np
from numpy import * 
import pylab as py
from pylab import *
import math
import time

def heat(D,u0,q,tdim):
    xdim = np.size(u0)
    Z = np.zeros([xdim,tdim])
    Z[:,0]=u0;
    for i in range(1,tdim):
        for j in range (1,xdim-1):
            Z[j,i]=Z[j,i-1]+ D*q*(Z[j-1,i-1]-2*Z[j,i-1]+Z[j+1,i-1])
    return Z

start_time = time.clock()
L = 10
D = 0.5

s = 0.03  # magnitude of noise

Tmax = 0.2
xdim = 25
tdim = 75 

x = np.linspace(0,L,xdim)
t = np.linspace(0,Tmax,tdim)

dt = t[1]-t[0]
dx = x[1]-x[0]

q = dt/(dx**2)
r1 = 0.75*L
r2 = 0.8*L

################################################
## check the stability criterion dt/(dx^2)=r1 and x[i]1, mesh(x(xDat),t(tDat),uDat);
#else set(plot3(x(xDat),t(tDat)*ones(1,nxDat),uDat,'r-o'),'LineWidth',3);
#end; hold off; drawnow

#MCMC run
N = 10000
m = 100
XD = 1.0
X = np.zeros(N)
X[0] = XD
Z = heat(XD,u0,q,tdim)
u = Z[xDat,tfinal]
oLLkd = sum(sum(-(u-uDat)**2))/(2*s**2)
LL = np.zeros(N)
LL[0] = oLLkd

# random walk step size
w = 0.1
for n in range (1,N):
    XDp = XD+w*(2*rand(1)-1)
    if XDp > 0:
        Z = heat(XDp,u0,q,tdim)
        u = Z[xDat,tfinal]
        nLLkd = sum(sum( -(u-uDat)**2))/(2*s**2)
        alpha = exp((nLLkd-oLLkd))
        if random() < alpha:     
            XD = XDp
            oLLkd = nLLkd
            CZ = Z
    X[n] = XD;
    LL[n] = oLLkd;

print time.clock() - start_time, "seconds"

Solution

In the heat function, simply vectorizing the inner loop, drops the time from 340 sec to 56 sec, a 6x improvement. It starts by defining the first column of Z, and calculates the next column from that (modeling heat diffusion).

def heat(D,u0,q,tdim):
    xdim = np.size(u0)
    Z = np.zeros([xdim,tdim])
    Z[:,0]=u0;
    for i in range(1,tdim):
        #for j in range (1,xdim-1):
        #    Z[j,i]=Z[j,i-1]+ D*q*(Z[j-1,i-1]-2*Z[j,i-1]+Z[j+1,i-1])
        J = np.arange(1, xdim-1)
        Z[J,i] = Z[J,i-1] + D*q*( Z[J-1,i-1] - 2*Z[J,i-1] + Z[J+1,i-1] )
    return Z

some added improvement (10x speedup) by streamlining the indexing

Z1 = Z[:,i-1]
    Z[j,i] = Z1[1:-1] + D*q* (Z1[:-2] - 2 * Z1[1:-1] + Z1[2:])

Better yet. This drops time to 7sec, a 45x improvement. It constructs a matrix with 3 diagonals, and applies that repeatedly to the u vector (with a dot product).

def heat(D,u0,q,tdim):
    # drops time to 7sec
    N = np.size(u0)
    dq = D*q
    A =  np.eye(N,N,0)+ dq*(np.eye(N,N,-1)+np.eye(N,N,1)-2*np.eye(N,N,0))
    Z = np.zeros([N,tdim])
    Z[:,0] = u0;
    # print u0.shape, A.shape, (A*u0).shape, np.dot(A,u0).shape
    for i in range(1,tdim):
        u0 = np.dot(A,u0)
        Z[:,i] = u0
    return Z

Based on further testing and reading, I think np.dot(A,u0) is using the fast BLAS code.

For larger dimensions (here xdim is only 25), scipy.sparse can be used to make a more compact A matrix. For example, a sparse version of A can be produced with

sp.eye(N,N,0) + D * q * sp.diags([1, -2, 1], [-1, 0, 1], shape=(N, N))

But there isn't a speed advantage at this small size.

Code Snippets

def heat(D,u0,q,tdim):
    xdim = np.size(u0)
    Z = np.zeros([xdim,tdim])
    Z[:,0]=u0;
    for i in range(1,tdim):
        #for j in range (1,xdim-1):
        #    Z[j,i]=Z[j,i-1]+ D*q*(Z[j-1,i-1]-2*Z[j,i-1]+Z[j+1,i-1])
        J = np.arange(1, xdim-1)
        Z[J,i] = Z[J,i-1] + D*q*( Z[J-1,i-1] - 2*Z[J,i-1] + Z[J+1,i-1] )
    return Z

Z1 = Z[:,i-1]
    Z[j,i] = Z1[1:-1] + D*q* (Z1[:-2] - 2 * Z1[1:-1] + Z1[2:])

def heat(D,u0,q,tdim):
    # drops time to 7sec
    N = np.size(u0)
    dq = D*q
    A =  np.eye(N,N,0)+ dq*(np.eye(N,N,-1)+np.eye(N,N,1)-2*np.eye(N,N,0))
    Z = np.zeros([N,tdim])
    Z[:,0] = u0;
    # print u0.shape, A.shape, (A*u0).shape, np.dot(A,u0).shape
    for i in range(1,tdim):
        u0 = np.dot(A,u0)
        Z[:,i] = u0
    return Z

sp.eye(N,N,0) + D * q * sp.diags([1, -2, 1], [-1, 0, 1], shape=(N, N))

Context

StackExchange Code Review Q#17702, answer score: 17

Revisions (0)

No revisions yet.