patterncMinor

Mean of the squares of differences for a large matrix

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

matrix codereview matlab c stackoverflow performance ffi

themeansquaresdifferenceslargeformatrix

Problem

I have tried to obtain speed gains by transferring Matlab calculations to C and call it using .mex. My goal is to operate on a matrix \$B\$ which has roughly the dimension 10000x1000

$$
\frac{1}{\text{size(B,1)}}\sum_{i=1}^{2+l*t} \text{mean}( (B(:,i)-B(:,i+1))^2)$$

for \$ l*t

A=randn(10000,1000);

tic  
RVV_Da_path=ones(size(A,2)-1,1);
i=2;
j=1;
while i 

My C code goes as follows:

#include
#include "mex.h"

void mexFunction(int nlhs, double plhs[], int nrhs, const mxArray prhs[])
{

// calculate 1/size(A,1)sum_{i=1}^{2+lt} mean(A(:,i)-A(:,i+1))^2, for l*t

Now when I translate the C code to a .mex file and compare the speed of the Matlab and the C code the C code is roughly 4-8 times slower for big \$ B\$'s. Why is this the case? Where is the bottleneck in my C-code and how can I work around it?

Solution

The overall Matlab algorithm can be simplified. The key is to realize that:

Inter-column differences never change

You can use cumulative sums on the means of each column

First let us remove the variable step factor. So tstep = 1. I will reincorporate the variable step factor later.

Let B be a \$m \times n\$ matrix. B(:,1:i) yields a \$m \times i\$ submatrix of B.

Let D = diff(B(:,1:i),1,2). This yields the column differences of B(:,1:i).

The first column of D, aka D(:,1), is equal to the second column of B minus the first column of B, aka B(:,2) - B(:,1)

The second column of D, aka D(:,2) is equal to the third column of B minus the second column of B, aka B(:,3) - B(:,2)

The last column of D, aka D(:,i-1) is equal to the \$i^{th}\$ column of B minus the \$(i-1)^{th}\$ column of B, aka B(:,i) - B(:,i-1)

You'll notice that if i increases by 1, these column differences remain the same. The only thing that changes is that we obtain one additional column difference in D. Hence you do not need to perform the diff() function multiple times. Simply perform it once on the entire \$m \times n\$ matrix to yield a \$m \times (n-1)\$ matrix of column differences and take submatrices of this difference matrix.

So the original code you posted:

B=(20*randn(10000,1000));
RV_B=ones(size(B,2),1);
t_step = 1;

i=2;
j=1;

while i <= size(B,2)+1-t_step
    RV_B(j)=    sum(mean(diff((B(:,1:i)),1,2).^2,1));   
    i=i+t_step;
    j=j+1;
end

RV_B(j)=    sum(mean(diff((B(:,1:end)),1,2).^2,1));
M_1=RV_B';

Can be simplified to:

RV_B=ones(size(B,2),1);
t_step = 1;
D = diff(B,1,2);

i=2;
j=1;

while i <= size(B,2)+1-t_step
    RV_B(j)=    sum(mean(D(:,1:i-1).^2,1));   
    i=i+t_step;
    j=j+1;
end

RV_B(j)=    sum(mean(D(:,1:end).^2,1));
M_1=RV_B';

The next step is to realize that if D is our \$m \times (n-1)\$ difference matrix, mean(D) yields a \$1 \times (n-1)\$ matrix of column means. If we truncate columns of this matrix as we do when we take submatrix D(:,1:i-1) it does not affect the means of the columns that are still there. So instead of performing mean() over and over on submatrices, we can again pull it out the loop, perform mean() once on the entire matrix D, and take submatrices of this mean matrix.

So now the code could look like this:

RV_B=ones(size(B,2),1);

t_step = 1;
D = diff(B,1,2);
MD = mean(D.^2);

i=2;
j=1;

while i <= size(B,2)+1-t_step
    RV_B(j)=    sum(MD(:, 1:i-1));   
    i=i+t_step;
    j=j+1;
end

RV_B(j)=    sum(MD(:, 1:end));
M_1 = RV_B';

Similarly, instead of performing a sum on the rows of the submatrices, we can perform a cumulative sum along the rows (or rather row): $$\begin{align}CSMD[i][j] &= \sum_{k=1}^{j} MD[i][k] \\
&= CSMD[i][j-1] + MD[i][j]\end{align}\\$$

Incorporating Matlab's built-in cumsum() along with changing some intermediary variable names to better illustrate their purpose yields the following code:

diffB = diff(B,1,2);
meanSquaredDiff = mean(diffB.^2);
M_1 = cumsum(meanSquaredDiff, 2);

Now we no longer have any explicit loops (in the matlab code at least). This algorithm is much faster and simpler. Consequently, it is much easier to port to C.

You will find that for tstep=1, M_1 for your algorithm has dimension size(B,2) and for the simplified code I provided it has dimension size(B,2)-1. However, in your original algorithm, M_1(end) and M_1(end-1) are always the same if tstep=1, hence you were performing one calculation too many.

Now let us reincorporate the variable step factor. Realize that if we increase tstep it is equivalent to taking every \$\textrm{tstep}^{th}\$ output after the cumulative sum. So if you want tstep=5 you perform the same three-line code above but at the end sample it via M_1 = M_1(1:5:end) or in general M_1 = M_1(1:step:end). You can use isequal to compare the outputs of your original approach to this approach. Remember that you one-padded your output so if you use \$\textrm{tstep} \gt 1\$, you need to remove the extra padding when comparing the matrices.

Finally the last index of the non-padded output will differ as well (if \$\textrm{tstep} \gt 1\$) since your computation computes the last index differently than the rest of the indices. You can fix this by performing tmp = M_1(end); M_1 = M_1(1:step:end); M_1(end) = tmp; instead of simply M_1 = M_1(1:step:end) however the latter approach is more consistent in my opinion.

Code Snippets

B=(20*randn(10000,1000));
RV_B=ones(size(B,2),1);
t_step = 1;

i=2;
j=1;

while i <= size(B,2)+1-t_step
    RV_B(j)=    sum(mean(diff((B(:,1:i)),1,2).^2,1));   
    i=i+t_step;
    j=j+1;
end

RV_B(j)=    sum(mean(diff((B(:,1:end)),1,2).^2,1));
M_1=RV_B';

RV_B=ones(size(B,2),1);
t_step = 1;
D = diff(B,1,2);

i=2;
j=1;

while i <= size(B,2)+1-t_step
    RV_B(j)=    sum(mean(D(:,1:i-1).^2,1));   
    i=i+t_step;
    j=j+1;
end

RV_B(j)=    sum(mean(D(:,1:end).^2,1));
M_1=RV_B';

RV_B=ones(size(B,2),1);

t_step = 1;
D = diff(B,1,2);
MD = mean(D.^2);

i=2;
j=1;

while i <= size(B,2)+1-t_step
    RV_B(j)=    sum(MD(:, 1:i-1));   
    i=i+t_step;
    j=j+1;
end

RV_B(j)=    sum(MD(:, 1:end));
M_1 = RV_B';

diffB = diff(B,1,2);
meanSquaredDiff = mean(diffB.^2);
M_1 = cumsum(meanSquaredDiff, 2);

Context

StackExchange Code Review Q#84364, answer score: 4

Revisions (0)

No revisions yet.