patterncMinor
Mean of the squares of differences for a large matrix
Viewed 0 times
themeansquaresdifferenceslargeformatrix
Problem
I have tried to obtain speed gains by transferring Matlab calculations to C and call it using .mex. My goal is to operate on a matrix \$B\$ which has roughly the dimension 10000x1000
$$
\frac{1}{\text{size(B,1)}}\sum_{i=1}^{2+l*t} \text{mean}( (B(:,i)-B(:,i+1))^2)$$
for \$ l*t
#include "mex.h"
void mexFunction(int nlhs, double plhs[], int nrhs, const mxArray prhs[])
{
// calculate 1/size(A,1)sum_{i=1}^{2+lt} mean(A(:,i)-A(:,i+1))^2, for l*t
Now when I translate the C code to a .mex file and compare the speed of the Matlab and the C code the C code is roughly 4-8 times slower for big \$ B\$'s. Why is this the case? Where is the bottleneck in my C-code and how can I work around it?
$$
\frac{1}{\text{size(B,1)}}\sum_{i=1}^{2+l*t} \text{mean}( (B(:,i)-B(:,i+1))^2)$$
for \$ l*t
A=randn(10000,1000);
tic
RVV_Da_path=ones(size(A,2)-1,1);
i=2;
j=1;
while i
My C code goes as follows:
#include #include "mex.h"
void mexFunction(int nlhs, double plhs[], int nrhs, const mxArray prhs[])
{
// calculate 1/size(A,1)sum_{i=1}^{2+lt} mean(A(:,i)-A(:,i+1))^2, for l*t
Now when I translate the C code to a .mex file and compare the speed of the Matlab and the C code the C code is roughly 4-8 times slower for big \$ B\$'s. Why is this the case? Where is the bottleneck in my C-code and how can I work around it?
Solution
The overall Matlab algorithm can be simplified. The key is to realize that:
First let us remove the variable step factor. So
Let
Let
You'll notice that if
So the original code you posted:
Can be simplified to:
The next step is to realize that if
So now the code could look like this:
Similarly, instead of performing a
&= CSMD[i][j-1] + MD[i][j]\end{align}\\$$
Incorporating Matlab's built-in
Now we no longer have any explicit loops (in the matlab code at least). This algorithm is much faster and simpler. Consequently, it is much easier to port to C.
You will find that for
Now let us reincorporate the variable step factor. Realize that if we increase
Finally the last index of the non-padded output will differ as well (if \$\textrm{tstep} \gt 1\$) since your computation computes the last index differently than the rest of the indices. You can fix this by performing
- Inter-column differences never change
- You can use cumulative sums on the means of each column
First let us remove the variable step factor. So
tstep = 1. I will reincorporate the variable step factor later.Let
B be a \$m \times n\$ matrix. B(:,1:i) yields a \$m \times i\$ submatrix of B.Let
D = diff(B(:,1:i),1,2). This yields the column differences of B(:,1:i).- The first column of
D, akaD(:,1), is equal to the second column ofBminus the first column ofB, akaB(:,2) - B(:,1)
- The second column of
D, akaD(:,2)is equal to the third column ofBminus the second column ofB, akaB(:,3) - B(:,2)
- ...
- The last column of
D, akaD(:,i-1)is equal to the \$i^{th}\$ column ofBminus the \$(i-1)^{th}\$ column ofB, akaB(:,i) - B(:,i-1)
You'll notice that if
i increases by 1, these column differences remain the same. The only thing that changes is that we obtain one additional column difference in D. Hence you do not need to perform the diff() function multiple times. Simply perform it once on the entire \$m \times n\$ matrix to yield a \$m \times (n-1)\$ matrix of column differences and take submatrices of this difference matrix.So the original code you posted:
B=(20*randn(10000,1000));
RV_B=ones(size(B,2),1);
t_step = 1;
i=2;
j=1;
while i <= size(B,2)+1-t_step
RV_B(j)= sum(mean(diff((B(:,1:i)),1,2).^2,1));
i=i+t_step;
j=j+1;
end
RV_B(j)= sum(mean(diff((B(:,1:end)),1,2).^2,1));
M_1=RV_B';Can be simplified to:
RV_B=ones(size(B,2),1);
t_step = 1;
D = diff(B,1,2);
i=2;
j=1;
while i <= size(B,2)+1-t_step
RV_B(j)= sum(mean(D(:,1:i-1).^2,1));
i=i+t_step;
j=j+1;
end
RV_B(j)= sum(mean(D(:,1:end).^2,1));
M_1=RV_B';The next step is to realize that if
D is our \$m \times (n-1)\$ difference matrix, mean(D) yields a \$1 \times (n-1)\$ matrix of column means. If we truncate columns of this matrix as we do when we take submatrix D(:,1:i-1) it does not affect the means of the columns that are still there. So instead of performing mean() over and over on submatrices, we can again pull it out the loop, perform mean() once on the entire matrix D, and take submatrices of this mean matrix.So now the code could look like this:
RV_B=ones(size(B,2),1);
t_step = 1;
D = diff(B,1,2);
MD = mean(D.^2);
i=2;
j=1;
while i <= size(B,2)+1-t_step
RV_B(j)= sum(MD(:, 1:i-1));
i=i+t_step;
j=j+1;
end
RV_B(j)= sum(MD(:, 1:end));
M_1 = RV_B';Similarly, instead of performing a
sum on the rows of the submatrices, we can perform a cumulative sum along the rows (or rather row): $$\begin{align}CSMD[i][j] &= \sum_{k=1}^{j} MD[i][k] \\&= CSMD[i][j-1] + MD[i][j]\end{align}\\$$
Incorporating Matlab's built-in
cumsum() along with changing some intermediary variable names to better illustrate their purpose yields the following code:diffB = diff(B,1,2);
meanSquaredDiff = mean(diffB.^2);
M_1 = cumsum(meanSquaredDiff, 2);Now we no longer have any explicit loops (in the matlab code at least). This algorithm is much faster and simpler. Consequently, it is much easier to port to C.
You will find that for
tstep=1, M_1 for your algorithm has dimension size(B,2) and for the simplified code I provided it has dimension size(B,2)-1. However, in your original algorithm, M_1(end) and M_1(end-1) are always the same if tstep=1, hence you were performing one calculation too many.Now let us reincorporate the variable step factor. Realize that if we increase
tstep it is equivalent to taking every \$\textrm{tstep}^{th}\$ output after the cumulative sum. So if you want tstep=5 you perform the same three-line code above but at the end sample it via M_1 = M_1(1:5:end) or in general M_1 = M_1(1:step:end). You can use isequal to compare the outputs of your original approach to this approach. Remember that you one-padded your output so if you use \$\textrm{tstep} \gt 1\$, you need to remove the extra padding when comparing the matrices.Finally the last index of the non-padded output will differ as well (if \$\textrm{tstep} \gt 1\$) since your computation computes the last index differently than the rest of the indices. You can fix this by performing
tmp = M_1(end); M_1 = M_1(1:step:end); M_1(end) = tmp; instead of simply M_1 = M_1(1:step:end) however the latter approach is more consistent in my opinion.Code Snippets
B=(20*randn(10000,1000));
RV_B=ones(size(B,2),1);
t_step = 1;
i=2;
j=1;
while i <= size(B,2)+1-t_step
RV_B(j)= sum(mean(diff((B(:,1:i)),1,2).^2,1));
i=i+t_step;
j=j+1;
end
RV_B(j)= sum(mean(diff((B(:,1:end)),1,2).^2,1));
M_1=RV_B';RV_B=ones(size(B,2),1);
t_step = 1;
D = diff(B,1,2);
i=2;
j=1;
while i <= size(B,2)+1-t_step
RV_B(j)= sum(mean(D(:,1:i-1).^2,1));
i=i+t_step;
j=j+1;
end
RV_B(j)= sum(mean(D(:,1:end).^2,1));
M_1=RV_B';RV_B=ones(size(B,2),1);
t_step = 1;
D = diff(B,1,2);
MD = mean(D.^2);
i=2;
j=1;
while i <= size(B,2)+1-t_step
RV_B(j)= sum(MD(:, 1:i-1));
i=i+t_step;
j=j+1;
end
RV_B(j)= sum(MD(:, 1:end));
M_1 = RV_B';diffB = diff(B,1,2);
meanSquaredDiff = mean(diffB.^2);
M_1 = cumsum(meanSquaredDiff, 2);Context
StackExchange Code Review Q#84364, answer score: 4
Revisions (0)
No revisions yet.