patterncppMinor
Multiplying two- or three-dimensional arrays with broadcasting
Viewed 0 times
threearrayswithmultiplyingbroadcastingtwodimensional
Problem
Background
Disclaimer: Skip if you are not interested in the background. The code is far far simpler than this.
The problem is more about programming than math. Here is the definition of multiplication with broadcasting in human readable language.
I am trying to optimize the multiplication of large n-dimensional arrays \$A\$ and \$B\$ with shape \$(I_0,I_1,I_2, ..., I_{n-1})\$ and \$(J_0,J_1,J_2,..., J_{n-1})\$ with broadcasting, as in matlab and numpy. \$A\$ and \$B\$ are stored in row-major order in memory.
The two arrays can be multiplied with broadcasting if for each \$i\$, at least one of the following three statements is true:
Case S1: \$I_i == J_i\$, Case S2: \$I_{i} == 1\$, Case S3: \$J_{i} == 1\$.
From the shape, we can define stride. Stride is the numbers of linear index required to step through to take one step along each dimension of the n-dimensional array.
Linear index is the index of the 1D array in memory that stores the n-dimensional array.
The stride for \$A\$, which is \$(P_1, P_2, P_3,...,P_n)\$ has the following formula: \$P_i = I_{i+1}I_{i+2}I_{i+3}...I_{n}\$ except \$P_{n} = 1\$. Similarly strides for \$B\$ is \$(Q_1, Q_2, Q_3, ... Q_n)\$.
The general form of the nested loop is:
```
int cal_size(int* shape, int n){
int size = 1;
for(int i = 0; i < n; ++i) size*=shape[i];
}
int cal_stride(int shape, int n){
int size = cal_size(shape, n);
int* stride = new int[n];
stride[0] = size/shape[0];
for(int i = 0; i < n; ++i){
stride[i] = stride[i-1]/shape[i];
}
}
int n;
//the number of dimensions (given)
int I[] = {I0,I1,I2,...,I_last};
//shape of n-dimensional array A (given)
int J[] = {J0,J1,J2,...,J_last};
//shape of n-dimensional array B (given)
int A[cal_size(I, n)];
//1D array in memory that stores
//the n-dimensional array A in row major order
int B[cal_size(J, n)];
//1D array in memory that stores
//the n-dimensional array B in row major order
int* P = cal_strid
Disclaimer: Skip if you are not interested in the background. The code is far far simpler than this.
The problem is more about programming than math. Here is the definition of multiplication with broadcasting in human readable language.
I am trying to optimize the multiplication of large n-dimensional arrays \$A\$ and \$B\$ with shape \$(I_0,I_1,I_2, ..., I_{n-1})\$ and \$(J_0,J_1,J_2,..., J_{n-1})\$ with broadcasting, as in matlab and numpy. \$A\$ and \$B\$ are stored in row-major order in memory.
The two arrays can be multiplied with broadcasting if for each \$i\$, at least one of the following three statements is true:
Case S1: \$I_i == J_i\$, Case S2: \$I_{i} == 1\$, Case S3: \$J_{i} == 1\$.
From the shape, we can define stride. Stride is the numbers of linear index required to step through to take one step along each dimension of the n-dimensional array.
Linear index is the index of the 1D array in memory that stores the n-dimensional array.
The stride for \$A\$, which is \$(P_1, P_2, P_3,...,P_n)\$ has the following formula: \$P_i = I_{i+1}I_{i+2}I_{i+3}...I_{n}\$ except \$P_{n} = 1\$. Similarly strides for \$B\$ is \$(Q_1, Q_2, Q_3, ... Q_n)\$.
The general form of the nested loop is:
```
int cal_size(int* shape, int n){
int size = 1;
for(int i = 0; i < n; ++i) size*=shape[i];
}
int cal_stride(int shape, int n){
int size = cal_size(shape, n);
int* stride = new int[n];
stride[0] = size/shape[0];
for(int i = 0; i < n; ++i){
stride[i] = stride[i-1]/shape[i];
}
}
int n;
//the number of dimensions (given)
int I[] = {I0,I1,I2,...,I_last};
//shape of n-dimensional array A (given)
int J[] = {J0,J1,J2,...,J_last};
//shape of n-dimensional array B (given)
int A[cal_size(I, n)];
//1D array in memory that stores
//the n-dimensional array A in row major order
int B[cal_size(J, n)];
//1D array in memory that stores
//the n-dimensional array B in row major order
int* P = cal_strid
Solution
Bug
Version 3 does half as many multiplications than the other versions. The problem is in the loops:
As you can see, you have both
Version 3 does half as many multiplications than the other versions. The problem is in the loops:
for(int iA = 0; iA < 2*3*4; iA+=12){
for(int iB = 0; iB < 3*4; ++iA, ++iB){As you can see, you have both
iA+=12 in the outer loop and ++iA in the inner loop. For this version, if you are incrementing iA in the inner loop, you shouldn't also increment it in the outer loop. The correct way would be:for(int iA = 0; iA < 2*3*4; ){
for(int iB = 0; iB < 3*4; ++iA, ++iB){Code Snippets
for(int iA = 0; iA < 2*3*4; iA+=12){
for(int iB = 0; iB < 3*4; ++iA, ++iB){for(int iA = 0; iA < 2*3*4; ){
for(int iB = 0; iB < 3*4; ++iA, ++iB){Context
StackExchange Code Review Q#142269, answer score: 2
Revisions (0)
No revisions yet.