patterncMinor

Sparse matrix multiplication for billion by billion matrices

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

codereview c openmp stackoverflow performance matrix

billionmultiplicationformatricesmatrixsparse

Problem

I have to do multiplication of two billion by billion sparse matrices (on a CPU), hence any help or hints in optimizing the below given code would be extremely useful.

Note: I am only showing the code to compute number of non-zeros in the output matrix. Buffer based adjustment of memory is not an option for such a large matrix. Matrix multiplication will be performed in parts. The line with the problem is workarray[matrixb->colInd[k]] = 1 as it is repeating some calculations but I have not been able to come up with a method to reduce the computations.

```
#include
#include
#include
#include
#include
struct sparsemat
{
int nzmax;
int rows;
int cols;
int *rowPtr;
int *colInd;
double *values;
};
struct darray
{
double * array;
int rows;
int cols;
};
struct sparsemat *create(int rows, int cols, int nzmax)
{
struct sparsemat *matrix;
matrix = (struct sparsemat*) calloc(1, sizeof (struct sparsemat));
matrix->rows = rows;
matrix->nzmax = nzmax;
matrix->cols = cols;
matrix->colInd = (int*) calloc(nzmax, sizeof (int));
matrix->rowPtr = (int*) calloc(rows + 1, sizeof (int));
matrix->values = (double*) calloc(nzmax, sizeof (double));
return matrix;
}
void _destroy2(struct sparsemat *matrix)
{
if (matrix != 0)
{
if (matrix->rowPtr != 0)
{
free(matrix->rowPtr);
matrix->rowPtr = NULL;
}
if (matrix->colInd != 0)
{
free(matrix->colInd);
matrix->colInd = NULL;
}
if (matrix->values != 0)
{
free(matrix->values);
matrix->values = NULL;
}
matrix->nzmax = 0; matrix->rows = 0; matrix->cols = 0; free(matrix); matrix = NULL;
}
}
void sparse_dual2(const struct sparsemat const matrixa, const struct sparsemat const matrixb, struct sparsemat * matrixc)
/ loop counters and scratch variables/
{
int i, j, k, index;
double scalar;
/* Worka

Solution

A few things immediately stand out to me:

You aren't handling allocation failures. In create() the initial creation of the struct could fail. So could allocating any of the contained pointers. You need to deal with that case. The easiest thing is probably to check for failure of each allocation, and if any of them fail, free the already allocated ones and return NULL. This also applies to the allocation of workarray and matrixc.

Your naming is not very good. A name like darray may tell you it's an array of doubles, but it doesn't give you any clue as to what it's used for. (But apparently, it's not used - so why is it there?) Likewise, why call the destruction method _destroy2()? Was there a _destroy1()? If so, what's the difference between that and this? If not, why does this one end in 2? Same with sparse_dual2(). And the matrixc argument to sparse_dual2() should indicate that it's the return value. Maybe name it result or dst? (Some standard library calls, like memcpy() take the destination as the first argument, so it could be confusing.)

Instead of using const pointers to const data, would it make sense to use const references? Something like const struct sparsemat& foo? (Or would it need to be const struct sparsemat& const foo?) It seems like that could be simplified, though, I admit, I could be misunderstanding it.

I find your loop end condition on the 2 inner loops confusing. I assume it's saying, "The item that's 1 before the first item of the next row", so basically, the last item of this row? It would be clearer if you made a variable with the appropriate name and put that value into it. Or at least, change it to rowPtr[index] instead of `

-
In terms of optimizing the particular line of interest, have you tried using pointers instead of array indices?

Something like this:

int* nextCol = matrixb->colInd[matrixb->rowPtr[matrixa->colInd[j]]];
for (k = matrixb->rowPtr[matrixa->colInd[j]]; k rowPtr[matrixa->colInd[j] + 1] - 1; ++k, ++nextCol)
workarray[*nextCol] = 1;

You might even be able to keep a pointer to the next place in the work array, so something like this:

int* nextCol = &matrixb->colInd[matrixb->rowPtr[matrixa->colInd[j]]];
unsigned char* nextWorkCell = &workarray[*nextCol];
for (k = matrixb->rowPtr[matrixa->colInd[j]]; k rowPtr[matrixa->colInd[j] + 1] - 1; ++k, ++nextCol)
{
    nextWorkCell += nextCol;
    *nextWorkCell = 1;
}

I'm not sure if that would be any better than your compiler's optimizer.

Code Snippets

int* nextCol = matrixb->colInd[matrixb->rowPtr[matrixa->colInd[j]]];
for (k = matrixb->rowPtr[matrixa->colInd[j]]; k <= matrixb->rowPtr[matrixa->colInd[j] + 1] - 1; ++k, ++nextCol)
workarray[*nextCol] = 1;

int* nextCol = &matrixb->colInd[matrixb->rowPtr[matrixa->colInd[j]]];
unsigned char* nextWorkCell = &workarray[*nextCol];
for (k = matrixb->rowPtr[matrixa->colInd[j]]; k <= matrixb->rowPtr[matrixa->colInd[j] + 1] - 1; ++k, ++nextCol)
{
    nextWorkCell += nextCol;
    *nextWorkCell = 1;
}

Context

StackExchange Code Review Q#75299, answer score: 2

Revisions (0)

No revisions yet.