snippetMinor

How to avoid column wise access in matrix multiplication?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

columnmultiplicationwiseavoidhowmatrixaccess

Problem

I know when we access elements in rows it will be much faster than if it is accessed column wise. In matrix multiplication one of the matrices must be accessed column wise.
In GPUs with CUDA/OpenCL I could resolve this issue by explicitly copying both matrices row-wise into user-managed shared(cache)-memory (or local memory in OpenCL), and then can access data for matrix multiplication in anyway- no penalty for column access.

My question is how do I avoid such column access if I am implementing Matrix-multiplication on CPU, where I do not have any access to such shared memory as in GPU?

Solution

I'm assuming you don't want to store a copy of the second matrix in column-wise order and then multiply. In this case maybe the best way is to take the most advantage of cache hierarchy in modern CPUs. There are many complexity results for such setting.

For matrix multiplication there is a cache-oblivious algorithm based on Strassen's algorithm. Here you can find it in details. Notice that their aim is to optimize I/O access, so it may not fully match your requirement.

Context

StackExchange Computer Science Q#18072, answer score: 2

Revisions (0)

No revisions yet.