patternMinor

Automated optimization of 0-1 matrix vector multiplication

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

multiplicationautomatedoptimizationmatrixvector

Problem

Question:

Is there established procedure or theory for generating code that efficiently applies a matrix-vector multiplication, when the matrix is dense and filled with only zeros and ones? Ideally, the optimized code would make systematic use of previously computed information to reduce duplicated work.

In other words, I have a matrix $M$ and I want to do some precomputation based upon $M$, that will make computing $Mv$ as efficient as possible when I later receive the vector $v$.

$M$ is a rectangular dense binary matrix that is known at "compile time", whereas $v$ is an unknown real vector that is only known at "run time".

Example 1: (sliding window)

Let me use an easy small example to illustrate my point. Consider the matrix,
$$M = \begin{bmatrix}1 & 1 & 1 & 1 & 1\\ & 1 & 1 & 1 & 1 & 1 \\ & & 1 & 1 & 1 & 1 & 1\\
& & & 1 & 1 & 1 & 1 & 1\end{bmatrix}.$$
Supposing we apply this matrix to a vector $v$ to get $w = Mv$. Then the entries of the result are,
\begin{align}
w_1 &= v_1 + v_2 + v_3 + v_4 + v_5\\
w_2 &= v_2 + v_3 + v_4 + v_5 + v_6\\
w_3 &= v_3 + v_4 + v_5 + v_6 + v_7\\
w_4 &= v_4 + v_5 + v_6 + v_7 + v_8
\end{align}

Doing a standard matrix-vector multiplication will compute exactly this way. However, a lot of this work is redundant. We could do the same matrix computation at less cost by keeping track of a "running total", and adding/subtracting to get the next number:
\begin{align}
w_1 &= v_1 + v_2 + v_3 + v_4 + v_5\\
w_2 &= w_1 + v_6 - v_1\\
w_3 &= w_2 + v_7 - v_2\\
w_4 &= w_3 + v_8 - v_3
\end{align}

Example 2: (hierarchical structure)

In the previous example, we could just keep track of a running total. However, usually one would need to create and store a tree of intermediate results. For example, consider
$$M = \begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 \\ & & & & 1 & 1 & 1 & 1\\ 1 & 1 \\ & & & & 1 & 1 \\ & & 1 & 1 \\ & & & & & & 1 & 1\end{bmatrix}$$
One could compute $w = Mv$ efficiently using a tree of intermediate results:

-

Solution

This is related to an open research question, which is known as the "Online Boolean Matrix-Vector Multiplication (OMv) problem". This problem reads as follows (see [1]): Given a binary $n \times n$ matrix $M$ and $n$ binary column vectors $v_1, \dots, v_n$, we need to compute $M v_i$ before $v_{i+1}$ arrives.

Notice that the problem from the question is somewhat more general: It allows for $m \times n$ matrices and real-valued vectors. Observe that the problem with $n \times n$ matrices and Boolean vectors is "easier", as it presents a special case.

Clearly, the naïve algorithm for the Online Boolean Matrix-Vector Multiplication problem (which just uses standard matrix-vector-multipliction) takes time $O(n^3)$. There is a conjecture (see e.g. [1]) that this cannot be done truly faster than $O(n^3)$. (In more detail, this conjecture goes as follows: There exists no truly subcubic algorithm, which solves the Online Boolean Matrix-Vector Multiplication Problem, i.e. there is no algorithm with running time $O(n^{3 - \varepsilon})$ for $\varepsilon > 0$).

It is known that Williams's algorithm solves this problem in time $O(n^3 / \log^2 n)$. See [2] for more details.

It would be a breakthrough in the area of conditional lower bounds, if one could prove or disprove the above conjecture.

[1] Unifying and Strengthening Hardness for Dynamic Problems via an Online Matrix-Vector Multiplication Conjecture. by Henzinger, Krinninger, Nanongkai and Saranurak

[ http://eprints.cs.univie.ac.at/4351/1/OMv_conjecture.pdf ]

[2] Matrix-vector multiplication in sub-quadratic time: (some preprocessing required). by Williams

[ http://dl.acm.org/citation.cfm?id=1283383.1283490 ]

Update

One of the questions in the comments was as follows: We know $M$ at compile time. Can't we adjust our algorithm to suit $M$, so the OMv problem (conjecture) does not apply? We will see that this is not the case, unless the OMv conjecture fails.

The proof idea is simple: Assume we could give fast algorithms for all matrices up to some certain size (e.g. distinguishing all possible cases). After this certain size we use divide and conquer.

Here are the details:

Fix some $n_0 \in \mathbb{N}$, which (without loss of generality) is a power of 2 and bigger than 2. Now assume that for all $n \leq n_0$ and all $n \times n$ matrices $M$ we know an algorithm $A_{n,M}$, that for all vectors $v$ computes $Mv$ in truly subquadratic time, i.e. in time $O(n^{2 - \varepsilon})$ for some $\varepsilon > 0$. (Notice that this allows an individual algorithm for each matrix up to size $n_0 \times n_0$.)

Now we will solve OMv in truly subcubic time:

Given a binary matrix $M$ of size $n \times n$, where $n = 2^k$ for some $k$ and $n > n_0$, we use a divide and conquer strategy. We divide $M$ into four submatrices $M_1, M_2, M_3, M_4$ of sizes $2^{k-1} \times 2^{k-1}$. If $2^{k-1} \leq n_0$, then we use algorithm $A_{2^{k-1},M_i}$, otherwise, we recurse. (As $n_0$ is some fixed number, we can pick the correct algorithm in constant time.)

Notice that we will need at most $O(\log n)$ recursion steps. Also, for $n$ vectors $v_1, \dots, v_n$, we will $n$ computations. Thus, to process all matrix-vector multiplications we will need a total computation time of $O(n^{3 - \varepsilon} \log n)$.

It is well known that the logarithm grows slower than any polynomial (in particular slower than any root). Fixing some $\tilde \varepsilon > 0$ with $\tilde \varepsilon < \varepsilon$, we see that our total computation is running in truly subcubic time (in particular, in time $O(n^{3 - \tilde \varepsilon})$). Thus, the OMv conjecture would be wrong.

(If $M$ has size $m \times n$ and $m$ and $n$ are not powers of 2, then the bounds on the running times still apply, as we could just increase $n$ and $m$ to the next powers of 2.)

Conclusion: If you could make use of case distinctions on the input matrices to derive fast algorithms, then you could improve the OMv conjecture.

Context

StackExchange Computer Science Q#45596, answer score: 8

Revisions (0)

No revisions yet.