patternMinor

Are there parallel matrix exponentiation algorithms that are more efficient than sequential multiplication?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

numerical-algorithms cs algorithms stackoverflow parallel-computing matrices efficiency

multiplicationarethanmoreefficientalgorithmssequentialparallelthatthere

Problem

One is required to find power (positive integer) of matrix of real numbers. There are lots of efficient matrix multiplication algorithms (e.g. some parallel algorithms are Cannon's, DNS) but are there algorithms that are intended exactly for finding power of matrix and that are more efficient than sequential execution of matrix multiplication? I am particularly interested in parallel algorithms.

Solution

If you have multiple processors that can work in parallel, then you can calculate any power up to the power (2^k) in k steps. For example: To calculate $M^{15}$, you calculate:

Stage 1: Calculate $M^2$

Stage 2: Calculate $M^3 = M^2 M$ and $M^4 = M^2 M^2$

Stage 3: Calculate $M^7 = M^4 M^3$ and $M^8 = M^4 M^4$

Stage 4: Calculate $M^{15} = M^8 * M^7$

This is one multiplication more than calculating $M^5$ in three multiplications and raising $M^5$ to the third power in another two multiplications, but should be faster if you have two processors. For arbitrary high powers, you will need more processors.

If you use a brute force algorithm for the multiplication, multiplying row by column, you may save some time by calculating one row of a product, then immediately using that row for the next product. This would help in the calculation of $M^3$ where we can start calculating $M^3$ as soon as the first row of $M^2$ has been calculated; it wouldn't be that helpful with $M^4$ since we need both rows and columns of $M^2$. For large powers, you could probably arrange which powers to calculate.

And after posting this it becomes obvious that you can use multiple processors very easily: You start by calculating the first row of $M^2 = M M$. When you have that row, you have all the information you need to calculate the first row of $M^3 = M^2 M$, so you calculate the second row of $M^2$ and the first row of $M^3$ in parallel. Then you can calculate the third row of $M^2$, the second row of $M^3$, and the first row of $M^4$ in parallel and so on.

This will do a lot more operations than necessary (for example, 14 matrix multiplications for $M^{15}$ instead of the minimal 5 or the 6 of the four-stage method). If the power is not large compared to the number of processors this will still be faster. But calculating $M^{1000}$ with four processors using this method will be inefficient; doing this in an optimal way would be an interesting problem.

Combining approaches: Using four processors for example, you can calculate AB, ABC, ABCD and ABCDE almost in parallel by calculating each product one row at a time. This allows calculate all four of $M^2$ to $M^5$ using four processors in about the same time as one product with one processor.

Given these four results and the original M, you can calculate four of the matrices $M^6$ to $M^{25}$ in the same time again, provided the matrices are at most five powers apart from each other. So each power up to $M^{25}$ can be calculated in about twice the time of a single processor matrix product.

With these matrices calculated, all matrices up to $M^{108}$ and some more up to $M^{125}$ can be calculated in three times the time of a single matrix product if four processors are available. With k processors this should go up to at least the power $k (k+1)^2$.

Context

StackExchange Computer Science Q#62655, answer score: 5

Revisions (0)

No revisions yet.