patternMinor
What constant in the latest fast matrix multiplication is hidden by Big-O notation?
Viewed 0 times
fastthelatestconstantwhatmultiplicationhiddenbignotationmatrix
Problem
I'm evaluating the theoretical run time of matrix multiplication algorithms as it has improved within the last few decades. Algorithms to solve matrix multiplication run in O(n^w) time, where w has evolved in the following way:
I often hear that the more recent algorithms presented, while theoretically interesting since w approaches 2, are not practical due to the large constant hidden by Big-O. In other words, that the size of the input matrix would need to be extremely large in order for it to be an improvement over Strassen's algorithm, for example. I would like to put a number to this.
What is the rough value of this hidden constant for any of the more recent fast matrix multiplication algorithms? Or, for roughly what input size n of a matrix would the fast matrix multiplication algorithm use less operations than Strassen's algorithm, or the brute force approach?
- w = 3 if using a brute-force approach,
- w = 2.8074 if using Volker Strassen's algorithm [1969],
- ... (some interesting improvements later :))
- w = 2.3729 if using Virginia Vassilevska-Williams analysis of the Coppersmith–Winograd algorithm [2011]
- w = 2.3728639 if using François Le Gall's adaptation of the Coppersmith–Winograd algorithm [2014]
I often hear that the more recent algorithms presented, while theoretically interesting since w approaches 2, are not practical due to the large constant hidden by Big-O. In other words, that the size of the input matrix would need to be extremely large in order for it to be an improvement over Strassen's algorithm, for example. I would like to put a number to this.
What is the rough value of this hidden constant for any of the more recent fast matrix multiplication algorithms? Or, for roughly what input size n of a matrix would the fast matrix multiplication algorithm use less operations than Strassen's algorithm, or the brute force approach?
Solution
All improvements since Strassen's original algorithm only achieve their exponent in the limit. That is, if they are stated as $O(n^c)$ algorithms, what they really show is that for every $\epsilon>0$ you can multiply two $n\times n$ matrices using at most $C_\epsilon n^{c-\epsilon}$ multiplications, where the constant $C_\epsilon$ blows up badly (exponentially in $1/\epsilon$?) as $\epsilon\to0$. (Coppersmith and Winograd showed that this kind of behavior is actually necessary.)
The papers don't explicitly compute $C_\epsilon$. One reason is that if you are serious about the constant, then you will probably need to apply low-level optimizations that will improve $C_\epsilon$ at the expense of the simplicity of the algorithm. Another reason is that due to the exponential dependence on $1/\epsilon$, you don't expect the result to be practically meaningful, so it's not really worth it. A third reason is that fast matrix multiplication algorithms are developed in a more abstract framework, which makes it awkward to come up with an actual explicit algorithm.
Fast matrix multiplication algorithms are recursive, that is, they iterate a smart $m\times m$ multiplication algorithm. Whereas for Strassen's original algorithm $m = 2$, for the improvements $m$ depends on $\epsilon$ (exponentially?), and so you will only be able to apply the base case for very large matrices; and for such large matrices, you would actually do much worse than the high school algorithm. The benefits of these algorithms only show up once they have been iterated many times, which would correspond to astronomically high (or worse) dimensions.
If you do feel like working out a constant, there are several expositions of the theory of fast matrix multiplication. You could take the first improvement over Strassen's original algorithm, and try to work out what the parameters are like.
The papers don't explicitly compute $C_\epsilon$. One reason is that if you are serious about the constant, then you will probably need to apply low-level optimizations that will improve $C_\epsilon$ at the expense of the simplicity of the algorithm. Another reason is that due to the exponential dependence on $1/\epsilon$, you don't expect the result to be practically meaningful, so it's not really worth it. A third reason is that fast matrix multiplication algorithms are developed in a more abstract framework, which makes it awkward to come up with an actual explicit algorithm.
Fast matrix multiplication algorithms are recursive, that is, they iterate a smart $m\times m$ multiplication algorithm. Whereas for Strassen's original algorithm $m = 2$, for the improvements $m$ depends on $\epsilon$ (exponentially?), and so you will only be able to apply the base case for very large matrices; and for such large matrices, you would actually do much worse than the high school algorithm. The benefits of these algorithms only show up once they have been iterated many times, which would correspond to astronomically high (or worse) dimensions.
If you do feel like working out a constant, there are several expositions of the theory of fast matrix multiplication. You could take the first improvement over Strassen's original algorithm, and try to work out what the parameters are like.
Context
StackExchange Computer Science Q#115154, answer score: 3
Revisions (0)
No revisions yet.