snippetMinor
How to compute $\mathbf{X}^T \mathbf{X}$ efficiently for large $\mathbf{X}$?
Viewed 0 times
efficientlymathbfcomputelargeforhow
Problem
Let $\mathbf{X}$ be a $n \times n$ matrix. Given that we can only keep $k$ rows ($k << n$) or columns of the matrix in memory, how can we compute $\mathbf{X}^T \mathbf{X}$ while minimizing the number of disk accesses?
Are there known algorithms for this problem? I searched for external matrix multiplication etc. but couldn't find much.
Are there known algorithms for this problem? I searched for external matrix multiplication etc. but couldn't find much.
Solution
"Blocked matrix multiplication" is one way to optimize matrix multiplication for memory access.
From "Using Blocking to Increase Temporal Locality" by Bryant and O’Hallaron (2012):
Blocking a matrix multiply routine works by partitioning the matrices
into submatrices and then exploiting the mathematical fact that these
submatrices can be manipulated just like scalars.
"The cache performance and optimizations of blocked algorithms" by Lam, Rothbrg, and Wolf (1991):
...presents cache performance data for blocked programs and evaluates
several optimizations to improve this performance.
More lecture notes:
Feb 23, 2016)
Memory Hierarchies and Optimizing Matrix Multiplication by James
Demmel (accessed Feb 23, 2016)
From "Using Blocking to Increase Temporal Locality" by Bryant and O’Hallaron (2012):
Blocking a matrix multiply routine works by partitioning the matrices
into submatrices and then exploiting the mathematical fact that these
submatrices can be manipulated just like scalars.
"The cache performance and optimizations of blocked algorithms" by Lam, Rothbrg, and Wolf (1991):
...presents cache performance data for blocked programs and evaluates
several optimizations to improve this performance.
More lecture notes:
- Matrix Computations (CS 6210): Week 1 by David Bindel (accessed
Feb 23, 2016)
- CS 267 Applications of Parallel Computers Lecture 2:
Memory Hierarchies and Optimizing Matrix Multiplication by James
Demmel (accessed Feb 23, 2016)
Context
StackExchange Computer Science Q#53441, answer score: 3
Revisions (0)
No revisions yet.