Generalizing Matrix Chain problem: optimal summation in a tree
Problem
Matrix Chain Problem can be viewed as the problem of finding the optimal summation order in a path-structured tensor network. How hard is the problem if we extend it to trees?
For instance, take the following sum, written in Einstein summation notation
$$A_i B^i_j C^j_k D^k$$
This corresponds to a path-shaped tensor network (diagram omitted).
There are 3 edges $e$, corresponding to indices $i,j,k$, each with their own dimension $d_e$. One possible summation order is below:
$$((A_i B_j^i) C_k^j) D^k$$
The number of multiplications, i.e., the total cost of this order, is
$$d_i\times d_j + d_j\times d_k + d_k$$
Given a list of dimensions $\{d_e\}$, finding the order which minimizes the total cost is equivalent to the Matrix Chain problem, solvable in $O(n \log n)$ time (see Wikipedia).
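For reference, here is a minimal sketch of the textbook cubic dynamic program for the Matrix Chain problem (the $O(n \log n)$ Hu–Shing algorithm is considerably more involved); the function name and encoding are illustrative:

```python
# Textbook O(n^3) DP for Matrix Chain: matrix i has shape (dims[i], dims[i+1]).
def matrix_chain_cost(dims):
    n = len(dims) - 1                      # number of matrices
    # cost[i][j]: min multiplications to compute the product of matrices i..j
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):         # interval length
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)       # split point
            )
    return cost[0][n - 1]

# (10x30)(30x5)(5x60): (AB)C costs 1500 + 3000 = 4500, beating A(BC) at 27000
print(matrix_chain_cost([10, 30, 5, 60]))  # 4500
```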
Take the following sum in Einstein summation notation:
$$A_{ij}B^{i}_{kl}C^j_{mn}D^k E^l F^m G^n$$
This corresponds to a tensor network forming a binary tree of depth $D=3$ (diagram omitted).
There are 6 edges $e$, each corresponding to an index of dimension $d_e$. One way of computing the sum is the breadth-first search order
$$(((((A_{ij}B^{i}_{kl})C^j_{mn})D^k) E^l) F^m) G^n$$
Each step is a contraction of two tensors, requiring a number of multiplications equal to the product of the dimensions of all indices occurring in either tensor. E.g., the cost of computing $Z_i^l=X_{ijk}Y^{jkl}$ is $d_i\times d_j\times d_k\times d_l$.
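This cost rule is easy to encode if each tensor is represented only by the set of its index names (an illustrative sketch; the names and representation are assumptions, not part of the question):

```python
from math import prod

# A tensor is represented by the frozenset of its index names;
# `dims` maps each index name to its dimension.
def contraction_cost(t1, t2, dims):
    # product of the dimensions of all indices occurring in either operand
    return prod(dims[e] for e in t1 | t2)

def contract(t1, t2):
    # shared indices are summed out; the remaining indices survive
    return (t1 | t2) - (t1 & t2)

dims = {"i": 2, "j": 3, "k": 4, "l": 5}
X, Y = frozenset("ijk"), frozenset("jkl")
print(contraction_cost(X, Y, dims))  # d_i*d_j*d_k*d_l = 120
print(sorted(contract(X, Y)))        # ['i', 'l']
```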
The total cost of computing this sum in breadth-first search order is
$$d_i\times d_j\times d_k \times d_l + d_j\times d_k\times d_l\times d_m\times d_n + d_k\times d_l \times d_m \times d_n + d_l \times d_m \times d_n + d_m\times d_n + d_n$$
For a tensor network with $n$ tensors, forming a complete binary tree of depth $D$, given a list of dimensions $\{d_e\}$, what is the complexity of finding an order which minimizes total cost?
PS: if we allow leaves to have free indices (i.e., dangling edges in the graph), this kind of representation is an instance of the "Hierarchical Tucker Decomposition".
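For small instances like the 7-tensor tree above, the question can be sanity-checked by exhaustive search over all pairwise contraction orders. A sketch, assuming tensors are represented as frozensets of index names (exponential time, only meant for tiny networks):

```python
from math import prod
from functools import lru_cache

# Brute-force minimum contraction cost of a small connected network.
def best_cost(tensors, dims):
    def canon(state):
        # canonical ordering of the multiset of tensors, for memoization
        return tuple(sorted(state, key=sorted))

    @lru_cache(maxsize=None)
    def go(state):
        if len(state) == 1:
            return 0
        best = float("inf")
        for a in range(len(state)):
            for b in range(a + 1, len(state)):
                t1, t2 = state[a], state[b]
                if not (t1 & t2):
                    continue  # skip outer products; the network is connected
                step = prod(dims[e] for e in t1 | t2)
                merged = (t1 | t2) - (t1 & t2)
                rest = [t for c, t in enumerate(state) if c not in (a, b)]
                best = min(best, step + go(canon(rest + [merged])))
        return best

    return go(canon(tensors))

# The 7-tensor tree from the question (dimension values chosen arbitrarily)
dims = dict(i=2, j=3, k=4, l=5, m=6, n=7)
net = [frozenset(s) for s in ("ij", "ikl", "jmn", "k", "l", "m", "n")]
print(best_cost(net, dims))
```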
Solution
If you restrict yourself to linear contraction orders, i.e., you only contract tensors adjacent to the set of already-contracted ones, you can actually do this in polynomial time. I have recently proved this in my Master's thesis. The result follows from a fascinating yet non-trivial connection to database join ordering (I recommend these slides for join ordering). I also have a preprint, but the thesis explains it properly.
If you want general contraction orders, i.e., you may contract any two adjacent tensors, this is still an open problem (I learned this from the author of Xu et al.; that is also the reason they optimized for a weaker cost function, i.e., minimizing the largest intermediate tensor). Actually, the situation is similar in join ordering: the problem for tree queries is still open (Moerkotte et al.).
However, you can provide near-optimal general contraction orders for trees. The key idea is to regard the linear order of the optimal algorithm as a chain and run the textbook cubic-time dynamic programming to build the optimal contraction tree given that order. This idea was first explored in Neumann et al. (linearized dynamic programming) and has been shown to provide near-optimal solutions. Beyond the optimality result, I empirically validated that this holds for tensor networks as well. Recently, Ibrahim et al. proposed the same approach, but their linearization, i.e., the permutation of the leaves, is chosen greedily. The advantage of having an optimal algorithm for the linearization is that you are at least optimal in the regime of linear orders, so you can provide robustness that a greedy algorithm cannot.
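To make the linearized-DP step concrete: given a fixed permutation of the tensors, the matrix-chain interval DP generalizes directly to arbitrary contraction costs. A sketch under the same assumptions as the question (tensors as frozensets of index names; every index occurs in exactly two tensors, so the network is closed):

```python
from math import prod

# Among all contraction trees whose leaves appear in the FIXED order
# `order`, find the cheapest. Because each index occurs in exactly two
# tensors, the index set of an intermediate is the symmetric difference
# of its children's index sets.
def linearized_dp(order, dims):
    n = len(order)
    cost = [[0] * n for _ in range(n)]
    span = [[None] * n for _ in range(n)]  # surviving indices of interval i..j
    for i in range(n):
        span[i][i] = order[i]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j]
                + prod(dims[e] for e in span[i][k] | span[k + 1][j])
                for k in range(i, j)
            )
            span[i][j] = span[i][i] ^ span[i + 1][j]  # same for every split k
    return cost[0][n - 1]

# The chain A_i B^i_j C^j_k D^k in its natural order
dims = {"i": 2, "j": 3, "k": 4}
order = [frozenset(s) for s in ("i", "ij", "jk", "k")]
print(linearized_dp(order, dims))  # 20: the best tree here is A (B (C D))
```

Note the tree returned may be non-linear even though the leaf order is fixed, which is exactly how this step can improve on the optimal linear order.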
Context
StackExchange Computer Science Q#146680, answer score: 2