patternMinor
Does a subcubic algorithm exist for the following problem?
Viewed 0 times
problemthesubcubicexistalgorithmfollowingfordoes
Problem
Given a symmetric real $n \times n$ matrix $A=(a_{ij})$, is there an algorithm which computes the sum $$\sum_{i,j,k}\max(a_{ij},a_{ik},a_{jk})$$ over all $1\leq i<j<k\leq n$ with time-complexity better than $O(n^3)$?
Solution
There exists a quite practical approach that works in $O(n^3 / w)$ time, where $w$ is the number
of bits in the processor word. The main idea is that you iterate over
the elements of the matrix one by one in the increasing order (break ties arbitrarily)
and "switch them on". Consider the moment when the largest element of
some triple $a_{ij}, a_{ik}, a_{jk}$ is switched on. For simplicity, let's assume
that the said element is $a_{ij}$. It is natural to add the value of the triple
to the answer now, when the last element is switched on. So we have to count
the number of possible $k$'s such that $a_{ik}$ and $a_{jk}$ are already switched on (that would be the number of triples, here $a_{ij}$ is the largest element, so they were completely switched on just now). Here we can speed up the naive $O(n)$ implementation by using bit optimisation.
For details you can refer to the following implementation in C++11 that
should work for $n \leqslant 5000$, $|a_{ij}| \leqslant 10^9$
(it is not very optimized; however,
it still beats naive summation for $n = 5000$ by large margin at least on my machine).
If you consider using bit optimisations cheating,
you can use four russians method to the same result here,
yielding a $O(n^3 / \log n)$ algorithm, which should be less practical
(because $w$ is pretty large on most modern hardware)
but is theoretically better.
Indeed, let's choose $b \approx \log_2 n$ and keep each
row of the matrix as an array of $\lceil \frac{n}{b} \rceil$ integers
from $0$ to $2^b - 1$, where the $i$-th number in the array
corresponds to the bits of
the row ranging from $ib$ inclusive to $\min(n, (i + 1)b)$ exclusive
in $0$-indexation. We can precalculate the scalar products of every two
such blocks in $O(2^{2b} b)$ time. Updating a position in the matrix
is fast because we are changing only one integer. To find the scalar product
of rows $i$ and $j$ just iterate over arrays corresponding to that rows,
look up scalar products of the corresponding blocks in the table and sum up the
obtained products.
The above paragraph assumes that operations with integers $\leqslant n$ take $O(1)$ time. It is quite common assumption, because it usually
does not actually change the comparative speed of the algorithms (for example,
if we don't use that assumption, brute force method actually works in
$O(n^3 \log n)$ time (here we measure time in bit operations) if $a_{ij}$ take integer values with absolute values at least up to $n^{\varepsilon}$ for some constant $\varepsilon > 0$
(and otherwise we can solve the problem with $O(n^{\varepsilon})$ matrix multiplications anyway); however the four russians method suggested
above uses $O(n^3 / \log n)$ operations with numbers of size $O(\log n)$ in that
case; so it makes $O(n^3)$ bit operations, which is still better than brute force despite the change of the model).
The question about existence of $O(n^{3 - \varepsilon})$ approach is still interesting, though.
Techniques (bit optimisations and four russians method) presented in this answer are by no means original and are presented here for completeness of the exposition. However, finding a way to apply them was not trivial.
of bits in the processor word. The main idea is that you iterate over
the elements of the matrix one by one in the increasing order (break ties arbitrarily)
and "switch them on". Consider the moment when the largest element of
some triple $a_{ij}, a_{ik}, a_{jk}$ is switched on. For simplicity, let's assume
that the said element is $a_{ij}$. It is natural to add the value of the triple
to the answer now, when the last element is switched on. So we have to count
the number of possible $k$'s such that $a_{ik}$ and $a_{jk}$ are already switched on (that would be the number of triples, here $a_{ij}$ is the largest element, so they were completely switched on just now). Here we can speed up the naive $O(n)$ implementation by using bit optimisation.
For details you can refer to the following implementation in C++11 that
should work for $n \leqslant 5000$, $|a_{ij}| \leqslant 10^9$
(it is not very optimized; however,
it still beats naive summation for $n = 5000$ by large margin at least on my machine).
// code is not very elegant,
// but should be understandable
// here the matrix a has dimensions n x n
// a has to be symmetric!
int64_t solve (int n, const vector> &a)
{
std::vector> mat
(n, boost::dynamic_bitset(n));
vector> order;
for (int j = 1; j &l, const pair &r)
{return a[l.first][l.second] < a[r.first][r.second];});
int64_t ans = 0;
for (const auto &position : order)
{
int i, j;
tie (i, j) = position;
mat[i][j] = mat[j][i] = 1;
// here it is important that conditions
// mat[i][i] = 0 and mat[j][j] = 0 always hold
ans += (mat[i] & mat[j]).count() * int64_t(a[i][j]);
}
return ans;
}If you consider using bit optimisations cheating,
you can use four russians method to the same result here,
yielding a $O(n^3 / \log n)$ algorithm, which should be less practical
(because $w$ is pretty large on most modern hardware)
but is theoretically better.
Indeed, let's choose $b \approx \log_2 n$ and keep each
row of the matrix as an array of $\lceil \frac{n}{b} \rceil$ integers
from $0$ to $2^b - 1$, where the $i$-th number in the array
corresponds to the bits of
the row ranging from $ib$ inclusive to $\min(n, (i + 1)b)$ exclusive
in $0$-indexation. We can precalculate the scalar products of every two
such blocks in $O(2^{2b} b)$ time. Updating a position in the matrix
is fast because we are changing only one integer. To find the scalar product
of rows $i$ and $j$ just iterate over arrays corresponding to that rows,
look up scalar products of the corresponding blocks in the table and sum up the
obtained products.
The above paragraph assumes that operations with integers $\leqslant n$ take $O(1)$ time. It is quite common assumption, because it usually
does not actually change the comparative speed of the algorithms (for example,
if we don't use that assumption, brute force method actually works in
$O(n^3 \log n)$ time (here we measure time in bit operations) if $a_{ij}$ take integer values with absolute values at least up to $n^{\varepsilon}$ for some constant $\varepsilon > 0$
(and otherwise we can solve the problem with $O(n^{\varepsilon})$ matrix multiplications anyway); however the four russians method suggested
above uses $O(n^3 / \log n)$ operations with numbers of size $O(\log n)$ in that
case; so it makes $O(n^3)$ bit operations, which is still better than brute force despite the change of the model).
The question about existence of $O(n^{3 - \varepsilon})$ approach is still interesting, though.
Techniques (bit optimisations and four russians method) presented in this answer are by no means original and are presented here for completeness of the exposition. However, finding a way to apply them was not trivial.
Code Snippets
// code is not very elegant,
// but should be understandable
// here the matrix a has dimensions n x n
// a has to be symmetric!
int64_t solve (int n, const vector<vector<int32_t>> &a)
{
std::vector<boost::dynamic_bitset<int64_t>> mat
(n, boost::dynamic_bitset<int64_t>(n));
vector<pair<int, int>> order;
for (int j = 1; j < n; j++)
for (int i = 0; i < j; i++)
order.emplace_back(i, j);
sort(order.begin(), order.end(),
[&] (const pair<int, int> &l, const pair<int, int> &r)
{return a[l.first][l.second] < a[r.first][r.second];});
int64_t ans = 0;
for (const auto &position : order)
{
int i, j;
tie (i, j) = position;
mat[i][j] = mat[j][i] = 1;
// here it is important that conditions
// mat[i][i] = 0 and mat[j][j] = 0 always hold
ans += (mat[i] & mat[j]).count() * int64_t(a[i][j]);
}
return ans;
}Context
StackExchange Computer Science Q#92643, answer score: 4
Revisions (0)
No revisions yet.