patternMinor
Concatenate multiple integer arrays such that number of inversions in resulting array is minimal
Viewed 0 times
sucharraysnumberinversionsarrayconcatenateresultingthatmultipleminimal
Problem
Given
Find a way to concatenate the arrays in such a way that the number of inversions is minimum
An inversion for an array Arr can be defined as,
a pair of indices
I tried to concatenate the arrays on the basis of sum, such that the one with minimal sum goes first and one with maximal sum goes last.
It didn't work out though.
Example:-
N arrays of variable length.Find a way to concatenate the arrays in such a way that the number of inversions is minimum
An inversion for an array Arr can be defined as,
a pair of indices
(i,j) such that,i != j
i Arr[j]I tried to concatenate the arrays on the basis of sum, such that the one with minimal sum goes first and one with maximal sum goes last.
It didn't work out though.
Example:-
[14, 18, 18, 20, 16, 6, 11] SUM:- 103
[2, 4, 11, 40, 20, 14, 19] SUM:- 110
[14, 18, 18, 20, 16, 6, 11, 2, 4, 11, 40, 20, 14, 19] Inversions:- 42
[2, 4, 11, 40, 20, 14, 19, 14, 18, 18, 20, 16, 6, 11] Inversions:- 40Solution
As an exercise to the reader, two lemmas:
Lemma 1. If $a, b$ are strings of symbols with a partial order, and
$\text{numinv}$ counts the number of inversions in a string of symbols then
$$\text{numinv}(ab) = \text{numinv}(a) + \text{numinv}(b) + \text{crossinv}(a, b)$$
where $$\text{crossinv}(a, b) = |\{(i,j) \in \{1,\dots,|a|\}\times\{1,\dots,|b|\} \mid a_i > b_j\}|.$$
Lemma 2. $\text{crossinv}$ has the distributive property over
concatenation, that is for strings $a, b, c$ we have
\begin{align}
\text{crossinv}(a, bc) &= \text{crossinv}(a, b) + \text{crossinv}(a, c)\\
\text{crossinv}(ab, c) &= \text{crossinv}(a, c) + \text{crossinv}(b, c).
\end{align}
You can use the above two lemmas repeatedly to generalize, and see that for a tuple of strings $(x_1, \dots, x_n)$
$$\text{numinv}(x_1x_2\dots x_n) = \sum_{i=1}^n \text{numinv}(x_i) + \sum_{i=1}^n\sum_{j=i+1}^n\text{crossinv}(x_i, x_j).$$
Since the first sum is constant for any re-ordering of $x$, we can ignore it for our problem. In fact, if we sort the symbols inside each $x_i$ you can compute $\text{crossinv}(x_i, x_j)$ in $O(|x_i| + |x_j|)$ time. So we only need to minimize the second sum by re-ordering $x$.
However, I do believe that this is a difficult problem to solve optimally. A greedy swapping approach won't work. As a counter-example (I apologize for the somewhat length counterexample, it's just one I found during experiments with randomly generated arrays) consider the following array of arrays:
$$(39, 23, 9, 31, 1, 44), (35, 41, 1, 49, 23, 38, 4, 11), (13, 10, 17, 5, 38, 50, 46), (28, 21), (45, 46, 33)$$
This has $102$ inversions when concatenated. The unique optimal concatenation order is
$$(28, 21), (39, 23, 9, 31, 1, 44), (35, 41, 1, 49, 23, 38, 4, 11), (13, 10, 17, 5, 38, 50, 46), (45, 46, 33)$$
with $100$ inversions. However this can't be generated with a simple swap. I've found similar counterexamples for single-symbol substring rotations.
A simple greedy growing approach won't work either. $(41, 3, 31), (16, 11, 47)$ is uniquely optimally ordered, but if we extend this with $(30, 20)$ the unique optimal ordering becomes
$$(16, 11, 47), (30, 20), (41, 3, 31)$$
Which is in no way a simple extension of the previous.
To go even further, Tim shows in this answer that if you view $\text{crossinv}$ as a black box function, and you don't exploit any of its special properties, then this problem is NP-complete and you're doomed to fail. So any efficient optimal algorithm must use some special property of $\text{crossinv}$.
For very large amount of arrays you will probably see the best real-world results using a non-deterministic technique such as genetic evolution to optimize the number of inversions, although this wouldn't guarantee optimality. Or perhaps I'm overestimating this problem's difficulty and someone comes along with a better answer.
Lemma 1. If $a, b$ are strings of symbols with a partial order, and
$\text{numinv}$ counts the number of inversions in a string of symbols then
$$\text{numinv}(ab) = \text{numinv}(a) + \text{numinv}(b) + \text{crossinv}(a, b)$$
where $$\text{crossinv}(a, b) = |\{(i,j) \in \{1,\dots,|a|\}\times\{1,\dots,|b|\} \mid a_i > b_j\}|.$$
Lemma 2. $\text{crossinv}$ has the distributive property over
concatenation, that is for strings $a, b, c$ we have
\begin{align}
\text{crossinv}(a, bc) &= \text{crossinv}(a, b) + \text{crossinv}(a, c)\\
\text{crossinv}(ab, c) &= \text{crossinv}(a, c) + \text{crossinv}(b, c).
\end{align}
You can use the above two lemmas repeatedly to generalize, and see that for a tuple of strings $(x_1, \dots, x_n)$
$$\text{numinv}(x_1x_2\dots x_n) = \sum_{i=1}^n \text{numinv}(x_i) + \sum_{i=1}^n\sum_{j=i+1}^n\text{crossinv}(x_i, x_j).$$
Since the first sum is constant for any re-ordering of $x$, we can ignore it for our problem. In fact, if we sort the symbols inside each $x_i$ you can compute $\text{crossinv}(x_i, x_j)$ in $O(|x_i| + |x_j|)$ time. So we only need to minimize the second sum by re-ordering $x$.
However, I do believe that this is a difficult problem to solve optimally. A greedy swapping approach won't work. As a counter-example (I apologize for the somewhat length counterexample, it's just one I found during experiments with randomly generated arrays) consider the following array of arrays:
$$(39, 23, 9, 31, 1, 44), (35, 41, 1, 49, 23, 38, 4, 11), (13, 10, 17, 5, 38, 50, 46), (28, 21), (45, 46, 33)$$
This has $102$ inversions when concatenated. The unique optimal concatenation order is
$$(28, 21), (39, 23, 9, 31, 1, 44), (35, 41, 1, 49, 23, 38, 4, 11), (13, 10, 17, 5, 38, 50, 46), (45, 46, 33)$$
with $100$ inversions. However this can't be generated with a simple swap. I've found similar counterexamples for single-symbol substring rotations.
A simple greedy growing approach won't work either. $(41, 3, 31), (16, 11, 47)$ is uniquely optimally ordered, but if we extend this with $(30, 20)$ the unique optimal ordering becomes
$$(16, 11, 47), (30, 20), (41, 3, 31)$$
Which is in no way a simple extension of the previous.
To go even further, Tim shows in this answer that if you view $\text{crossinv}$ as a black box function, and you don't exploit any of its special properties, then this problem is NP-complete and you're doomed to fail. So any efficient optimal algorithm must use some special property of $\text{crossinv}$.
For very large amount of arrays you will probably see the best real-world results using a non-deterministic technique such as genetic evolution to optimize the number of inversions, although this wouldn't guarantee optimality. Or perhaps I'm overestimating this problem's difficulty and someone comes along with a better answer.
Context
StackExchange Computer Science Q#115037, answer score: 4
Revisions (0)
No revisions yet.