patternMinor
Find an optimal ordering
Viewed 0 times
orderingoptimalfind
Problem
I came across this problem and am struggling to find a way to approach it. Any thoughts would be greatly appreciated!
Suppose we are given a matrix $\{-1, 0, 1\}^{n\ \times\ k} $, for example,
$$\begin{bmatrix}
1 & 0 & 1 & 0 & -1 \\
-1 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & -1 \\
-1 & -1 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & -1
\end{bmatrix}$$
Without trying every single permutation, find an ordering of columns $c_i$ that maximises the number of rows for which the first non-zero element is $1$.
For the example above, one such ordering (it's not unique!) is $(c_3, c_4, c_1, c_2, c_5)$, i.e.,
$$\begin{bmatrix}
1 & 0 & 1 & 0 & -1 \\
0 & 0 & -1 & 0 & 1 \\
1 & 0 & 0 & 1 & -1 \\
0 & 1 & 1 & -1 & 1 \\
0 & 0 & 1 & 0 & -1
\end{bmatrix}$$
Here, for $4$ out of $5$ rows the first non-zero element is $1$.
Suppose we are given a matrix $\{-1, 0, 1\}^{n\ \times\ k} $, for example,
$$\begin{bmatrix}
1 & 0 & 1 & 0 & -1 \\
-1 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & -1 \\
-1 & -1 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & -1
\end{bmatrix}$$
Without trying every single permutation, find an ordering of columns $c_i$ that maximises the number of rows for which the first non-zero element is $1$.
For the example above, one such ordering (it's not unique!) is $(c_3, c_4, c_1, c_2, c_5)$, i.e.,
$$\begin{bmatrix}
1 & 0 & 1 & 0 & -1 \\
0 & 0 & -1 & 0 & 1 \\
1 & 0 & 0 & 1 & -1 \\
0 & 1 & 1 & -1 & 1 \\
0 & 0 & 1 & 0 & -1
\end{bmatrix}$$
Here, for $4$ out of $5$ rows the first non-zero element is $1$.
Solution
This problem, which I'll call CO for Column Ordering, is NP-hard. Here's a reduction from the NP-hard problem Vertex Cover (VC) to it:
Decision problem forms of VC and CO
Let the input VC instance be $(V, E, k)$. It represents the question: "Given the graph $(V, E)$, is it possible to choose a set of at most $k$ vertices from $V$ such that every edge in $E$ is incident on at least one chosen vertex?" We will construct an instance $(A, k')$ of CO that represents the question: "Given the matrix $A$ with elements in $\{-1, 0, 1\}$, is it possible to permute the columns of $A$ such that a 1 appears before a -1 on at least $k'$ rows?" These two problems are stated in decision problem form, whereby the answer to each is either YES or NO: formally speaking, it is this form of a problem that is NP-complete (or not). It is not too difficult to see that the more natural optimisation problem form stated in the OP's question is roughly equivalent in terms of complexity: binary search on the threshold parameter can be used to solve the optimisation problem using a decision problem solver, while a single invocation of an optimisation problem solver, followed by a single comparison, is enough to solve the decision problem.
Constructing an instance of CO from an instance of VC
Let $n=|V|$ and $m=|E|$. We will build a matrix $A$ with $(n+1)m + n$ rows and $n+1$ columns. The top $(n+1)m$ rows will be formed of $m$ blocks of $n+1$ rows each, with each block representing an edge that needs to be covered. The bottom $n$ rows contain vertex "flags", which will cause a column (corresponding to a vertex) to incur a fixed cost if it is included in the left-hand side of the CO solution (corresponding to a vertex being included in the vertex cover of the VC solution).
For each vertex $v_i$, create a column in which:
Create one more "fence" column that consists of $(n+1)m$ copies of -1, followed by $n$ copies of +1.
Finally, set the threshold $k'$ for the constructed CO instance: $(n+1)m + n - k$. In other words, we allow at most $k$ rows in which a -1 appears before a +1. Let's call this number of violating rows the "cost" of a CO solution.
Proof
The correspondence between a solution to the CO instance and a set of vertices in the original VC instance is: Every column to the left of the fence corresponds to a vertex that is in the set, and every column to the right of the fence corresponds to a vertex that is not.
Intuitively, the -1s in the top of the "fence" column force the selection of a subset of columns to be placed to its left that together contain +1s in all these positions -- corresponding to a subset of vertices that are incident on every edge. Each of these columns that appears to the left of the "fence" has a -1 on a distinct row somewhere in the bottom $n$ rows, incurring a cost of 1; the +1s in the bottom of the "fence" ensure that all columns placed to its right incur no such cost.
Clearly a VC solution using at most $k$ vertices yields a solution to the constructed CO instance with cost at most $k$: Just order the columns corresponding to vertices in the vertex cover arbitrarily, followed by the fence column, followed by all remaining columns in any order.
It remains to show that a solution to the CO instance with cost at most $k$ corresponds to a vertex cover with at most $k$ vertices.
Suppose to the contrary that there exists a solution to the CO instance with cost at most $k$ that leaves some row in the top $(n+1)m$ rows with a -1 before a +1. This row belongs to a block of $(n+1)$ rows corresponding to a particular edge $uv$. Every row in this block in the original instance $A$ is identical by construction; permuting columns may change these rows, but does not affect the fact that they are identical. Thus each of these $n+1$ identical rows has a -1 before a +1 in the solution, implying a cost of at least $n+1$. But $k \le n < n+1$: contradiction.
Since each of the $m$ blocks of rows in the top $(n+1)m$ rows have a +1 before a -1, each of the corresponding edges is covered by a vertex corresponding to a column to the left of the fence: that is, this subset of vertices constitutes a vertex cover. Since none of the top $(n+1)m$ rows have a -1 before a +1, the only place where cost can accrue in the solution is in the bottom $n$ rows, from columns placed to the left of the fence. Each such column has cost exactly 1, so given that the cost is at most $k$, there must be at most $k$ such columns, and hence at most $k$ vertices in the cover.
Finally, it's clear that the CO instance can be constructed in polynomial time from the VC instance, meaning that if a polynomial-time algorithm existed for solving CO, any VC instance could also be solved in polynomial time by first constructing a
Decision problem forms of VC and CO
Let the input VC instance be $(V, E, k)$. It represents the question: "Given the graph $(V, E)$, is it possible to choose a set of at most $k$ vertices from $V$ such that every edge in $E$ is incident on at least one chosen vertex?" We will construct an instance $(A, k')$ of CO that represents the question: "Given the matrix $A$ with elements in $\{-1, 0, 1\}$, is it possible to permute the columns of $A$ such that a 1 appears before a -1 on at least $k'$ rows?" These two problems are stated in decision problem form, whereby the answer to each is either YES or NO: formally speaking, it is this form of a problem that is NP-complete (or not). It is not too difficult to see that the more natural optimisation problem form stated in the OP's question is roughly equivalent in terms of complexity: binary search on the threshold parameter can be used to solve the optimisation problem using a decision problem solver, while a single invocation of an optimisation problem solver, followed by a single comparison, is enough to solve the decision problem.
Constructing an instance of CO from an instance of VC
Let $n=|V|$ and $m=|E|$. We will build a matrix $A$ with $(n+1)m + n$ rows and $n+1$ columns. The top $(n+1)m$ rows will be formed of $m$ blocks of $n+1$ rows each, with each block representing an edge that needs to be covered. The bottom $n$ rows contain vertex "flags", which will cause a column (corresponding to a vertex) to incur a fixed cost if it is included in the left-hand side of the CO solution (corresponding to a vertex being included in the vertex cover of the VC solution).
For each vertex $v_i$, create a column in which:
- among the top $(n+1)m$ rows, the $j$-th block of $n+1$ rows all contain a +1 when edge $e_j$ is incident on $v_i$, and 0 otherwise, and
- the bottom $n$ rows are all 0 except for the $i$-th, which is -1.
Create one more "fence" column that consists of $(n+1)m$ copies of -1, followed by $n$ copies of +1.
Finally, set the threshold $k'$ for the constructed CO instance: $(n+1)m + n - k$. In other words, we allow at most $k$ rows in which a -1 appears before a +1. Let's call this number of violating rows the "cost" of a CO solution.
Proof
The correspondence between a solution to the CO instance and a set of vertices in the original VC instance is: Every column to the left of the fence corresponds to a vertex that is in the set, and every column to the right of the fence corresponds to a vertex that is not.
Intuitively, the -1s in the top of the "fence" column force the selection of a subset of columns to be placed to its left that together contain +1s in all these positions -- corresponding to a subset of vertices that are incident on every edge. Each of these columns that appears to the left of the "fence" has a -1 on a distinct row somewhere in the bottom $n$ rows, incurring a cost of 1; the +1s in the bottom of the "fence" ensure that all columns placed to its right incur no such cost.
Clearly a VC solution using at most $k$ vertices yields a solution to the constructed CO instance with cost at most $k$: Just order the columns corresponding to vertices in the vertex cover arbitrarily, followed by the fence column, followed by all remaining columns in any order.
It remains to show that a solution to the CO instance with cost at most $k$ corresponds to a vertex cover with at most $k$ vertices.
Suppose to the contrary that there exists a solution to the CO instance with cost at most $k$ that leaves some row in the top $(n+1)m$ rows with a -1 before a +1. This row belongs to a block of $(n+1)$ rows corresponding to a particular edge $uv$. Every row in this block in the original instance $A$ is identical by construction; permuting columns may change these rows, but does not affect the fact that they are identical. Thus each of these $n+1$ identical rows has a -1 before a +1 in the solution, implying a cost of at least $n+1$. But $k \le n < n+1$: contradiction.
Since each of the $m$ blocks of rows in the top $(n+1)m$ rows have a +1 before a -1, each of the corresponding edges is covered by a vertex corresponding to a column to the left of the fence: that is, this subset of vertices constitutes a vertex cover. Since none of the top $(n+1)m$ rows have a -1 before a +1, the only place where cost can accrue in the solution is in the bottom $n$ rows, from columns placed to the left of the fence. Each such column has cost exactly 1, so given that the cost is at most $k$, there must be at most $k$ such columns, and hence at most $k$ vertices in the cover.
Finally, it's clear that the CO instance can be constructed in polynomial time from the VC instance, meaning that if a polynomial-time algorithm existed for solving CO, any VC instance could also be solved in polynomial time by first constructing a
Context
StackExchange Computer Science Q#105670, answer score: 4
Revisions (0)
No revisions yet.