patternMinor
Find the maximizing row-column matches in a matrix
Viewed 0 times
matchesthecolumnmaximizingfindrowmatrix
Problem
I have a set of R x C matrices similar to the following (can be much longer):
C1 C2 C3 C4 C5 C6
R1 0.32 0.81 NA NA NA NA
R2 0.90 -0.44 0.95 NA NA NA
R3 NA 0.93 0.86 0.24 NA NA
R4 NA NA 0.70 0.74 0.15 0.05
R5 NA NA NA NA NA NA
I need to find the sequence of row-columns matchings that provides the overall maximum values (when summed). Each row can be associated to one and only one column, and "cross" matchings are not allowed, i.e. if R2 was matched with C4, a following row R>2 cannot be matched with a previous column C
[R2:C1 = 0.90] + [R3:C2 = 0.93] + [R4:C4 = 0.74] = 2.57
which is higher than alternative solutions that may contain the highest values in the matrix, such as:
[R2:C2 = 0.81] + [R2:C3 = 0.95] + [R4:C4 = 0.74] = 2.50
The only solution I came up with is to create all possible RxC matrices populated by 1 and 0 where the marginal of each row and each column is not greater than 1, and where the no-crossing rule is respected.
Then by multiplying such matrices with the original ones, I obtain new matrices containing ONLY the values in the solution, which can be easily summed.
Example of the solution above:
C1 C2 C3 C4 C5 C6 | row
R1 0 0 0 0 0 0 | 0
R2 1 0 0 0 0 0 | 1
R3 0 1 0 0 0 0 | 1
R4 0 0 0 1 0 0 | 1
R5 0 0 0 0 0 0 | 0
_______________________________
col 1 1 0 1 0 0
This is a valid matrix as all the marginals are
result = matrix(R, C, 0) //allocate response matrix RxC with all elements = 0
for (i from 1 to R){
for (j from 1 to C){
part1 = A[i-1,j]
part2 = array[j] //allocates array of length j
for(jPrime in 1:j){
part2[jPrime] = M[i,jPrime] + A[i-1,jPrime-1]
}
A[i,j] = max(part1, max(part2))
}
}
This returns the following A matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.32 0.81 0.81 0.
C1 C2 C3 C4 C5 C6
R1 0.32 0.81 NA NA NA NA
R2 0.90 -0.44 0.95 NA NA NA
R3 NA 0.93 0.86 0.24 NA NA
R4 NA NA 0.70 0.74 0.15 0.05
R5 NA NA NA NA NA NA
I need to find the sequence of row-columns matchings that provides the overall maximum values (when summed). Each row can be associated to one and only one column, and "cross" matchings are not allowed, i.e. if R2 was matched with C4, a following row R>2 cannot be matched with a previous column C
[R2:C1 = 0.90] + [R3:C2 = 0.93] + [R4:C4 = 0.74] = 2.57
which is higher than alternative solutions that may contain the highest values in the matrix, such as:
[R2:C2 = 0.81] + [R2:C3 = 0.95] + [R4:C4 = 0.74] = 2.50
The only solution I came up with is to create all possible RxC matrices populated by 1 and 0 where the marginal of each row and each column is not greater than 1, and where the no-crossing rule is respected.
Then by multiplying such matrices with the original ones, I obtain new matrices containing ONLY the values in the solution, which can be easily summed.
Example of the solution above:
C1 C2 C3 C4 C5 C6 | row
R1 0 0 0 0 0 0 | 0
R2 1 0 0 0 0 0 | 1
R3 0 1 0 0 0 0 | 1
R4 0 0 0 1 0 0 | 1
R5 0 0 0 0 0 0 | 0
_______________________________
col 1 1 0 1 0 0
This is a valid matrix as all the marginals are
result = matrix(R, C, 0) //allocate response matrix RxC with all elements = 0
for (i from 1 to R){
for (j from 1 to C){
part1 = A[i-1,j]
part2 = array[j] //allocates array of length j
for(jPrime in 1:j){
part2[jPrime] = M[i,jPrime] + A[i-1,jPrime-1]
}
A[i,j] = max(part1, max(part2))
}
}
This returns the following A matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.32 0.81 0.81 0.
Solution
The problem can be solved with dynamic programming in $O(RC)$ time.
I'll start by showing how to solve it in $O(RC^2)$ time. Let $M[\cdot,\cdot]$ denote the input matrix. Let $A[i,j]$ denote the total value of the best matching using only rows $1..i$ and columns $1..j$. We will compute the entries of $A[\cdot,\cdot]$. There is a recursive relation for $A[\cdot,\cdot]$:
$$A[i,j] = \max(A[i-1,j],\max\{M[i,j'] + A[i-1,j'-1] : 1 \le j' \le j\}).$$
Here the $A[i-1,j]$ term corresponds to not including any match for row $i$, and the $\max\{M[i,j'] + A[i-1,j'-1] : 1 \le j' \le j\}$ corresponds to matching row $i$ to column $j'$.
You can fill in the entries of the $A$ array in a straightforward way in $O(RC^2)$ time, and the value of the best matching is given by $A[R,C]$. So, this gives an $O(RC^2)$ time algorithm.
You can speed this up to $O(RC)$ time by simultaneously computing another array $B[\cdot,\cdot]$, with entries defined as
$$B[i,j] = \max(A[i,1],A[i,2],\dots,A[i,j]).$$
Then all of the entries of $A$ and $B$ can be filled in, if you do it in the right order, in a total of $O(RC)$ time. This is linear in the size of the matrix, i.e., proportional to the number of entries in the input matrix. Since any algorithm will presumably have to examine all entries in the matrix, I think that means this algorithm is asymptotically optimal: no algorithm can run faster by more than a constant factor.
I'll start by showing how to solve it in $O(RC^2)$ time. Let $M[\cdot,\cdot]$ denote the input matrix. Let $A[i,j]$ denote the total value of the best matching using only rows $1..i$ and columns $1..j$. We will compute the entries of $A[\cdot,\cdot]$. There is a recursive relation for $A[\cdot,\cdot]$:
$$A[i,j] = \max(A[i-1,j],\max\{M[i,j'] + A[i-1,j'-1] : 1 \le j' \le j\}).$$
Here the $A[i-1,j]$ term corresponds to not including any match for row $i$, and the $\max\{M[i,j'] + A[i-1,j'-1] : 1 \le j' \le j\}$ corresponds to matching row $i$ to column $j'$.
You can fill in the entries of the $A$ array in a straightforward way in $O(RC^2)$ time, and the value of the best matching is given by $A[R,C]$. So, this gives an $O(RC^2)$ time algorithm.
You can speed this up to $O(RC)$ time by simultaneously computing another array $B[\cdot,\cdot]$, with entries defined as
$$B[i,j] = \max(A[i,1],A[i,2],\dots,A[i,j]).$$
Then all of the entries of $A$ and $B$ can be filled in, if you do it in the right order, in a total of $O(RC)$ time. This is linear in the size of the matrix, i.e., proportional to the number of entries in the input matrix. Since any algorithm will presumably have to examine all entries in the matrix, I think that means this algorithm is asymptotically optimal: no algorithm can run faster by more than a constant factor.
Context
StackExchange Computer Science Q#91502, answer score: 4
Revisions (0)
No revisions yet.