HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Counting islands in Boolean matrices

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
booleancountingislandsmatrices

Problem

Given an $n \times m$ Boolean matrix $\mathrm X$, let $0$ entries represent the sea and $1$ entries represent land. Define an island as vertically or horizontally (but not diagonally) adjacent $1$ entries.

The original question was to count the number of islands in a given matrix. The author described a recursive solution ($\mathcal{O}(nm)$ memory).

But I was unsuccessfully trying to find a streaming (left to right, and then down to the next row) solution that dynamically counts islands with $\mathcal{O}(m)$ or $\mathcal{O}(n)$ or $\mathcal{O}(n+m)$ memory (there are no limits for the time complexity). Is that possible? If not, how can I prove it?

A few examples of expected outputs for certain inputs for the count function:

$
count\begin{pmatrix}
010\\
111\\
010\\
\end{pmatrix} = 1; %
count\begin{pmatrix}
101\\
010\\
101\\
\end{pmatrix} = 5; %
count\begin{pmatrix}
111\\
101\\
111\\
\end{pmatrix} = 1;
$

$
count\begin{pmatrix}
1111100\\
1000101\\
1010001\\
1011111\\
\end{pmatrix} = 2$

$
count\begin{pmatrix}
101\\
111\\
\end{pmatrix} = 1$

Solution

Orlp gives a solution using $O(n)$ words of space, which are $O(n\log n)$ bits of space (assuming for simplicity that $n=m$). Conversely, it is easy to show that $\Omega(n)$ bits of space are needed by reducing set disjointness to your problem.

Suppose that Alice holds a binary vector $x_1,\ldots,x_n$ and Bob holds a binary vector $y_1,\ldots,y_n$, and they want to know whether there exists an index $i$ such that $x_i=y_i=1$. They run your algorithm for the $2\times(2n-1)$ matrix whose rows are $x_1,0,x_2,0,\ldots,0,x_n$ and $y_1,0,y_2,0,\ldots,0,y_n$. After the first row is read, Alice sends Bob $\sum_i x_i$ as well as the memory contents, so that Bob can complete the algorithm and compare $\sum_i (x_i+y_i)$ to the number of connected components. If the two numbers match, the two vectors are disjoint (there is no index $i$), and vice versa. Since any protocol for set disjointness needs $\Omega(n)$ bits (even if it can err with a small constant probability), we deduce an $\Omega(n)$ lower bound, which holds even for randomized protocols which are allowed to err with some small constant probability.

We can improve on Orlp's solution by using noncrossing partitions. We read the matrix row by row. For each row, we remember which 1s are connected via paths going through preceding rows. The corresponding partition is noncrossing, and so can be encoded using $O(n)$ bits (since noncrossing partitions are counted by Catalan numbers, whose growth rate is exponential rather than factorial). When reading the following row, we maintain this representing, and increase a counter whenever all ends of some part are not connected to the current row (the counter takes an additional $O(\log n)$ bits). As in Orlp's solution, we add a final dummy row of zeroes to finish processing the matrix. This solution uses $O(n)$ bits, which is asymptotically optimal given our lower bound.

Context

StackExchange Computer Science Q#74797, answer score: 6

Revisions (0)

No revisions yet.