HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Connected component analysis

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
componentanalysisconnected

Problem

I finished a program to do connected component analysis using union - find algorithm. Is it true that the complexity of the code is \$O(N logN)\$, where \$N\$ is the total number of pixels (512x512 by example, not 512).

Is there any way to improve the performance?

import cv2
import numpy as np
import random

class QuickUnionUF:

    def __init__(self, N):
        self.id = range(N)
        self.sz = [0] * N

    @classmethod
    def fromimage(self, im):
        self.id = im
        self.sz = [0] * len(im)

    def root(self, i):
        while (i != self.id[i]):
            i = self.id[i]
        return i

    def getresult(self):
        result = [self.root(i) for i in self.id]
        return result

    def connected(self, p, q):
        return self.root(p) == self.root(q)

    def union(self, p, q):
        i = self.root(p)
        j = self.root(q)

        if (i == j):
            return
        if (self.sz[i]  100)
cv2.imwrite("result.png", out)

Solution

Is it true that the complexity of the code is O(NlogN), where N is the total number of pixels (512x512 by example, not 512).

You main loop in bwlabelis:

for i in range(M - 1):
    for j in range(N - 1):
        if (im[i][j] == im[i][j+1]):
            qf.union(i * N + j, i * N + j + 1)
        if (im[i + 1][j] == im[i][j]):
            qf.union(i * N + j, (i + 1) * N + j)


So you are calling qf.union \$O(N*M) = O(n)\$ times. There you call self.root twice:

def root(self, i):
    while (i != self.id[i]):
        i = self.id[i]
    return i


What's the worst case running time of this? At most \$O(n)\$, since self.id has \$n\$ values and there are no cycles. However, since you are comparing ranks in union:

if (self.sz[i] < self.sz[j]):
        self.id[i] = j
        self.sz[j] += self.sz[i]
    else:
        self.id[j] = i
        self.sz[j] += self.sz[i]


You will have \$O(log\ n)\$ paths to root, so the total running time is \$O(n\ log\ n)\$.


Is there any way to improve the performance?

Yes, path compression. You descended down two paths from p and q to i and j and it took \$O(log\ n)\$. At each step you can point a node to its root, instead of just the next element. You can do this recursively like so:

def root(self, i):
    if i == self.id[i]:
        return i
    self.id[i] = self.root(self.id[i])
    return self.id[i]


Recursion adds its own cost, though, so whether it's faster in practice would require testing. An alternative would be to add a second iterative compression function:

def compress(self, i, r):
    while i != r:
        i, self.id[i] = self.id[i], r


Now you are walking the path twice, which again may or may not be faster than recursion. You should only need to call this for one of the paths, though.

Code Snippets

for i in range(M - 1):
    for j in range(N - 1):
        if (im[i][j] == im[i][j+1]):
            qf.union(i * N + j, i * N + j + 1)
        if (im[i + 1][j] == im[i][j]):
            qf.union(i * N + j, (i + 1) * N + j)
def root(self, i):
    while (i != self.id[i]):
        i = self.id[i]
    return i
if (self.sz[i] < self.sz[j]):
        self.id[i] = j
        self.sz[j] += self.sz[i]
    else:
        self.id[j] = i
        self.sz[j] += self.sz[i]
def root(self, i):
    if i == self.id[i]:
        return i
    self.id[i] = self.root(self.id[i])
    return self.id[i]
def compress(self, i, r):
    while i != r:
        i, self.id[i] = self.id[i], r

Context

StackExchange Code Review Q#52216, answer score: 3

Revisions (0)

No revisions yet.