HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Find Row Pairs that are never both 1

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
neverareboththatfindrowpairs

Problem

Given sample_size many arrays of length dim. For each index from 1 to dim, i want to give a list of indices, where no sample contained in both indexes 1s. The code is really slow and i want to improve the performance.

# Test set

dim = 10000
sample_size = 1000

from random import *

randBinList = lambda n: [randint(0,1) for b in range(1,n+1)]

samples = [randBinList(dim) for s in xrange(sample_size)]

# Find all variables that are not both 1 in one sample

node_neighbours = {}
for node in xrange(1,dim+1):
    node_neighbours[node] = []
    for node2 in xrange(1,dim+1):
        if node != node2:
            test_failed = False
            for s in samples:
                v1 = s[node-1]
                v2 = s[node2-1]
                if v1 == 1 and v2 == 1:
                    test_failed = True
                    break
            if not test_failed:
                node_neighbours[node].append(node2)
    #print node, ":", node_neighbours[node]

Solution

First off, I don't understand why you so much time dealing with off-by-one indices that you created in the first place:

for node in xrange(1,dim+1):
    for node2 in xrange(1,dim+1):
        for s in samples:
            v1 = s[node-1]
            v2 = s[node2-1]
            if v1 == 1 and v2 == 1:


You could simply write xrange(dim) and s[node] and s[node2]. If you want a 1-based indexing when storing the results, you can still use node + 1 and node2 + 1 when appending the results.

You can also take advantage of the fact that node being a neighbour of node2 will imply that node2 will be a neighbour of node and cut down half of your computations (and a test as well):

for node in xrange(dim):
    for node2 in xrange(node + 1, dim):
        test_failed = False
        for s in samples:
            v1 = s[node]
            v2 = s[node2]
            if v1 == 1 and v2 == 1:
                test_failed = True
                break
        if not test_failed:
            node_neighbours[node + 1].append(node2 + 1)
            node_neighbours[node2 + 1].append(node + 1)


Other tiny improvements includes removing the test_failed flag by using the for ... else construct and testing equality with the extended comparisons:

for node in xrange(dim):
    for node2 in xrange(node + 1, dim):
        for s in samples:
            v1 = s[node]
            v2 = s[node2]
            if v1 == v2 == 1:
                break
        else:
            node_neighbours[node + 1].append(node2 + 1)
            node_neighbours[node2 + 1].append(node + 1)


But a bigger improvement can be achieved using the C core of Python instead of these 2 for loops. Because they perform exactly what itertools.combinations does:

for node1, node2 in itertools.combinations(xrange(dim), 2):
    for s in samples:
        if s[node1] == s[node2] == 1:
            break
    else:
        node_neighbours[node1 + 1].append(node2 + 1)
        node_neighbours[node2 + 1].append(node1 + 1)


Also note that all the provided snippets seems to take for granted that lists for each nodes has already been populated in node_neighbours. This is easily achieved using collections.defaultdict as such:

node_neighbours = defaultdict(list)


You should also wrap this code into a function to better separate the "initialization" or generation of test data from the actual computation. And testing this function should be done in an if __name__ == '__main__': clause:

import itertools
from collections import defaultdict

def pairs_never_both_one(samples, dim=None):
    if dim is None:
        dim = len(samples[0])

    node_neighbours = defaultdict(list)
    for node1, node2 in itertools.combinations(xrange(dim), 2):
        for s in samples:
            if s[node1] == s[node2] == 1:
                break
        else:
            node_neighbours[node1 + 1].append(node2 + 1)
            node_neighbours[node2 + 1].append(node1 + 1)

    return node_neighbours

if __name__ == '__main__':
    from random import randint

    dim = 10000
    sample_size = 1000
    samples = [[randint(0, 1) for _ in xrange(dim)] for _ in xrange(sample_size)]
    print pairs_never_both_one(samples, dim)

Code Snippets

for node in xrange(1,dim+1):
    for node2 in xrange(1,dim+1):
        for s in samples:
            v1 = s[node-1]
            v2 = s[node2-1]
            if v1 == 1 and v2 == 1:
for node in xrange(dim):
    for node2 in xrange(node + 1, dim):
        test_failed = False
        for s in samples:
            v1 = s[node]
            v2 = s[node2]
            if v1 == 1 and v2 == 1:
                test_failed = True
                break
        if not test_failed:
            node_neighbours[node + 1].append(node2 + 1)
            node_neighbours[node2 + 1].append(node + 1)
for node in xrange(dim):
    for node2 in xrange(node + 1, dim):
        for s in samples:
            v1 = s[node]
            v2 = s[node2]
            if v1 == v2 == 1:
                break
        else:
            node_neighbours[node + 1].append(node2 + 1)
            node_neighbours[node2 + 1].append(node + 1)
for node1, node2 in itertools.combinations(xrange(dim), 2):
    for s in samples:
        if s[node1] == s[node2] == 1:
            break
    else:
        node_neighbours[node1 + 1].append(node2 + 1)
        node_neighbours[node2 + 1].append(node1 + 1)
node_neighbours = defaultdict(list)

Context

StackExchange Code Review Q#147805, answer score: 2

Revisions (0)

No revisions yet.