patternpythonMinor
Find Row Pairs that are never both 1
Viewed 0 times
neverareboththatfindrowpairs
Problem
Given
sample_size many arrays of length dim. For each index from 1 to dim, i want to give a list of indices, where no sample contained in both indexes 1s. The code is really slow and i want to improve the performance.# Test set
dim = 10000
sample_size = 1000
from random import *
randBinList = lambda n: [randint(0,1) for b in range(1,n+1)]
samples = [randBinList(dim) for s in xrange(sample_size)]
# Find all variables that are not both 1 in one sample
node_neighbours = {}
for node in xrange(1,dim+1):
node_neighbours[node] = []
for node2 in xrange(1,dim+1):
if node != node2:
test_failed = False
for s in samples:
v1 = s[node-1]
v2 = s[node2-1]
if v1 == 1 and v2 == 1:
test_failed = True
break
if not test_failed:
node_neighbours[node].append(node2)
#print node, ":", node_neighbours[node]Solution
First off, I don't understand why you so much time dealing with off-by-one indices that you created in the first place:
You could simply write
You can also take advantage of the fact that
Other tiny improvements includes removing the
But a bigger improvement can be achieved using the C core of Python instead of these 2 for loops. Because they perform exactly what
Also note that all the provided snippets seems to take for granted that lists for each nodes has already been populated in
You should also wrap this code into a function to better separate the "initialization" or generation of test data from the actual computation. And testing this function should be done in an
for node in xrange(1,dim+1):
for node2 in xrange(1,dim+1):
for s in samples:
v1 = s[node-1]
v2 = s[node2-1]
if v1 == 1 and v2 == 1:You could simply write
xrange(dim) and s[node] and s[node2]. If you want a 1-based indexing when storing the results, you can still use node + 1 and node2 + 1 when appending the results.You can also take advantage of the fact that
node being a neighbour of node2 will imply that node2 will be a neighbour of node and cut down half of your computations (and a test as well):for node in xrange(dim):
for node2 in xrange(node + 1, dim):
test_failed = False
for s in samples:
v1 = s[node]
v2 = s[node2]
if v1 == 1 and v2 == 1:
test_failed = True
break
if not test_failed:
node_neighbours[node + 1].append(node2 + 1)
node_neighbours[node2 + 1].append(node + 1)Other tiny improvements includes removing the
test_failed flag by using the for ... else construct and testing equality with the extended comparisons:for node in xrange(dim):
for node2 in xrange(node + 1, dim):
for s in samples:
v1 = s[node]
v2 = s[node2]
if v1 == v2 == 1:
break
else:
node_neighbours[node + 1].append(node2 + 1)
node_neighbours[node2 + 1].append(node + 1)But a bigger improvement can be achieved using the C core of Python instead of these 2 for loops. Because they perform exactly what
itertools.combinations does:for node1, node2 in itertools.combinations(xrange(dim), 2):
for s in samples:
if s[node1] == s[node2] == 1:
break
else:
node_neighbours[node1 + 1].append(node2 + 1)
node_neighbours[node2 + 1].append(node1 + 1)Also note that all the provided snippets seems to take for granted that lists for each nodes has already been populated in
node_neighbours. This is easily achieved using collections.defaultdict as such:node_neighbours = defaultdict(list)You should also wrap this code into a function to better separate the "initialization" or generation of test data from the actual computation. And testing this function should be done in an
if __name__ == '__main__': clause:import itertools
from collections import defaultdict
def pairs_never_both_one(samples, dim=None):
if dim is None:
dim = len(samples[0])
node_neighbours = defaultdict(list)
for node1, node2 in itertools.combinations(xrange(dim), 2):
for s in samples:
if s[node1] == s[node2] == 1:
break
else:
node_neighbours[node1 + 1].append(node2 + 1)
node_neighbours[node2 + 1].append(node1 + 1)
return node_neighbours
if __name__ == '__main__':
from random import randint
dim = 10000
sample_size = 1000
samples = [[randint(0, 1) for _ in xrange(dim)] for _ in xrange(sample_size)]
print pairs_never_both_one(samples, dim)Code Snippets
for node in xrange(1,dim+1):
for node2 in xrange(1,dim+1):
for s in samples:
v1 = s[node-1]
v2 = s[node2-1]
if v1 == 1 and v2 == 1:for node in xrange(dim):
for node2 in xrange(node + 1, dim):
test_failed = False
for s in samples:
v1 = s[node]
v2 = s[node2]
if v1 == 1 and v2 == 1:
test_failed = True
break
if not test_failed:
node_neighbours[node + 1].append(node2 + 1)
node_neighbours[node2 + 1].append(node + 1)for node in xrange(dim):
for node2 in xrange(node + 1, dim):
for s in samples:
v1 = s[node]
v2 = s[node2]
if v1 == v2 == 1:
break
else:
node_neighbours[node + 1].append(node2 + 1)
node_neighbours[node2 + 1].append(node + 1)for node1, node2 in itertools.combinations(xrange(dim), 2):
for s in samples:
if s[node1] == s[node2] == 1:
break
else:
node_neighbours[node1 + 1].append(node2 + 1)
node_neighbours[node2 + 1].append(node1 + 1)node_neighbours = defaultdict(list)Context
StackExchange Code Review Q#147805, answer score: 2
Revisions (0)
No revisions yet.