HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Get a list of indexes for list A based on criteria in list B in Python

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
criteriaindexesgetforbasedpythonlist

Problem

I have one list of data with duplicate IDs and a second list of data with attributes for the IDs.

>>>a
[1,1,1,1,2,2,3,3]
>>>b
['No','No','Yes','No','Yes','No','No','No']


I want to create a list (c) that holds indexes for a that match the following criteria:

1) Only one index per ID in a

2) Indexes for IDs that have a 'Yes' in b take precedent

3) For IDs that don't have a 'Yes' any index is fine.

Using the above data as an example, the output would look like:

>>>c
[2,4,6]


So, you can see the data more easily:

>>> for c,(x,y) in enumerate(zip(a,b)): print(c,x,y)
...
0 1 No
1 1 No
2 1 Yes # grab this index ID 1
3 1 No
4 2 Yes # grab this index for ID 2
5 2 No  
6 3 No  # grab this index, but either would have been fine for ID 3
7 3 No


Below is code that I've written. I am looking for a more efficient and Pythonic way to do it.

filterwhen='Yes'    
uni=set(a)
sortedMain=sorted([list((i,c)) for c,i in enumerate(a)]) # sort list a
order=[i.pop() for i in sortedMain]
sortedFilt=[b[i] for i in order] # sort list b to keep data matched
sortedMain=[x for i in sortedMain for x in i]

c=[]
for i in uni:
    dex=[c for c,x in enumerate(sortedMain) if x==i]
    test=list(map((lambda x: x.lower().strip()==filterwhen.lower().strip()),[sortedFilt[i] for i in dex]))
    if any(test)==True:
        selecteddex=test.index(True)
        c.append(dex[selecteddex])
    else: c.append(dex[0])

print(c)


a has over 58,000 elements and 26,000+ are unique. Due to all the iterations, it takes too long to run this.

Solution

Consider doing this only with a single for loop, i.e. zip the two
lists, sort them by IDs and Yes/No (Yes first so those end up in
front in each block), enumerate and then, while keeping track of the
current block ID, simply always take the first element from each block
and append it to c. E.g.:

a=[1,1,1,2,2,3,3,1]
b=['No','Yes','No','Yes','No','No','No','No']

d = list(enumerate(zip(a, b)))

def compare(x, y):
    same_ids = cmp(x[0], y[0])
    return -cmp(x[1], y[1]) if same_ids == 0 else same_ids

d.sort(cmp=compare, key=lambda x: x[1])

c = []
last = None
for i, (x, y) in d:
    if last != x or last == None:
        c.append(i)
    last = x


At the price of more memory consumption you could also just do a linear
scan and keep a dictionary around that records the best choice per ID so
far. E.g.:

from itertools import izip

a=[1,1,1,2,2,3,3,1]
b=['No','Yes','No','Yes','No','No','No','No']

known = {}
for i, (x, y) in enumerate(izip(a, b)):
    current = known.get(x)
    if not current or y == 'Yes' and current[1] == 'No':
        known[x] = (i, y)

c = [i for (i, _) in known.itervalues()]

Code Snippets

a=[1,1,1,2,2,3,3,1]
b=['No','Yes','No','Yes','No','No','No','No']

d = list(enumerate(zip(a, b)))

def compare(x, y):
    same_ids = cmp(x[0], y[0])
    return -cmp(x[1], y[1]) if same_ids == 0 else same_ids

d.sort(cmp=compare, key=lambda x: x[1])

c = []
last = None
for i, (x, y) in d:
    if last != x or last == None:
        c.append(i)
    last = x
from itertools import izip

a=[1,1,1,2,2,3,3,1]
b=['No','Yes','No','Yes','No','No','No','No']

known = {}
for i, (x, y) in enumerate(izip(a, b)):
    current = known.get(x)
    if not current or y == 'Yes' and current[1] == 'No':
        known[x] = (i, y)

c = [i for (i, _) in known.itervalues()]

Context

StackExchange Code Review Q#135889, answer score: 3

Revisions (0)

No revisions yet.