patternpythonMinor
Get a list of indexes for list A based on criteria in list B in Python
Viewed 0 times
criteriaindexesgetforbasedpythonlist
Problem
I have one list of data with duplicate IDs and a second list of data with attributes for the IDs.
I want to create a list (
1) Only one index per ID in
2) Indexes for IDs that have a 'Yes' in
3) For IDs that don't have a 'Yes' any index is fine.
Using the above data as an example, the output would look like:
So, you can see the data more easily:
Below is code that I've written. I am looking for a more efficient and Pythonic way to do it.
>>>a
[1,1,1,1,2,2,3,3]
>>>b
['No','No','Yes','No','Yes','No','No','No']I want to create a list (
c) that holds indexes for a that match the following criteria:1) Only one index per ID in
a2) Indexes for IDs that have a 'Yes' in
b take precedent3) For IDs that don't have a 'Yes' any index is fine.
Using the above data as an example, the output would look like:
>>>c
[2,4,6]So, you can see the data more easily:
>>> for c,(x,y) in enumerate(zip(a,b)): print(c,x,y)
...
0 1 No
1 1 No
2 1 Yes # grab this index ID 1
3 1 No
4 2 Yes # grab this index for ID 2
5 2 No
6 3 No # grab this index, but either would have been fine for ID 3
7 3 NoBelow is code that I've written. I am looking for a more efficient and Pythonic way to do it.
filterwhen='Yes'
uni=set(a)
sortedMain=sorted([list((i,c)) for c,i in enumerate(a)]) # sort list a
order=[i.pop() for i in sortedMain]
sortedFilt=[b[i] for i in order] # sort list b to keep data matched
sortedMain=[x for i in sortedMain for x in i]
c=[]
for i in uni:
dex=[c for c,x in enumerate(sortedMain) if x==i]
test=list(map((lambda x: x.lower().strip()==filterwhen.lower().strip()),[sortedFilt[i] for i in dex]))
if any(test)==True:
selecteddex=test.index(True)
c.append(dex[selecteddex])
else: c.append(dex[0])
print(c)a has over 58,000 elements and 26,000+ are unique. Due to all the iterations, it takes too long to run this.Solution
Consider doing this only with a single
lists, sort them by IDs and
front in each block),
current block ID, simply always take the first element from each block
and append it to
At the price of more memory consumption you could also just do a linear
scan and keep a dictionary around that records the best choice per ID so
far. E.g.:
for loop, i.e. zip the twolists, sort them by IDs and
Yes/No (Yes first so those end up infront in each block),
enumerate and then, while keeping track of thecurrent block ID, simply always take the first element from each block
and append it to
c. E.g.:a=[1,1,1,2,2,3,3,1]
b=['No','Yes','No','Yes','No','No','No','No']
d = list(enumerate(zip(a, b)))
def compare(x, y):
same_ids = cmp(x[0], y[0])
return -cmp(x[1], y[1]) if same_ids == 0 else same_ids
d.sort(cmp=compare, key=lambda x: x[1])
c = []
last = None
for i, (x, y) in d:
if last != x or last == None:
c.append(i)
last = xAt the price of more memory consumption you could also just do a linear
scan and keep a dictionary around that records the best choice per ID so
far. E.g.:
from itertools import izip
a=[1,1,1,2,2,3,3,1]
b=['No','Yes','No','Yes','No','No','No','No']
known = {}
for i, (x, y) in enumerate(izip(a, b)):
current = known.get(x)
if not current or y == 'Yes' and current[1] == 'No':
known[x] = (i, y)
c = [i for (i, _) in known.itervalues()]Code Snippets
a=[1,1,1,2,2,3,3,1]
b=['No','Yes','No','Yes','No','No','No','No']
d = list(enumerate(zip(a, b)))
def compare(x, y):
same_ids = cmp(x[0], y[0])
return -cmp(x[1], y[1]) if same_ids == 0 else same_ids
d.sort(cmp=compare, key=lambda x: x[1])
c = []
last = None
for i, (x, y) in d:
if last != x or last == None:
c.append(i)
last = xfrom itertools import izip
a=[1,1,1,2,2,3,3,1]
b=['No','Yes','No','Yes','No','No','No','No']
known = {}
for i, (x, y) in enumerate(izip(a, b)):
current = known.get(x)
if not current or y == 'Yes' and current[1] == 'No':
known[x] = (i, y)
c = [i for (i, _) in known.itervalues()]Context
StackExchange Code Review Q#135889, answer score: 3
Revisions (0)
No revisions yet.