patternpythonMinor
Typeahead autocomplete functionality challenge
Viewed 0 times
functionalitytypeaheadautocompletechallenge
Problem
I'm working on a Talent Buddy challenge . In this challenge you have to design the autocomplete functionality of Twitter. You are given a list of queries and possible usernames. You have to return the most suited username based on the query. This whole functionality should run under 2 seconds for arrays which do not exceed 100.000 entries. Below you can find my attempt. On my computer it runs the biggest list in 0.517 seconds. But apperently it is not under 2 sec on their system. In the code bellow I did not add the big list, because it will be too much text for a question. Therefore I included the example lists.
My strategy is to divide the usernames into 26 lists (1 list for every letter in the alphabet). So this means that for each query, I do not have to go over all the user names, but only those starting with the same letter. For the rest I optimized my code with best practice rules.
Does anyone know how to make this faster?
```
usernames= ["james", "jBlank"]
queries= ["j", "jm", "jbl", "JB"]
import cProfile
def do_cprofile(func):
def profiled_func(*args, **kwargs):
profile = cProfile.Profile()
try:
profile.enable()
result = func(*args, **kwargs)
profile.disable()
return result
finally:
profile.print_stats()
return profiled_func
@do_cprofile
def typeahead(usernames, queries):
amount_of_queries = len(queries)
queries_lower = [query.lower() for query in queries]
username_lower = [item.lower() for item in usernames]
username_lower = sorted(username_lower)
querylen = [len(query) for query in queries]
results = [-1]*amount_of_queries
alphabet = [[] for x in xrange(26)]
myDict = {'a':0,'b':1,'c':2,'d':3,'e':4,'f':5,'g':6,'h':7,'i':8,'j':9,'k':10,'l':11,'m':12,'n':13,'o':14,'p':15,'q':16,'r':17,'s':18,'t':19,'u':20,'v':21,'w':22,'x':23,'y':24,'z':25}
[alphabet[myDict[username[0]]].append(username) for username in username_lower]
My strategy is to divide the usernames into 26 lists (1 list for every letter in the alphabet). So this means that for each query, I do not have to go over all the user names, but only those starting with the same letter. For the rest I optimized my code with best practice rules.
Does anyone know how to make this faster?
```
usernames= ["james", "jBlank"]
queries= ["j", "jm", "jbl", "JB"]
import cProfile
def do_cprofile(func):
def profiled_func(*args, **kwargs):
profile = cProfile.Profile()
try:
profile.enable()
result = func(*args, **kwargs)
profile.disable()
return result
finally:
profile.print_stats()
return profiled_func
@do_cprofile
def typeahead(usernames, queries):
amount_of_queries = len(queries)
queries_lower = [query.lower() for query in queries]
username_lower = [item.lower() for item in usernames]
username_lower = sorted(username_lower)
querylen = [len(query) for query in queries]
results = [-1]*amount_of_queries
alphabet = [[] for x in xrange(26)]
myDict = {'a':0,'b':1,'c':2,'d':3,'e':4,'f':5,'g':6,'h':7,'i':8,'j':9,'k':10,'l':11,'m':12,'n':13,'o':14,'p':15,'q':16,'r':17,'s':18,'t':19,'u':20,'v':21,'w':22,'x':23,'y':24,'z':25}
[alphabet[myDict[username[0]]].append(username) for username in username_lower]
Solution
Organisation of the function
You are doing many things in the same function which make the whole thing a bit confusing. First thing to do is to split the two main parts of the functions : one does preprocessing on
Before we go any further, something to keep in mind
I am not quite sure that this is an issue but in your example, your print lower-case usernames even though usernames with their original case is printed on this original problem.
Making your map from letters to username a bit cleaner
At the moment, you always use
I'll assume we are using the
In order to integrate this part of the solution in your code, let's remove the duplicated code from what we have in the part handling the queries :
Replacing this with our
Please note that I took this chance to remove the list you were building with a useless list-comprehension.
Handling the queries
As you are about to iterate on queries anyway, there's no real need to build lists with information you will need. You can just iterate over
Also, you can initialise
Finally, we don't even need the indices as we can build
```
# Handling the queries
r
You are doing many things in the same function which make the whole thing a bit confusing. First thing to do is to split the two main parts of the functions : one does preprocessing on
usernames, one handles queries. One this is done, your code looks like :def typeahead(usernames, queries):
# Pre-processing on usernames
username_lower = [item.lower() for item in usernames]
username_lower = sorted(username_lower)
alphabet = [[] for x in xrange(26)]
myDict = {'a':0,'b':1,'c':2,'d':3,'e':4,'f':5,'g':6,'h':7,'i':8,'j':9,'k':10,'l':11,'m':12,'n':13,'o':14,'p':15,'q':16,'r':17,'s':18,'t':19,'u':20,'v':21,'w':22,'x':23,'y':24,'z':25}
[alphabet[myDict[username[0]]].append(username) for username in username_lower]
# Handling the queries
matches=[]
amount_of_queries = len(queries)
results = [-1]*amount_of_queries
queries_lower = [query.lower() for query in queries]
querylen = [len(query) for query in queries]
for q in xrange(amount_of_queries):
matches[:] = []
quer = queries_lower[q]
if querylen[q]==1:
results[q] = alphabet[myDict[quer[0]]][0]
else:
matches = [item for item in alphabet[myDict[quer[0]]] if quer==item[:querylen[q]]]
if len(matches)>=1:
results[q]= matches[0]
print '\n'.join(map(str, results))Before we go any further, something to keep in mind
I am not quite sure that this is an issue but in your example, your print lower-case usernames even though usernames with their original case is printed on this original problem.
Making your map from letters to username a bit cleaner
At the moment, you always use
alphabet and myDict together. This is a good sign that something is wrong. (Also, if you really want to map letters to their index, you don't need dictionnary with hardcoded value, you just need arithmetic, ord/chr). If what you want is just to match letters with the list of corresponding usernames, you can use normal dictionnary and check if items are already present, normal dictionnary and fill with empty list from the very beginning or you could play it smart and use Python good stuff as defaultdict and setdefault :from collections import defaultdict
mapping = defaultdict(list)
mapping2 = dict()
for username in usernames:
low = username.lower()
mapping[low[0]].append(low)
mapping2.setdefault(low[0], []).append(low)I'll assume we are using the
defaultdict option from now.In order to integrate this part of the solution in your code, let's remove the duplicated code from what we have in the part handling the queries :
for q in xrange(amount_of_queries):
matches[:] = []
quer = queries_lower[q]
candidates = alphabet[myDict[quer[0]]] ### THIS IS WHAT IS HAVE ADDED
if querylen[q]==1:
results[q] = candidates[0]
else:
matches = [item for item in candidates if quer==item[:querylen[q]]]
if len(matches)>=1:
results[q]= matches[0]Replacing this with our
defaultdict logic, we get :def typeahead(usernames, queries):
# Pre-processing on usernames
mapping = defaultdict(list)
for username in usernames:
mapping[username.lower()[0]].append(username.lower())
# Handling the queries
matches=[]
amount_of_queries = len(queries)
results = [-1]*amount_of_queries
queries_lower = [query.lower() for query in queries]
querylen = [len(query) for query in queries]
for q in xrange(amount_of_queries):
matches[:] = []
quer = queries_lower[q]
candidates = mapping[quer[0]]
if querylen[q]==1:
results[q] = candidates[0]
else:
matches = [item for item in candidates if quer==item[:querylen[q]]]
if len(matches)>=1:
results[q]= matches[0]
print '\n'.join(map(str, results))Please note that I took this chance to remove the list you were building with a useless list-comprehension.
Handling the queries
As you are about to iterate on queries anyway, there's no real need to build lists with information you will need. You can just iterate over
queries and that will do the trick.Also, you can initialise
matches inside the loop and use enumerate if you really need the index. At this point, the part of the code looks like :# Handling the queries
results = [-1]*len(queries)
for i,q in enumerate(queries):
matches=[]
low_quer = q.lower()
l = len(low_quer)
candidates = mapping[low_quer[0]]
if l==1:
results[i] = candidates[0]
else:
matches = [item for item in candidates if low_quer==item[:l]]
if len(matches)>=1:
results[i]= matches[0]Finally, we don't even need the indices as we can build
results as we go. Also, we don't need matches in that part of the function. Things get even more concise :```
# Handling the queries
r
Code Snippets
def typeahead(usernames, queries):
# Pre-processing on usernames
username_lower = [item.lower() for item in usernames]
username_lower = sorted(username_lower)
alphabet = [[] for x in xrange(26)]
myDict = {'a':0,'b':1,'c':2,'d':3,'e':4,'f':5,'g':6,'h':7,'i':8,'j':9,'k':10,'l':11,'m':12,'n':13,'o':14,'p':15,'q':16,'r':17,'s':18,'t':19,'u':20,'v':21,'w':22,'x':23,'y':24,'z':25}
[alphabet[myDict[username[0]]].append(username) for username in username_lower]
# Handling the queries
matches=[]
amount_of_queries = len(queries)
results = [-1]*amount_of_queries
queries_lower = [query.lower() for query in queries]
querylen = [len(query) for query in queries]
for q in xrange(amount_of_queries):
matches[:] = []
quer = queries_lower[q]
if querylen[q]==1:
results[q] = alphabet[myDict[quer[0]]][0]
else:
matches = [item for item in alphabet[myDict[quer[0]]] if quer==item[:querylen[q]]]
if len(matches)>=1:
results[q]= matches[0]
print '\n'.join(map(str, results))from collections import defaultdict
mapping = defaultdict(list)
mapping2 = dict()
for username in usernames:
low = username.lower()
mapping[low[0]].append(low)
mapping2.setdefault(low[0], []).append(low)for q in xrange(amount_of_queries):
matches[:] = []
quer = queries_lower[q]
candidates = alphabet[myDict[quer[0]]] ### THIS IS WHAT IS HAVE ADDED
if querylen[q]==1:
results[q] = candidates[0]
else:
matches = [item for item in candidates if quer==item[:querylen[q]]]
if len(matches)>=1:
results[q]= matches[0]def typeahead(usernames, queries):
# Pre-processing on usernames
mapping = defaultdict(list)
for username in usernames:
mapping[username.lower()[0]].append(username.lower())
# Handling the queries
matches=[]
amount_of_queries = len(queries)
results = [-1]*amount_of_queries
queries_lower = [query.lower() for query in queries]
querylen = [len(query) for query in queries]
for q in xrange(amount_of_queries):
matches[:] = []
quer = queries_lower[q]
candidates = mapping[quer[0]]
if querylen[q]==1:
results[q] = candidates[0]
else:
matches = [item for item in candidates if quer==item[:querylen[q]]]
if len(matches)>=1:
results[q]= matches[0]
print '\n'.join(map(str, results))# Handling the queries
results = [-1]*len(queries)
for i,q in enumerate(queries):
matches=[]
low_quer = q.lower()
l = len(low_quer)
candidates = mapping[low_quer[0]]
if l==1:
results[i] = candidates[0]
else:
matches = [item for item in candidates if low_quer==item[:l]]
if len(matches)>=1:
results[i]= matches[0]Context
StackExchange Code Review Q#54480, answer score: 4
Revisions (0)
No revisions yet.