HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Typeahead autocomplete functionality challenge

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
functionalitytypeaheadautocompletechallenge

Problem

I'm working on a Talent Buddy challenge . In this challenge you have to design the autocomplete functionality of Twitter. You are given a list of queries and possible usernames. You have to return the most suited username based on the query. This whole functionality should run under 2 seconds for arrays which do not exceed 100.000 entries. Below you can find my attempt. On my computer it runs the biggest list in 0.517 seconds. But apperently it is not under 2 sec on their system. In the code bellow I did not add the big list, because it will be too much text for a question. Therefore I included the example lists.

My strategy is to divide the usernames into 26 lists (1 list for every letter in the alphabet). So this means that for each query, I do not have to go over all the user names, but only those starting with the same letter. For the rest I optimized my code with best practice rules.

Does anyone know how to make this faster?

```
usernames= ["james", "jBlank"]
queries= ["j", "jm", "jbl", "JB"]

import cProfile

def do_cprofile(func):
def profiled_func(*args, **kwargs):
profile = cProfile.Profile()
try:
profile.enable()
result = func(*args, **kwargs)
profile.disable()
return result
finally:
profile.print_stats()
return profiled_func

@do_cprofile
def typeahead(usernames, queries):

amount_of_queries = len(queries)
queries_lower = [query.lower() for query in queries]
username_lower = [item.lower() for item in usernames]
username_lower = sorted(username_lower)
querylen = [len(query) for query in queries]

results = [-1]*amount_of_queries
alphabet = [[] for x in xrange(26)]
myDict = {'a':0,'b':1,'c':2,'d':3,'e':4,'f':5,'g':6,'h':7,'i':8,'j':9,'k':10,'l':11,'m':12,'n':13,'o':14,'p':15,'q':16,'r':17,'s':18,'t':19,'u':20,'v':21,'w':22,'x':23,'y':24,'z':25}

[alphabet[myDict[username[0]]].append(username) for username in username_lower]

Solution

Organisation of the function

You are doing many things in the same function which make the whole thing a bit confusing. First thing to do is to split the two main parts of the functions : one does preprocessing on usernames, one handles queries. One this is done, your code looks like :

def typeahead(usernames, queries):
    # Pre-processing on usernames
    username_lower = [item.lower() for item in usernames]
    username_lower = sorted(username_lower)

    alphabet = [[] for x in xrange(26)]
    myDict = {'a':0,'b':1,'c':2,'d':3,'e':4,'f':5,'g':6,'h':7,'i':8,'j':9,'k':10,'l':11,'m':12,'n':13,'o':14,'p':15,'q':16,'r':17,'s':18,'t':19,'u':20,'v':21,'w':22,'x':23,'y':24,'z':25}

    [alphabet[myDict[username[0]]].append(username) for username in username_lower]

    # Handling the queries
    matches=[]
    amount_of_queries = len(queries)
    results = [-1]*amount_of_queries
    queries_lower = [query.lower() for query in queries]
    querylen = [len(query) for query in queries]
    for q in xrange(amount_of_queries):
        matches[:] = []
        quer = queries_lower[q]

        if querylen[q]==1:
            results[q] = alphabet[myDict[quer[0]]][0]
        else:
            matches = [item  for item in alphabet[myDict[quer[0]]] if quer==item[:querylen[q]]]
            if len(matches)>=1:
                results[q]= matches[0]

    print '\n'.join(map(str, results))


Before we go any further, something to keep in mind

I am not quite sure that this is an issue but in your example, your print lower-case usernames even though usernames with their original case is printed on this original problem.

Making your map from letters to username a bit cleaner

At the moment, you always use alphabet and myDict together. This is a good sign that something is wrong. (Also, if you really want to map letters to their index, you don't need dictionnary with hardcoded value, you just need arithmetic, ord/chr). If what you want is just to match letters with the list of corresponding usernames, you can use normal dictionnary and check if items are already present, normal dictionnary and fill with empty list from the very beginning or you could play it smart and use Python good stuff as defaultdict and setdefault :

from collections import defaultdict
mapping = defaultdict(list)
mapping2 = dict()
for username in usernames:
    low = username.lower()
    mapping[low[0]].append(low)
    mapping2.setdefault(low[0], []).append(low)


I'll assume we are using the defaultdict option from now.

In order to integrate this part of the solution in your code, let's remove the duplicated code from what we have in the part handling the queries :

for q in xrange(amount_of_queries):
    matches[:] = []
    quer = queries_lower[q]

    candidates = alphabet[myDict[quer[0]]] ### THIS IS WHAT IS HAVE ADDED
    if querylen[q]==1:
        results[q] = candidates[0]
    else:
        matches = [item  for item in candidates if quer==item[:querylen[q]]]
        if len(matches)>=1:
            results[q]= matches[0]


Replacing this with our defaultdict logic, we get :

def typeahead(usernames, queries):
    # Pre-processing on usernames
    mapping = defaultdict(list)
    for username in usernames:
        mapping[username.lower()[0]].append(username.lower())

    # Handling the queries
    matches=[]
    amount_of_queries = len(queries)
    results = [-1]*amount_of_queries
    queries_lower = [query.lower() for query in queries]
    querylen = [len(query) for query in queries]
    for q in xrange(amount_of_queries):
        matches[:] = []
        quer = queries_lower[q]

        candidates = mapping[quer[0]]
        if querylen[q]==1:
            results[q] = candidates[0]
        else:
            matches = [item  for item in candidates if quer==item[:querylen[q]]]
            if len(matches)>=1:
                results[q]= matches[0]

    print '\n'.join(map(str, results))


Please note that I took this chance to remove the list you were building with a useless list-comprehension.

Handling the queries

As you are about to iterate on queries anyway, there's no real need to build lists with information you will need. You can just iterate over queries and that will do the trick.

Also, you can initialise matches inside the loop and use enumerate if you really need the index. At this point, the part of the code looks like :

# Handling the queries
results = [-1]*len(queries)
for i,q in enumerate(queries):
    matches=[]
    low_quer = q.lower()
    l = len(low_quer)

    candidates = mapping[low_quer[0]]
    if l==1:
        results[i] = candidates[0]
    else:
        matches = [item  for item in candidates if low_quer==item[:l]]
        if len(matches)>=1:
            results[i]= matches[0]


Finally, we don't even need the indices as we can build results as we go. Also, we don't need matches in that part of the function. Things get even more concise :

```
# Handling the queries
r

Code Snippets

def typeahead(usernames, queries):
    # Pre-processing on usernames
    username_lower = [item.lower() for item in usernames]
    username_lower = sorted(username_lower)

    alphabet = [[] for x in xrange(26)]
    myDict = {'a':0,'b':1,'c':2,'d':3,'e':4,'f':5,'g':6,'h':7,'i':8,'j':9,'k':10,'l':11,'m':12,'n':13,'o':14,'p':15,'q':16,'r':17,'s':18,'t':19,'u':20,'v':21,'w':22,'x':23,'y':24,'z':25}

    [alphabet[myDict[username[0]]].append(username) for username in username_lower]

    # Handling the queries
    matches=[]
    amount_of_queries = len(queries)
    results = [-1]*amount_of_queries
    queries_lower = [query.lower() for query in queries]
    querylen = [len(query) for query in queries]
    for q in xrange(amount_of_queries):
        matches[:] = []
        quer = queries_lower[q]

        if querylen[q]==1:
            results[q] = alphabet[myDict[quer[0]]][0]
        else:
            matches = [item  for item in alphabet[myDict[quer[0]]] if quer==item[:querylen[q]]]
            if len(matches)>=1:
                results[q]= matches[0]

    print '\n'.join(map(str, results))
from collections import defaultdict
mapping = defaultdict(list)
mapping2 = dict()
for username in usernames:
    low = username.lower()
    mapping[low[0]].append(low)
    mapping2.setdefault(low[0], []).append(low)
for q in xrange(amount_of_queries):
    matches[:] = []
    quer = queries_lower[q]

    candidates = alphabet[myDict[quer[0]]] ### THIS IS WHAT IS HAVE ADDED
    if querylen[q]==1:
        results[q] = candidates[0]
    else:
        matches = [item  for item in candidates if quer==item[:querylen[q]]]
        if len(matches)>=1:
            results[q]= matches[0]
def typeahead(usernames, queries):
    # Pre-processing on usernames
    mapping = defaultdict(list)
    for username in usernames:
        mapping[username.lower()[0]].append(username.lower())

    # Handling the queries
    matches=[]
    amount_of_queries = len(queries)
    results = [-1]*amount_of_queries
    queries_lower = [query.lower() for query in queries]
    querylen = [len(query) for query in queries]
    for q in xrange(amount_of_queries):
        matches[:] = []
        quer = queries_lower[q]

        candidates = mapping[quer[0]]
        if querylen[q]==1:
            results[q] = candidates[0]
        else:
            matches = [item  for item in candidates if quer==item[:querylen[q]]]
            if len(matches)>=1:
                results[q]= matches[0]

    print '\n'.join(map(str, results))
# Handling the queries
results = [-1]*len(queries)
for i,q in enumerate(queries):
    matches=[]
    low_quer = q.lower()
    l = len(low_quer)

    candidates = mapping[low_quer[0]]
    if l==1:
        results[i] = candidates[0]
    else:
        matches = [item  for item in candidates if low_quer==item[:l]]
        if len(matches)>=1:
            results[i]= matches[0]

Context

StackExchange Code Review Q#54480, answer score: 4

Revisions (0)

No revisions yet.