HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Determine if string has all unique characters

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
uniqueallhasdeterminecharactersstring

Problem

I'm working my way through the exercises of the book Cracking the Coding Interview. I'd like to review my solution for the question:


Implement an algorithm to determine of a string has all unique characters. What if you cannot use additional data structures ?

I'm aware that this question has been asked before, this was however for the languages Java and Objective-C. This time it's a very extended version in Python.

What I am looking for in this review:

  • This is my first time I'm coding algorithms with the aim for interviews in python. Do you notice any bad habits/things I might have to focus on (related to Python)? Should one create a main() method or is it good practice to code your interview as a unit test?



  • I've listed multiple approaches and tried to answer the space and time complexities. However is it true that the list wise (and the optimised bitwise) solutions are the best or are there better (pythonic/non-pythonic) solutions? I've read that the unique_characters_naive_set_short approach is often not really considered as a 'algorithm' and that it's hard to deduct the time/space complexities from it.



```
"""
TASK: (1) determine if string has all unique characters,
(2) what if you cannot use additional datastructures?
"""
import unittest

"""
First we should consider if the string is encoded in ASCII, unicode or any other encoding scheme
We consider it being ASCII, consist of 128 characters
=> 0-31 are control characters, 32-127 are considered 'characters' => max 96 unique characters.
"""

# OPTION 0: Naive approach: time O(n^2)
def unique_characters_naive_enum(input_string):
if len(input_string)>96:
return False
for idx, char in enumerate(input_string):
for idx2 in xrange(idx+1,len(input_string)):
if char == input_string[idx2]:
return False
return True

# OPTION 1: Naive approach with set: time O(n^2)
def unique_characters_naive_set(input_string):
if len(input_strin

Solution

-
Unless your interview is of the "here's your exercise, e-mail us your working code within a couple of hours", you are either going to be writing on a whiteboard, or on a shared google doc, with no internet access or reference material to check. The way I see it, this means you want to be very defensive in your coding style, and avoid using the less usual modules or features of the language, unless you are extremely comfortable doing so. Long story short, I wouldn't use unittest in an interview. If a bunch of assert statements thrown in a test function are good enough for Peter Norvig, they should also be enough for any interview.

What you can certainly improve on is the structure of those tests, e.g. put together something like:

def test():
    functions = [unique_characters_naive_enum,
                 unique_characters_naive_set,
                 unique_characters_naive_set_short,
                 unique_characters_sorted,
                 unique_characters_list,
                 unique_characters_bitwise,
                 ]
    test_cases = [('hello', False), ('azerty', True)]

    for function in functions:
        for arg, ret in test_cases:
            assert function(arg) == ret


-
If you are assuming things in your input, it is a very good idea to document those, rather than in a comment, with an assert statement. As an example, I would rewrite your unique_characters_bitwise function as:

def unique_characters_bitwise(input_string):
    assert all('a'  ord('z') - ord('a') + 1:
        return False
    # bitmap of seen characters
    seen_chars = 0
    for char in input_string:
        char_bitmask = 1 << (ord(char) - ord('a'))
        if seen_chars & char_bitmask:
            return False
        seen_chars |= char_bitmask
    return True


Make sure you stress that you are not validating the input, you shouldn't use assert for that, but documenting the function's specification.

-
I have also removed magic numbers from the above function: it is much clearer what 26 stands for if you make it explicit that it is the number of letters in your alphabet.

  • Try to stick to the conventions of the language, which for Python means PEP8: read it, learn it, love it. Especially if you end up coding on a whiteboard or a google doc, keeping your max. line width short is a great habit to have. And if you happen to be interviewed by a hardcore Python geek, it is probably good to avoid unnecessary parenthesis or missing spaces!



  • While it is true that the worst case performance of look-up and insertion into hash tables, i.e. set, is O(n), it is also true that they take constant time on average. The only practical drawbacks they have are malicious opponents (but this doesn't apply if your keys are single char ASCII strings), and the fact that dynamically resized hash tables will have the odd operation that does take linear time, so you probably don't want that data structure used in a blocking call in a pacemaker's firmware! But in many, many typical interview situations, the right answer to "can we do better?" is a hash table: for all practical purposes, your single-liner unique_characters_naive_set_short takes linear time and space, and you should make sure you convey that to the interviewer before trying anything more complicated.

Code Snippets

def test():
    functions = [unique_characters_naive_enum,
                 unique_characters_naive_set,
                 unique_characters_naive_set_short,
                 unique_characters_sorted,
                 unique_characters_list,
                 unique_characters_bitwise,
                 ]
    test_cases = [('hello', False), ('azerty', True)]

    for function in functions:
        for arg, ret in test_cases:
            assert function(arg) == ret
def unique_characters_bitwise(input_string):
    assert all('a' <= char <= 'z' for char in input_string)
    if len(input_string) > ord('z') - ord('a') + 1:
        return False
    # bitmap of seen characters
    seen_chars = 0
    for char in input_string:
        char_bitmask = 1 << (ord(char) - ord('a'))
        if seen_chars & char_bitmask:
            return False
        seen_chars |= char_bitmask
    return True

Context

StackExchange Code Review Q#96630, answer score: 6

Revisions (0)

No revisions yet.