HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Testing whether a given string has unique characters

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
uniquestringhastestingcharacterswhethergiven

Problem

Here is my code which tests whether a given string has unique characters. The function is_unique which does a linear search for each character in the string is \$O(n^2)\$ (if I am right). Is there any more efficient way of doing the same with more efficient DS that takes less time (or space)?

I know with some library functions it can be done more easily, such as len(set(strr)) == len(strr)

def is_unique(strr):
    """Without using any library functions"""
    def _contains(string, char):
        for c in string:
            if c == char: return True
        return False

    for index in range(len(strr)):
        if _contains(strr[:index], strr[index]): return False
    return True    

def main():
    for string in ['abc', 'abcc', 'aabc']:
        print is_unique(string)

if __name__ == '__main__':
    main()


Edit 1 (Adds constraints)

  • Don't change the original string and memory used should be \$O(1)\$ hence sorting the string may not be the valid solution.

Solution

You can do it in linear time via frequency counting. Make an array that is indexable by characters and initialise it with 0. Run over the string, incrementing the frequency for each character until that results in a value greater than 1 or the string is exhausted. Done.

EXTENDED EXPLANATION

Here is the algorithm expressed in pseudo code, which makes it easy to assess its complexity:

var seen: array [char] of boolean

seen = false

for each c in string_to_be_tested
   if seen[c]
      return false    // string is not free duplicates
   seen[c] = true

return true           // no character occurs more than once


That is the algorithm as such. CodeYogi has already posted a short and succinct pythonic approximation of it (first as a post scriptum, later it was moved into the question body):

len(set(strr)) == len(strr)


This compares the cardinality of the set of characters contained in the string to its length.

The set contains each character at most once, regardless how often it was added. This means that the cardinality of the set can only be equal to the length of the string if each of the characters occurred exactly once. This is not an algorithm as such, it is a functional expression whose algorithmic complexity is more difficult to assess than that of the simpler - if more verbose - imperative pseudo code.

When I wrote the first paragraph of the answer I was thinking along the lines of its most natural/efficient expression, which is in terms of the operation BTS ('bit test and set', or _bittestandset() as a C intrinsic). Here is a rendering of the algorithm in C, for the usual null-terminated strings:

// allocation and zeroing of the bitmap elided - too ugly and not relevant

bts(bitmap, 0); // pretend we've seen the terminator, so that we drop out at the end

while (!bts(bitmap, *s++))
;

return s[-1] == '\0'; // arrived at the end - unique, no duplicate characters


However, the BTS operation tends to be available only in assembly languages and C/C++; hence the description of the algorithm with an array of integer counts, which is much more widely available and which allows a compact approximation of the 'test and set' as 'increment and test':

if (++counts[string[i]] > 1)
return false


That array is a (partial) frequency count, of course. It is possible to do it exactly like this in python, but that would do neither the algorithm nor the language any justice; it would be ugly and inefficient.

Code Snippets

var seen: array [char] of boolean

seen = false

for each c in string_to_be_tested
   if seen[c]
      return false    // string is not free duplicates
   seen[c] = true

return true           // no character occurs more than once
len(set(strr)) == len(strr)

Context

StackExchange Code Review Q#70146, answer score: 4

Revisions (0)

No revisions yet.