HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Big-O Notation of Anagram solution algorithm

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
algorithmbiganagramnotationsolution

Problem

In Solution 1: Checking Off of Problem Solving with Algorithms and Data Structures, just beneath the ActiveCode: 1 extract (included at the bottom of this post for reference), it is stated:


each of the n characters in s1 will cause an iteration through up to n
characters in the list from s2.

...which to paraphrase (less succinctly), means "each character in s1 could result in all characters in s2 being inspected".

The following sentence states:


Each of the n positions in the list will be visited once to match a
character from s1.

Firstly I'm unsure which list is being referred, I think it means the variable alist, i.e. s2. Is this correct?


The number of visits then becomes the sum of the integers from 1 to n.

Secondly (and the real point of this post), if each item in the list (variable alist) is inspected for each character in s1, then why is this not simply O(n) = n^2 as per "nested loops"? If n=5, then then sum(n) = 15, where as n^2 = 25.

I'm fine with simplifying the equation for summing 1 to n integers to get n^2, but I'm confused about how sum(5) = 15, whereas O(n) = n^2.



ActiveCode: 1 Checking Off (active5)

def anagramSolution1(s1,s2):
    alist = list(s2)

    pos1 = 0
    stillOK = True

    while pos1 < len(s1) and stillOK:
        pos2 = 0
        found = False
        while pos2 < len(alist) and not found:
            if s1[pos1] == alist[pos2]:
                found = True
            else:
                pos2 = pos2 + 1

        if found:
            alist[pos2] = None
        else:
            stillOK = False

        pos1 = pos1 + 1

    return stillOK

print(anagramSolution1('abcd','dcba')


Edit to add

Having written this out it struck me that perhaps dropping the (1/2) and (1/2)n from the formula to sum(n) is the reason for the difference of actually summing n, as opposed to just doing n^2.

To be absolutely clear, I understand the formula for summing n is n^2 when dropping the non-domi

Solution

Let's be concrete. Suppose s1 = 'abcd' and s2 = 'dcba'.

For the a in s1, it will take 4 iterations to find the a in s2.
For the b in s1, it will take 3 iterations to find the b in s2.
For the c in s1, it will take 2 iterations to find the c in s2.
For the d in s1, it will take 1 iteration to find the d in s2.

So in this case, it does take sum(i) for i = 1,...,n iterations.

Now some quiet meditation should convince you than if s2 is an anagram of s1, it will again take exactly the same number of iterations; the summands will just appear in a different order. (After all, whatever is the first character in s2 will only require 1 iteration to be found, the second character requires 2 iterations, etc.)

If s2 is not an anagram of s1, then the algorithm will short-circuit when the first character not in s2 is tested. So then the algorithm will take less than O(n^2) time. But in the worst case, the algorithm takes O(n^2) time.

Context

StackExchange Computer Science Q#33618, answer score: 3

Revisions (0)

No revisions yet.