HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Find element that has been removed from a shuffled array

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
shuffledremovedarraybeenhasthatelementfindfrom

Problem

For a function to determine which element has been removed from a shuffled array, I came up with:

def finder(arr1, arr2):
  l = len(arr1) - 1 
  result = list(arr1)
  for i in range(l):
    current = arr1[i]
    if current in arr2:
      result.remove(current)
      arr2.remove(current)
  return result[0]


In the solution for my course it says:


The naive solution is go through every element in the second array and
check whether it appears in the first array. Note that there may be
duplicate elements in the arrays so we should pay special attention to
it. The complexity of this approach is \$O(N^2)\$, since we would need
two for loops.

However doing the other way round, i.e. as I've done above seems to avoid a nested for loop, and is therefore linear as far as I can tell. Am I missing something? Have I made an error in my code or my analysis of its efficiency?

Solution

When you don't know how long a built-in function call will take, then the TimeComplexity page on the Python wiki comes to the rescue. This tells us that both item in list and list.remove(item) take time proportional to the length of the list.

So although current in arr2 and result.remove(current) are single expressions in Python, in fact each of them has a loop hidden inside it, and this makes the overall runtime \$Θ(n^2)\$.

If you're having trouble with the analysis, there's no substitute for actually running the code and timing how long it takes on different inputs! This is really easy to do, for example, you might write:

from random import randrange
from timeit import timeit

def test(n):
    """Time finder on lists of length n."""
    a = list(range(n))
    b = list(a)
    del b[randrange(n)]
    return timeit(lambda:finder(a, b), number=1)


and then:

>>> for i in range(5):
...     n = 10000 * 2 ** i
...     print(n, test(n))
... 
10000 0.020456696045584977
20000 0.07089142000768334
40000 0.361092661973089
80000 1.5187797680264339
160000 6.356738713919185


You can see that when \$n\$ doubles, the runtime increases by around four times, which is what we expect for an \$Θ(n^2)\$ algorithm. Here's a plot showing a least-squares fit of the measurements collected above to the function \$t = an^2\$, and you can see that the fit is pretty good:

Code Snippets

from random import randrange
from timeit import timeit

def test(n):
    """Time finder on lists of length n."""
    a = list(range(n))
    b = list(a)
    del b[randrange(n)]
    return timeit(lambda:finder(a, b), number=1)
>>> for i in range(5):
...     n = 10000 * 2 ** i
...     print(n, test(n))
... 
10000 0.020456696045584977
20000 0.07089142000768334
40000 0.361092661973089
80000 1.5187797680264339
160000 6.356738713919185

Context

StackExchange Code Review Q#152636, answer score: 7

Revisions (0)

No revisions yet.