patternpythonMinor
Find element that has been removed from a shuffled array
Viewed 0 times
shuffledremovedarraybeenhasthatelementfindfrom
Problem
For a function to determine which element has been removed from a shuffled array, I came up with:
In the solution for my course it says:
The naive solution is go through every element in the second array and
check whether it appears in the first array. Note that there may be
duplicate elements in the arrays so we should pay special attention to
it. The complexity of this approach is \$O(N^2)\$, since we would need
two for loops.
However doing the other way round, i.e. as I've done above seems to avoid a nested for loop, and is therefore linear as far as I can tell. Am I missing something? Have I made an error in my code or my analysis of its efficiency?
def finder(arr1, arr2):
l = len(arr1) - 1
result = list(arr1)
for i in range(l):
current = arr1[i]
if current in arr2:
result.remove(current)
arr2.remove(current)
return result[0]In the solution for my course it says:
The naive solution is go through every element in the second array and
check whether it appears in the first array. Note that there may be
duplicate elements in the arrays so we should pay special attention to
it. The complexity of this approach is \$O(N^2)\$, since we would need
two for loops.
However doing the other way round, i.e. as I've done above seems to avoid a nested for loop, and is therefore linear as far as I can tell. Am I missing something? Have I made an error in my code or my analysis of its efficiency?
Solution
When you don't know how long a built-in function call will take, then the TimeComplexity page on the Python wiki comes to the rescue. This tells us that both
So although
If you're having trouble with the analysis, there's no substitute for actually running the code and timing how long it takes on different inputs! This is really easy to do, for example, you might write:
and then:
You can see that when \$n\$ doubles, the runtime increases by around four times, which is what we expect for an \$Θ(n^2)\$ algorithm. Here's a plot showing a least-squares fit of the measurements collected above to the function \$t = an^2\$, and you can see that the fit is pretty good:
item in list and list.remove(item) take time proportional to the length of the list.So although
current in arr2 and result.remove(current) are single expressions in Python, in fact each of them has a loop hidden inside it, and this makes the overall runtime \$Θ(n^2)\$.If you're having trouble with the analysis, there's no substitute for actually running the code and timing how long it takes on different inputs! This is really easy to do, for example, you might write:
from random import randrange
from timeit import timeit
def test(n):
"""Time finder on lists of length n."""
a = list(range(n))
b = list(a)
del b[randrange(n)]
return timeit(lambda:finder(a, b), number=1)and then:
>>> for i in range(5):
... n = 10000 * 2 ** i
... print(n, test(n))
...
10000 0.020456696045584977
20000 0.07089142000768334
40000 0.361092661973089
80000 1.5187797680264339
160000 6.356738713919185You can see that when \$n\$ doubles, the runtime increases by around four times, which is what we expect for an \$Θ(n^2)\$ algorithm. Here's a plot showing a least-squares fit of the measurements collected above to the function \$t = an^2\$, and you can see that the fit is pretty good:
Code Snippets
from random import randrange
from timeit import timeit
def test(n):
"""Time finder on lists of length n."""
a = list(range(n))
b = list(a)
del b[randrange(n)]
return timeit(lambda:finder(a, b), number=1)>>> for i in range(5):
... n = 10000 * 2 ** i
... print(n, test(n))
...
10000 0.020456696045584977
20000 0.07089142000768334
40000 0.361092661973089
80000 1.5187797680264339
160000 6.356738713919185Context
StackExchange Code Review Q#152636, answer score: 7
Revisions (0)
No revisions yet.