snippetMajor
One element that differs in two arrays. How to find it efficiently?
Viewed 0 times
findarraysefficientlyonetwothatelementhowdiffers
Problem
I am preparing for a coding interview and I can't really figure out the most efficient way to solve this problem.
Let's say we have two arrays consisting of numbers that are unsorted. Array 2 contains a number that Array 1 does not. Both arrays have randomly located numbers, not necessarily in the same order or at the same indices. For example:
Array 1
[78,11, 143, 84, 77, 1, 26, 35 .... n]
Array 2
[11,84, 35, 25, 77, 78, 26, 143 ... 21... n+1]
What is the fastest algorithm for finding the number that differs? What is its running time? In this example, the number we would be looking for is 21.
My idea was to run through Array 1 and delete that value from array 2. Iterate until you are finished. This should be around $O(n \log n)$ running time, right?
Let's say we have two arrays consisting of numbers that are unsorted. Array 2 contains a number that Array 1 does not. Both arrays have randomly located numbers, not necessarily in the same order or at the same indices. For example:
Array 1
[78,11, 143, 84, 77, 1, 26, 35 .... n]
Array 2
[11,84, 35, 25, 77, 78, 26, 143 ... 21... n+1]
What is the fastest algorithm for finding the number that differs? What is its running time? In this example, the number we would be looking for is 21.
My idea was to run through Array 1 and delete that value from array 2. Iterate until you are finished. This should be around $O(n \log n)$ running time, right?
Solution
I see four main ways to solve this problem, with different running times:
-
$O(n^2)$ solution: this would be the solution that you propose. Note that, since the arrays are unsorted, deletion takes linear time. You carry out $n$ deletions; therefore, this algorithm takes quadratic time.
-
$O(n \: log \: n)$ solution: sort the arrays beforehand; then, perform a linear search to identify the distinct element. In this solution, the running time is dominated by the sorting operation, hence the $O(n \: log \: n)$ upper bound.
When you identify a solution to a problem, you should always ask yourself: can I do better? In this case, you can, making a clever use of data structures. Note that all you need to do is to iterate one array and perform repeated lookups in the other array. What data structure allows you to do lookups in (expected) constant time? You guessed right: a hash table.
If you want upper-bound guarantees and the arrays are strictly composed of integers, the best solution is, probably, the one suggested by Tobi Alafin (even though this solution will not give you the index of the element that differs in the second array):
Finally, another possibility (under the same assumption of integer arrays) would be to use a linear-time sorting algortihm such as counting sort. This would reduce the running time of the sorting-based solution from $O(n \: log \: n)$ to $O(n)$.
-
$O(n^2)$ solution: this would be the solution that you propose. Note that, since the arrays are unsorted, deletion takes linear time. You carry out $n$ deletions; therefore, this algorithm takes quadratic time.
-
$O(n \: log \: n)$ solution: sort the arrays beforehand; then, perform a linear search to identify the distinct element. In this solution, the running time is dominated by the sorting operation, hence the $O(n \: log \: n)$ upper bound.
When you identify a solution to a problem, you should always ask yourself: can I do better? In this case, you can, making a clever use of data structures. Note that all you need to do is to iterate one array and perform repeated lookups in the other array. What data structure allows you to do lookups in (expected) constant time? You guessed right: a hash table.
- $O(n)$ solution (expected): iterate the first array and store the elements in a hash table; then, perform a linear scan in the second array, looking up each element in the hash table. Return the element that is not found in the hash table. This linear-time solution works for any type of element that you can pass to a hash function (e.g., it would work similarly for arrays of strings).
If you want upper-bound guarantees and the arrays are strictly composed of integers, the best solution is, probably, the one suggested by Tobi Alafin (even though this solution will not give you the index of the element that differs in the second array):
- $O(n)$ solution (guaranteed): sum up the elements of the first array. Then, sum up the elements of the second array. Finally, perform the substraction. Note that this solution can actually be generalized to any data type whose values can be represented as fixed-length bit strings, thanks to the bitwise XOR operator. This is thoroughly explained in Ilmari Karonen's answer.
Finally, another possibility (under the same assumption of integer arrays) would be to use a linear-time sorting algortihm such as counting sort. This would reduce the running time of the sorting-based solution from $O(n \: log \: n)$ to $O(n)$.
Context
StackExchange Computer Science Q#67499, answer score: 31
Revisions (0)
No revisions yet.