patterncsharpModerate
Finding the duplicate with an array with 1,000,000 integers between 1 and 1,000,000
Viewed 0 times
thewithduplicatearray000betweenfindingandintegers
Problem
I recently had an interview and got to phase 2 which is a coding assessment. One of the questions was:
Given an array with 1,000,000 integers between 1 and 1,000,000, one integer is in the array twice. Find the duplicate.
I failed the interview and was wondering what improvements do I need.
Test:
Given an array with 1,000,000 integers between 1 and 1,000,000, one integer is in the array twice. Find the duplicate.
I failed the interview and was wondering what improvements do I need.
///
/// Assuming that there is no space constraint,
/// I used a HashSet to find the duplicate.
/// If the duplicate is found, assign that number as the return value
///
///
/// 1,000,000 sized array containing ints between 1 - 1000000
/// The duplicate number
public static int findDuplicate(int[] givenArray)
{
// This hashset will be filled with non duplicated numbers.
HashSet nonDuplicateContainer = new HashSet();
int duplicate = -1;
for (int i = 0; i < givenArray.Length; i++)
{
if (!nonDuplicateContainer.Add(givenArray[i])) {
duplicate = givenArray[i];
break;
}
}
return duplicate;
}Test:
///
/// Testing with the correct duplicated number
///
[TestMethod()]
public void FindDuplicateTestCorrectGivenDuplicate()
{
int[] testArray = InitializeOneMillionSizedArray(106500);
int testDuplicateNumber = BlueWolfSolutions.findDuplicate(testArray);
Assert.AreEqual(106500, testDuplicateNumber);
}
///
/// Testing with the incorrect duplicated number
///
[TestMethod()]
public void FindDuplicateTestWrongGivenDuplicate()
{
int[] testArray = InitializeOneMillionSizedArray(453234);
int testDuplicateNumber = BlueWolfSolutions.findDuplicate(testArray);
Assert.AreNotEqual(25, testDuplicateNumber);
}Solution
There is already a good answer that describes how you could improve your tests, however, I would like to offer another solution. I had a question that was similar to this on an interview (well, I solved it using a similar method to the one I gave here).
When the interviewer asked this question, I think they wanted you to find a solution that is not only \$\mathcal{O}(n)\$ time but also \$\mathcal{O}(1)\$ space complexity. Based on the description you gave, there is one number that appears twice but there also must be one number that does not appear.
The sum of the first \$n\$ integers is \$\frac{n(n + 1)}{2}\$, and the sum of their squares is \$\frac{n(n + 1) (2n + 1)}{6} \$.
Let
Then there are numbers \$d\$ (double) and \$m\$ (missing) such that
$$sn' + m - d = sn$$
$$sn2' + m^2 - d^2 = sn2$$
Solving the first equation for \$m\$ and substituting into the second, we get
$$sn2' + (sn + d - sn')^2 - d^2 = sn2$$
Expand and collect like terms
$$sn2' + (sn - sn')^2 + 2 (sn - sn')d = sn2$$
Thus
$$d = \dfrac{(sn2 - sn2') - (sn - sn')^2}{2 (sn - sn')}$$
Here is a version of
Note that this method of finding extra or missing numbers in a list that is otherwise a range can be extended simply by adding equations for higher powers (cubes and beyond).
When the interviewer asked this question, I think they wanted you to find a solution that is not only \$\mathcal{O}(n)\$ time but also \$\mathcal{O}(1)\$ space complexity. Based on the description you gave, there is one number that appears twice but there also must be one number that does not appear.
The sum of the first \$n\$ integers is \$\frac{n(n + 1)}{2}\$, and the sum of their squares is \$\frac{n(n + 1) (2n + 1)}{6} \$.
Let
- \$sn\$ = 500000500000 be the sum of all of the numbers from 1 to 1000000
- \$sn2\$ = 333333833333500000 be the sum of their squares
- \$sn'\$ be the sum of the elements of the array
- \$sn2'\$ be the sum of their squares
Then there are numbers \$d\$ (double) and \$m\$ (missing) such that
$$sn' + m - d = sn$$
$$sn2' + m^2 - d^2 = sn2$$
Solving the first equation for \$m\$ and substituting into the second, we get
$$sn2' + (sn + d - sn')^2 - d^2 = sn2$$
Expand and collect like terms
$$sn2' + (sn - sn')^2 + 2 (sn - sn')d = sn2$$
Thus
$$d = \dfrac{(sn2 - sn2') - (sn - sn')^2}{2 (sn - sn')}$$
Here is a version of
findDuplicate that uses this rule (note that I have never used C# so there may be a couple of mistakes, and I did not put in comments since you've already read the explanation of how I got the formula):///
/// Find the duplicate number in a length n array where all elements are numbers
/// from 1 to n, all appearing once except for one duplicate (and consequently one
/// missing).
///
///
/// n sized array containing ints between 1 - n inclusive
/// The duplicate number
public static int findDuplicate(int[] givenArray)
{
ulong n = givenArray.Length, dsn, dsn2;
dsn = n*(n + 1)/2;
dsn2 = dsn*(2*n + 1)/3;
for(ulong i = 0; i < n; ++i){
dsn -= givenArray[i];
dsn2 -= givenArray[i]*givenArray[i];
}
return (dsn2 - dsn*dsn)/(2*dsn);
}Note that this method of finding extra or missing numbers in a list that is otherwise a range can be extended simply by adding equations for higher powers (cubes and beyond).
Code Snippets
/// <summary>
/// Find the duplicate number in a length n array where all elements are numbers
/// from 1 to n, all appearing once except for one duplicate (and consequently one
/// missing).
///
/// </summary>
/// <param name="givenArray">n sized array containing ints between 1 - n inclusive</param>
/// <returns>The duplicate number</returns>
public static int findDuplicate(int[] givenArray)
{
ulong n = givenArray.Length, dsn, dsn2;
dsn = n*(n + 1)/2;
dsn2 = dsn*(2*n + 1)/3;
for(ulong i = 0; i < n; ++i){
dsn -= givenArray[i];
dsn2 -= givenArray[i]*givenArray[i];
}
return (dsn2 - dsn*dsn)/(2*dsn);
}Context
StackExchange Code Review Q#133529, answer score: 15
Revisions (0)
No revisions yet.