Find elements occurring even number of times in an integer array
Problem
I came across an interview question, which is like this:
You have an array of integers, such that each integer is present an odd number of times, except 3 of them. Find the three numbers.
I tried implementing it using a HashMap, by mapping each integer element to a value that identifies whether that number occurs an even number of times. I used XOR logic for that: 5 ^ 5 = 0, and 5 ^ anyNumberOtherThan5 != 0.

Here's my code:
```java
public static void findEvenOccurringNum(int[] arr) {
    Map<Integer, Integer> map = new HashMap<>();
    for (int num : arr) {
        if (map.containsKey(num)) {
            map.put(num, map.get(num) ^ 1); // toggle the parity bit on each repeat
        } else {
            map.put(num, 1); // first occurrence: odd count so far
        }
    }
    for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
        if (entry.getValue() == 0) { // parity 0 means an even number of occurrences
            System.out.println(entry.getKey());
        }
    }
}
```

What I don't like about this solution is the second iteration: it goes over all the unique numbers in the array. Can there be any improvement here? Maybe a solution that avoids using a map? Or is it the best I can do?

Solution
Wiki Answer - discussion of scalability.
After some investigation I thought it would be interesting to publish some of the performance results I have seen. There are some interesting observations.
I have taken two algorithms, both on this page: the 200_Success one and the rolfl one. I ran both algorithms through two types of test, and for each type, through data of different sizes, measuring the actual performance.
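(The algorithms themselves are not reproduced in this extract. For orientation, here is a minimal sketch of a HashSet-based parity approach in the spirit of the rolfl version; the class and method names are mine, and this is an assumption about the general technique, not the exact code under test.)

```java
import java.util.HashSet;
import java.util.Set;

public class EvenFinder {
    // One pass: toggle membership in 'odd' on every occurrence. Afterwards,
    // 'odd' holds the numbers seen an odd number of times, so removing them
    // from 'seen' leaves exactly the even-count numbers.
    public static Set<Integer> findEvenOccurring(int[] arr) {
        Set<Integer> seen = new HashSet<>(arr.length);
        Set<Integer> odd = new HashSet<>(arr.length);
        for (int num : arr) {
            seen.add(num);
            if (!odd.remove(num)) { // remove() returns false when num was absent...
                odd.add(num);       // ...so this toggles membership
            }
        }
        seen.removeAll(odd);
        return seen;
    }
}
```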
The two types of test are 'data complexity' related. The first test I call 'sparse': there are very few duplicates in the data. The second set of data is 'freq', which has many, many duplicates. The two data sets are constructed as shown in the Code Snippets section at the end of this answer (note the fixed random seed, which makes the runs repeatable).
The results (times in milliseconds; each value is the total for the given iteration count at that data size, so the amount of data processed per measurement is constant):

    RLsparse,130.632,164.163,225.850,271.158,289.436,322.981,343.293,360.966,390.827,412.959
    RLfreq,60.455,66.364,53.215,76.341,82.661,85.065,88.740,87.201,89.798,87.361
    2Ssparse,157.926,184.521,214.407,242.943,256.200,286.686,352.467,438.048,420.164,666.043
    2Sfreq,152.263,155.817,158.332,160.107,158.856,157.523,159.182,158.238,158.978,157.142
    datasize,512,1024,2048,4096,8192,16384,32768,65536,131072,262144
    iterations,8192,4096,2048,1024,512,256,128,64,32,16
Plotting this as a graph, using the datasize as the X axis:

[plot: run time (ms) vs. data size for each series]
The 'dip' in the performance of the 2Ssparse 'curve' is consistent; there must be some anomaly in the data at that size that 'wins' for whatever reason.
The 'obvious' observations are:

- Both algorithms like lots of duplicate values, presumably because the map or set stays small.
- For high-duplicate values the 200_Success algorithm is practically O(n), and it appears the HashMap is O(1) as stated.
- For sparse-duplicate values the 200_Success version scales worse than the rolfl version: although it started faster, for larger data sets it quickly degrades to be slower.
- For sparse-duplicate values the rolfl version is faster (it runs in about 2/3 of the time), though its performance is slightly degrading; I expect that for all reasonable data sizes it will remain faster.
EDIT:
After a request for 'medium' amounts of duplication (the random numbers are taken from a pool of 1024 instead of just 10):
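For reference, the 'medium' set is built the same way as the other two, reusing the seeded Random from the Code Snippets section; this construction is taken verbatim from the full program at the end of this answer:

```java
int[] medium = new int[1 << 18];
for (int i = 0; i < medium.length; i++) {
    medium[i] = rand.nextInt(1024); // pool of 1024 values: 'medium' duplication
}
```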
RLsparse,124.211,147.412,216.475,273.934,293.235,330.671,346.458,372.003,399.730,407.985
RLfreq,42.496,41.446,53.732,58.305,66.281,65.889,64.953,66.078,73.901,71.291
RLmedium,122.813,179.644,218.106,253.908,272.583,259.016,232.671,226.121,220.121,207.180
2Ssparse,155.091,180.669,205.877,236.890,254.193,288.193,351.495,450.381,426.626,665.759
2Sfreq,172.426,173.460,173.988,174.462,175.049,174.477,175.384,172.826,174.230,175.976
2Smedium,162.969,210.496,232.372,235.070,229.555,222.412,219.141,219.358,218.502,221.412
And the corresponding plot:

[plot: run time (ms) vs. data size, now including the 'medium' series]
EDIT2:
The HashSet-based implementation has been revised to include hints on the size of the HashSets. The revised performance results are:
RLsparse,113.848,156.549,195.834,264.010,280.561,314.784,332.941,360.607,387.728,401.440
RLfreq,50.688,45.955,61.013,59.733,74.000,68.274,69.208,70.487,76.905,74.185
RLmedium,116.118,177.327,223.468,258.029,275.395,263.365,234.839,225.775,221.798,210.071
2Ssparse,127.956,160.641,159.736,171.221,180.202,195.650,265.159,328.633,270.251,369.847
2Sfreq,154.762,154.865,157.780,157.265,158.705,159.562,160.342,161.276,160.924,163.296
2Smedium,128.366,152.183,187.613,198.995,270.550,198.450,200.526,202.244,205.382,207.368
datasize,512,1024,2048,4096,8192,16384,32768,65536,131072,262144
iterations,8192,4096,2048,1024,512,256,128,64,32,16
And the corresponding plot (note the scale is the same as the other plots, max 700 ms):

[plot: run time (ms) vs. data size, after the sizing revision]
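The 'hints' mentioned in EDIT2 presumably refer to the HashSet capacity constructor; a minimal illustration of the idea (my assumption, not necessarily the exact revision):

```java
import java.util.HashSet;
import java.util.Set;

public class CapacityHint {
    public static void main(String[] args) {
        int expected = 1 << 18;
        // Default construction starts at capacity 16 and rehashes repeatedly as it fills.
        Set<Integer> unsized = new HashSet<>();
        // Presizing reserves room up front, cutting out most of the rehash/copy work
        // (to avoid rehashing entirely, size as expected / load factor, i.e. / 0.75).
        Set<Integer> sized = new HashSet<>(expected);
        for (int i = 0; i < expected; i++) {
            unsized.add(i);
            sized.add(i);
        }
        System.out.println(unsized.size() + " == " + sized.size());
    }
}
```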
Here is the code I used to generate the above graphs (the actual algorithms are not included, since they are above).
This code is not neat, and please do not review it... ;-) But feel free to make any changes you like to improve it. I know it is ugly, but it works.
Code Snippets
```java
int[] sparse = new int[1 << 18];
Random rand = new Random(1); // repeatable 'random' numbers
for (int i = 0; i < sparse.length; i++) {
    sparse[i] = rand.nextInt(sparse.length); // pool as large as the array: very few duplicates
}
```

```java
int[] freq = new int[1 << 18];
for (int i = 0; i < freq.length; i++) {
    freq[i] = rand.nextInt(10); // pool of just 10 values: many, many duplicates
}
```

The data size doubles at each step while the iteration count halves, so the total number of values processed per measurement stays constant (for example):

    iterations      256     128      64      32      16
    data size     16384   32768   65536  131072  262144

The full benchmark driver:

```java
package dupcnt;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class CompareDups {

    public static void main(String[] args) {
        System.out.println("Building Data");
        int[] sparse = new int[1 << 18];
        int[] freq = new int[sparse.length];
        int[] medium = new int[sparse.length];
        Random rand = new Random(1); // repeatable 'random' numbers
        for (int i = 0; i < sparse.length; i++) {
            sparse[i] = rand.nextInt(sparse.length);
        }
        for (int i = 0; i < freq.length; i++) {
            freq[i] = rand.nextInt(10);
        }
        for (int i = 0; i < medium.length; i++) {
            medium[i] = rand.nextInt(1024);
        }
        System.out.println("Built data " + sparse.length);
        final int dpoints = 10;
        int[] iterations = new int[dpoints];
        for (int i = 0; i < iterations.length; i++) {
            iterations[i] = 1 << dpoints + 3 - i; // 8192 down to 16
        }
        for (int i = 0; i < 10; i++) {
            long[][] data = new long[8][dpoints];
            for (int dp = 0; dp < dpoints; dp++) {
                // each data point uses a prefix of the source data, doubling in size
                int[] input = Arrays.copyOf(sparse, sparse.length >>> dpoints - 1 - dp);
                data[0][dp] = processRL(input, iterations[dp]);
                data[3][dp] = process2S(input, iterations[dp]);
                input = Arrays.copyOf(freq, freq.length >>> dpoints - 1 - dp);
                data[1][dp] = processRL(input, iterations[dp]);
                data[4][dp] = process2S(input, iterations[dp]);
                input = Arrays.copyOf(medium, medium.length >>> dpoints - 1 - dp);
                data[2][dp] = processRL(input, iterations[dp]);
                data[5][dp] = process2S(input, iterations[dp]);
                data[6][dp] = input.length;
                data[7][dp] = iterations[dp];
            }
            String[] series = {"RLsparse", "RLfreq", "RLmedium", "2Ssparse", "2Sfreq", "2Smedium", "datasize", "iterations"};
            for (int s = 0; s < series.length; s++) {
                StringBuilder sb = new StringBuilder();
                sb.append(series[s]);
                for (long l : data[s]) {
                    if (s < 6) {
                        sb.append(",").append(String.format("%.3f", l / 1000000.0)); // nanos to millis
                    } else {
                        sb.append(",").append(String.format("%d", l));
                    }
                }
                System.out.println(sb.toString());
            }
        }
    }

    private static long processRL(final int[] input, final int iterations) {
        System.gc();
        int cnt = 0;
        long nanos = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            cnt += findDuplicates(input).length;
        }
        if (cnt < 0) {
            throw new IllegalStateException("This is just to keep things from being optimized out.");
        }
        return System.nanoTime() - nanos;
    }

    // process2S is the same harness, timing the 200_Success implementation
    // instead of findDuplicates (both elided here, since they are above).
}
```

Context
StackExchange Code Review Q#36842, answer score: 7