patternjavaMinor
Calculating the mean differences of subarrays
Viewed 0 times
thesubarraysmeandifferencescalculating
Problem
I'm trying to devise a way to calculate the mean differences (the absolute average differences between any two values in a set), of sub-arrays (starting from arbitrary indices) in an
The problem is, I can't find an efficient (better than \$O(n^3)\$) way of doing this. Would anyone mind helping me make the code more efficient (at least better than \$O(n^2)\$)?.
The calculations are performed server-side upon user request from my web-app. If making it more efficient isn't possible, would it be advisable to adopt a solution that doesn't involve modifying the computation, such as:
perform the process on the entire user base and store the results in
the database, which will be returned to the users (returned results
may not be up-to-date)
I'm currently at a crossroads and am not sure which route I should take.
Pseudocode of main function:
Main function:
```
public static HashMap>> calculateMeanDifferences(ArrayList dataArrayList)
{
//Each key maps to a stack which will hold HashMaps containing the bounding indices of subArrays which have mean differences that round to the key
HashMap>> meanDifferenceBucketHashMap = new HashMap>>();
long size = dataArrayList.size();
Integer previousMeanDifferenc
int array. I'll be placing the bounds of each subarray in to "buckets" of differing mean difference magnitudes.The problem is, I can't find an efficient (better than \$O(n^3)\$) way of doing this. Would anyone mind helping me make the code more efficient (at least better than \$O(n^2)\$)?.
The calculations are performed server-side upon user request from my web-app. If making it more efficient isn't possible, would it be advisable to adopt a solution that doesn't involve modifying the computation, such as:
- Keep this inefficient implementation and include a disclaimer stating it may take some time to complete
- Transfer the calculations to several daemon threads which will
perform the process on the entire user base and store the results in
the database, which will be returned to the users (returned results
may not be up-to-date)
I'm currently at a crossroads and am not sure which route I should take.
Pseudocode of main function:
/*The purpose of the main function is to store the mean differences of
sub-arrays in a given array, making the sure the stored sub-arrays are as
long as possible*/
for i in dataArray
loop through all j (j starting from i + 1) in dataArray
get mean difference of values in between indexes i and j
get int value of mean difference (bucket)
if bucket != (bucket of mean difference of values
in between indexes i and j - 1), store i and j in bucketMain function:
```
public static HashMap>> calculateMeanDifferences(ArrayList dataArrayList)
{
//Each key maps to a stack which will hold HashMaps containing the bounding indices of subArrays which have mean differences that round to the key
HashMap>> meanDifferenceBucketHashMap = new HashMap>>();
long size = dataArrayList.size();
Integer previousMeanDifferenc
Solution
The first optimisation is to avoid using
I would also avoid counting the combinations twice. e.g. if you have an array of two you compare both a1 - a2 and a2 - a1 which should be the same.
Can you make any assumptions on the range of int values?
This example
prints
Double and Integer when you really want double and int You can use Trove4j to act as collection of these types. Additionally I wouldn't use double when you mean to use int, e.g. for counts which can only be whole numbers.I would also avoid counting the combinations twice. e.g. if you have an array of two you compare both a1 - a2 and a2 - a1 which should be the same.
Can you make any assumptions on the range of int values?
This example
public static void main(String... args) throws IOException {
Random rand = new Random(1);
int size = 100000;
TIntArrayList valuesArrayList = new TIntArrayList(size);
for (int i = 0; i < size; i++) {
valuesArrayList.add(rand.nextInt(size) - rand.nextInt(size));
}
long start = System.nanoTime();
double d = calculateMeanDifference(valuesArrayList);
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds to calculate d=%.3f for size=%,d values%n", time / 1e9, d, size);
}
public static double calculateMeanDifference(TIntArrayList valuesArrayList) {
final TIntIntHashMap counts = new TIntIntHashMap();
valuesArrayList.forEach(new TIntProcedure() {
@Override
public boolean execute(int value) {
counts.adjustOrPutValue(value, 1, 1);
return true;
}
});
final int[] uniqueValues = new int[counts.size()], uniqueCounts = new int[counts.size()];
counts.forEachEntry(new TIntIntProcedure() {
int i = 0;
@Override
public boolean execute(int a, int b) {
uniqueValues[i] = a;
uniqueCounts[i++] = b;
return true;
}
});
long sum = 0;
for (int i = 1; i < uniqueValues.length; i++) {
int vi = uniqueValues[i];
int ci = uniqueCounts[i];
for (int j = 0; j < i; j++) {
int counts2 = ci * uniqueCounts[j];
sum += Math.abs(vi - uniqueValues[j]) * counts2;
}
}
return 2.0 * sum / valuesArrayList.size() / (valuesArrayList.size() - 1);
}prints
Took 12.173 seconds to calculate d=46786.434 for size=100,000 valuesCode Snippets
public static void main(String... args) throws IOException {
Random rand = new Random(1);
int size = 100000;
TIntArrayList valuesArrayList = new TIntArrayList(size);
for (int i = 0; i < size; i++) {
valuesArrayList.add(rand.nextInt(size) - rand.nextInt(size));
}
long start = System.nanoTime();
double d = calculateMeanDifference(valuesArrayList);
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds to calculate d=%.3f for size=%,d values%n", time / 1e9, d, size);
}
public static double calculateMeanDifference(TIntArrayList valuesArrayList) {
final TIntIntHashMap counts = new TIntIntHashMap();
valuesArrayList.forEach(new TIntProcedure() {
@Override
public boolean execute(int value) {
counts.adjustOrPutValue(value, 1, 1);
return true;
}
});
final int[] uniqueValues = new int[counts.size()], uniqueCounts = new int[counts.size()];
counts.forEachEntry(new TIntIntProcedure() {
int i = 0;
@Override
public boolean execute(int a, int b) {
uniqueValues[i] = a;
uniqueCounts[i++] = b;
return true;
}
});
long sum = 0;
for (int i = 1; i < uniqueValues.length; i++) {
int vi = uniqueValues[i];
int ci = uniqueCounts[i];
for (int j = 0; j < i; j++) {
int counts2 = ci * uniqueCounts[j];
sum += Math.abs(vi - uniqueValues[j]) * counts2;
}
}
return 2.0 * sum / valuesArrayList.size() / (valuesArrayList.size() - 1);
}Took 12.173 seconds to calculate d=46786.434 for size=100,000 valuesContext
StackExchange Code Review Q#8174, answer score: 7
Revisions (0)
No revisions yet.