HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Calculating the mean differences of subarrays

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thesubarraysmeandifferencescalculating

Problem

I'm trying to devise a way to calculate the mean differences (the absolute average differences between any two values in a set), of sub-arrays (starting from arbitrary indices) in an int array. I'll be placing the bounds of each subarray in to "buckets" of differing mean difference magnitudes.

The problem is, I can't find an efficient (better than \$O(n^3)\$) way of doing this. Would anyone mind helping me make the code more efficient (at least better than \$O(n^2)\$)?.

The calculations are performed server-side upon user request from my web-app. If making it more efficient isn't possible, would it be advisable to adopt a solution that doesn't involve modifying the computation, such as:

  • Keep this inefficient implementation and include a disclaimer stating it may take some time to complete



  • Transfer the calculations to several daemon threads which will


perform the process on the entire user base and store the results in
the database, which will be returned to the users (returned results
may not be up-to-date)

I'm currently at a crossroads and am not sure which route I should take.

Pseudocode of main function:

/*The purpose of the main function is to store the mean differences of  
sub-arrays in a given array, making the sure the stored sub-arrays are as 
long as possible*/

for i in dataArray
 loop through all j (j starting from i + 1) in dataArray 
    get mean difference of values in between indexes i and j
    get int value of mean difference (bucket)
    if bucket != (bucket of mean difference of values 
       in between indexes i and j - 1), store i and j in bucket


Main function:

```
public static HashMap>> calculateMeanDifferences(ArrayList dataArrayList)
{
//Each key maps to a stack which will hold HashMaps containing the bounding indices of subArrays which have mean differences that round to the key
HashMap>> meanDifferenceBucketHashMap = new HashMap>>();

long size = dataArrayList.size();

Integer previousMeanDifferenc

Solution

The first optimisation is to avoid using Double and Integer when you really want double and int You can use Trove4j to act as collection of these types. Additionally I wouldn't use double when you mean to use int, e.g. for counts which can only be whole numbers.

I would also avoid counting the combinations twice. e.g. if you have an array of two you compare both a1 - a2 and a2 - a1 which should be the same.

Can you make any assumptions on the range of int values?

This example

public static void main(String... args) throws IOException {
    Random rand = new Random(1);
    int size = 100000;
    TIntArrayList valuesArrayList = new TIntArrayList(size);
    for (int i = 0; i < size; i++) {
        valuesArrayList.add(rand.nextInt(size) - rand.nextInt(size));
    }
    long start = System.nanoTime();
    double d = calculateMeanDifference(valuesArrayList);
    long time = System.nanoTime() - start;
    System.out.printf("Took %.3f seconds to calculate d=%.3f for size=%,d values%n", time / 1e9, d, size);
}

public static double calculateMeanDifference(TIntArrayList valuesArrayList) {
    final TIntIntHashMap counts = new TIntIntHashMap();
    valuesArrayList.forEach(new TIntProcedure() {
        @Override
        public boolean execute(int value) {
            counts.adjustOrPutValue(value, 1, 1);
            return true;
        }
    });

    final int[] uniqueValues = new int[counts.size()], uniqueCounts = new int[counts.size()];
    counts.forEachEntry(new TIntIntProcedure() {
        int i = 0;
        @Override
        public boolean execute(int a, int b) {
            uniqueValues[i] = a;
            uniqueCounts[i++] = b;
            return true;
        }
    });
    long sum = 0;
    for (int i = 1; i < uniqueValues.length; i++) {
        int vi = uniqueValues[i];
        int ci = uniqueCounts[i];
        for (int j = 0; j < i; j++) {
            int counts2 = ci * uniqueCounts[j];
            sum += Math.abs(vi - uniqueValues[j]) * counts2;
        }
    }

    return 2.0 * sum / valuesArrayList.size() / (valuesArrayList.size() - 1);
}


prints

Took 12.173 seconds to calculate d=46786.434 for size=100,000 values

Code Snippets

public static void main(String... args) throws IOException {
    Random rand = new Random(1);
    int size = 100000;
    TIntArrayList valuesArrayList = new TIntArrayList(size);
    for (int i = 0; i < size; i++) {
        valuesArrayList.add(rand.nextInt(size) - rand.nextInt(size));
    }
    long start = System.nanoTime();
    double d = calculateMeanDifference(valuesArrayList);
    long time = System.nanoTime() - start;
    System.out.printf("Took %.3f seconds to calculate d=%.3f for size=%,d values%n", time / 1e9, d, size);
}

public static double calculateMeanDifference(TIntArrayList valuesArrayList) {
    final TIntIntHashMap counts = new TIntIntHashMap();
    valuesArrayList.forEach(new TIntProcedure() {
        @Override
        public boolean execute(int value) {
            counts.adjustOrPutValue(value, 1, 1);
            return true;
        }
    });

    final int[] uniqueValues = new int[counts.size()], uniqueCounts = new int[counts.size()];
    counts.forEachEntry(new TIntIntProcedure() {
        int i = 0;
        @Override
        public boolean execute(int a, int b) {
            uniqueValues[i] = a;
            uniqueCounts[i++] = b;
            return true;
        }
    });
    long sum = 0;
    for (int i = 1; i < uniqueValues.length; i++) {
        int vi = uniqueValues[i];
        int ci = uniqueCounts[i];
        for (int j = 0; j < i; j++) {
            int counts2 = ci * uniqueCounts[j];
            sum += Math.abs(vi - uniqueValues[j]) * counts2;
        }
    }

    return 2.0 * sum / valuesArrayList.size() / (valuesArrayList.size() - 1);
}
Took 12.173 seconds to calculate d=46786.434 for size=100,000 values

Context

StackExchange Code Review Q#8174, answer score: 7

Revisions (0)

No revisions yet.