
Basic Java statistics class

Submitted by: @import:stackexchange-codereview

Problem

I noticed some code on Stack Overflow that is, to put it mildly, suboptimal. I'm posting here to see if we can provide a better solution.

Here's a proposal for fixing it.

The approach is to provide a small utility class with static methods, so it can be used just like Math.method(...).

The standard deviation implements the sample formula, which divides by n - 1 rather than n.

The median function is taken from this answer, which is based on the reasonable assumption that the array can be sorted in memory.

import java.util.Arrays;

public class Statistics {

    // Arithmetic mean; evaluates to NaN for an empty array (0.0 / 0).
    public static double getMean(double[] nums) {
        double total = 0;
        for (double value : nums) {
            total += value;
        }
        return total / nums.length;
    }

    // Sample variance (divides by n - 1); returns 0 for fewer than two values.
    public static double getVariance(double[] nums) {
        if (nums.length <= 1) {
            return 0;
        }

        double sum = 0;
        double mean = getMean(nums);

        for (double value : nums) {
            sum += Math.pow(value - mean, 2);
        }

        return sum / (nums.length - 1); // notice the -1
    }

    // Sample standard deviation.
    public static double getStdDev(double[] nums) {
        return Math.sqrt(getVariance(nums));
    }

    // Median; sorts a copy so that the caller's array is left unmodified.
    public static double getMedian(double[] nums) {
        double[] sorted = nums.clone();
        Arrays.sort(sorted);

        int mid = sorted.length / 2;
        if (sorted.length % 2 == 0) {
            return (sorted[mid] + sorted[mid - 1]) / 2;
        } else {
            return sorted[mid];
        }
    }
}


Is there something you would add?

Assuming this works well, it only works for double arrays (a float[] cannot be passed where a double[] is expected).

It works for neither int arrays nor collection classes, although supporting them would take only a copy-paste and some minimal changes.

Could Java 8 lambdas help make the code more generic and reduce the need for copy-paste?

Feel free to comment.

Solution

First of all, I object to the get… naming convention. "Get" implies that you are retrieving something that already exists (usually, though not always, paired with "set"). You wouldn't call Math.getCos(theta), would you?

The problems stem from accepting arrays in the first place. The class would be much more useful as an accumulator, like this:

Statistics stats = new Statistics();
stats.datum(3);
stats.datum(4);
stats.datum(5);
System.out.println(stats.mean());
System.out.println(stats.stdDev());


A simple way to calculate the mean, variance, and standard deviation is to keep running totals \$\sum x_i^0\$ (i.e., the count), \$\sum x_i^1\$ (i.e., the sum), and \$\sum x_i^2\$ (i.e., the sum of the squares). However, see the Wikipedia article "Algorithms for calculating variance" for a discussion of the merits of this method compared to others.

You could write

import java.util.OptionalDouble;

public class Statistics {
    private int sum0;
    private double sum1, sum2;

    public void datum(double x) {
        this.sum0++;
        this.sum1 += x;
        this.sum2 += x * x;
    }

    public int count() {
        return this.sum0;
    }

    public double sum() {
        return this.sum1;
    }

    public OptionalDouble mean() {
        if (this.count() == 0) {
            return OptionalDouble.empty();
        } else {
            return OptionalDouble.of(this.sum() / this.count());
        }
    }

    // Population variance (divides by n), unlike the sample version in the question.
    public OptionalDouble variance() {
        if (this.count() == 0) {
            return OptionalDouble.empty();
        } else {
            return OptionalDouble.of(
                   (this.sum2 - this.sum() * this.sum() / this.count())
                   / //////////////////////////////////////////////// /
                                     this.count()
            );
        }
    }

    public OptionalDouble stdDev() {
        if (this.count() == 0) {
            return OptionalDouble.empty();
        } else {
            return OptionalDouble.of(Math.sqrt(this.variance().getAsDouble()));
        }
    }
}
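
The running-totals formula can lose precision when the values are large compared with their spread, because it subtracts two nearly equal numbers. One commonly cited alternative is Welford's online algorithm, which updates the mean and the sum of squared deviations incrementally. The following is only a sketch of that alternative, not part of the original answer; the class name WelfordStatistics and its method names are made up for illustration. It also exposes both the population and the sample variance, since the question uses the sample version.

```java
import java.util.OptionalDouble;

// Hypothetical accumulator using Welford's online algorithm (illustrative only).
final class WelfordStatistics {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the current mean

    void datum(double x) {
        count++;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / count;     // update the mean
        m2 += delta * (x - mean);  // uses both the old and the new mean
    }

    long count() {
        return count;
    }

    OptionalDouble mean() {
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(mean);
    }

    // Population variance: divides by n.
    OptionalDouble populationVariance() {
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(m2 / count);
    }

    // Sample variance: divides by n - 1, so it needs at least two values.
    OptionalDouble sampleVariance() {
        return count < 2 ? OptionalDouble.empty() : OptionalDouble.of(m2 / (count - 1));
    }
}
```

Feeding it 3, 4, and 5 through datum(...) should give a mean of 4 and a sample variance of 1.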


The median is trickier to calculate, as you would have to keep a list of all of the values.
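
As a rough sketch of what keeping all of the values might look like, the hypothetical MedianAccumulator below stores every datum in a list and sorts a copy on demand. The class and method names are assumptions made for illustration, and memory use grows with the number of values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical accumulator that retains every value so the median can be computed.
final class MedianAccumulator {
    private final List<Double> values = new ArrayList<>();

    void datum(double x) {
        values.add(x);
    }

    OptionalDouble median() {
        if (values.isEmpty()) {
            return OptionalDouble.empty();
        }
        // Sort a copy so that insertion order is preserved in the stored list.
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        if (sorted.size() % 2 == 0) {
            // Even count: average the two middle values.
            return OptionalDouble.of((sorted.get(mid - 1) + sorted.get(mid)) / 2);
        }
        return OptionalDouble.of(sorted.get(mid));
    }
}
```

Sorting on every call costs O(n log n); caching the sorted copy until the next datum(...) would be one possible refinement.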

Note that DoubleStream already gives you count(), sum(), and average().
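
To illustrate, here is a small sketch of how the Java 8 stream API might cover double arrays, int arrays, and collections without copy-paste. The class StreamStatsExamples and its summarize overloads are hypothetical helpers; DoubleStream, IntStream, and DoubleSummaryStatistics are the standard library types. Note that DoubleSummaryStatistics offers count, sum, min, max, and average, but no variance or median.

```java
import java.util.Collection;
import java.util.DoubleSummaryStatistics;
import java.util.function.ToDoubleFunction;
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;

// Hypothetical helpers showing how streams handle different input types.
final class StreamStatsExamples {

    static DoubleSummaryStatistics summarize(double[] nums) {
        return DoubleStream.of(nums).summaryStatistics();
    }

    static DoubleSummaryStatistics summarize(int[] nums) {
        // Widen each int to a double before collecting the statistics.
        return IntStream.of(nums).asDoubleStream().summaryStatistics();
    }

    static <T> DoubleSummaryStatistics summarize(Collection<T> items,
                                                 ToDoubleFunction<? super T> toDouble) {
        // The lambda decides which numeric property of each element to use.
        return items.stream().mapToDouble(toDouble).summaryStatistics();
    }
}
```

For example, summarize(names, s -> s.length()) would summarize the lengths of the strings in a collection called names, and getAverage() on the result returns the mean (0 for an empty input).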

Context

StackExchange Code Review Q#86882, answer score: 2
