Basic Java statistics class
java statistics class basic
Problem
I noticed there's some code on StackOverflow that is, to put it mildly, suboptimal. I'm posting here to see if we can provide a better solution.
Here's a proposal for fixing it.
The approach is to provide a small utility class with static methods, so it can be used just like Math.method(...).
The standard deviation implements this formula, which applies to a sample.
The median function is taken from this answer, which is based on the reasonable assumption that the array can be sorted in memory.
import java.util.Arrays;

public class Statistics {

    static public double getMean(double[] nums) {
        double total = 0;
        for (double value : nums) {
            total += value;
        }
        return total / nums.length;
    }

    // sample version
    static public double getVariance(double[] nums) {
        if (nums.length <= 1) {
            return 0;
        }
        double sum = 0;
        double mean = getMean(nums);
        for (double value : nums) {
            sum += Math.pow(value - mean, 2);
        }
        return sum / (nums.length - 1); // notice the -1
    }

    // sample version
    static public double getStdDev(double[] nums) {
        return Math.sqrt(getVariance(nums));
    }

    static public double getMedian(double[] nums) {
        Arrays.sort(nums); // note: sorts the caller's array in place
        if (nums.length % 2 == 0) {
            return ((double) nums[nums.length / 2] + (double) nums[nums.length / 2 - 1]) / 2;
        } else {
            return (double) nums[nums.length / 2];
        }
    }
}

Is there something you would add?
Assuming this works well, it only works for double (and float) arrays.
It does not work for int arrays, nor does it work for Container classes, although all it would require to get it to work would be a copy-paste and some minimal changes.
Could Java 8 lambdas help make the code more generic and reduce the need for copy-paste?
Feel free to comment.
Solution
First of all, I object to the get… naming convention. "Get" implies that you are retrieving something that already exists (usually, though not always, paired with "set"). You wouldn't call Math.getCos(theta), would you?
The problems stem from accepting arrays in the first place. The class would be much more useful as an accumulator, like this:
Statistics stats = new Statistics();
stats.datum(3);
stats.datum(4);
stats.datum(5);
System.out.println(stats.mean());
System.out.println(stats.stdDev());

A simple way to calculate the mean, variance, and standard deviation is to keep running totals \$\sum x_i^0\$ (i.e., the count), \$\sum x_i^1\$ (i.e., the sum), and \$\sum x_i^2\$ (i.e., the sum of the squares). However, see Algorithms for calculating variance for a discussion of the merits of this method compared to others.
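One of the alternatives commonly discussed alongside the running-totals method is Welford's online algorithm, which updates the mean and the sum of squared deviations incrementally and so avoids subtracting two large, nearly equal numbers. The following is only a minimal sketch of that idea; the class name WelfordStatistics and its method names are made up for illustration and are not part of the original answer.

```java
import java.util.OptionalDouble;

// Hypothetical accumulator (name and API are assumptions, not from the answer).
final class WelfordStatistics {
    private long count;
    private double mean; // running mean of the values seen so far
    private double m2;   // running sum of squared deviations from the mean

    void datum(double x) {
        count++;
        double delta = x - mean;  // deviation from the old mean
        mean += delta / count;    // update the mean
        m2 += delta * (x - mean); // uses both the old and the new mean
    }

    long count() {
        return count;
    }

    OptionalDouble mean() {
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(mean);
    }

    // Divides by n - 1, like the sample formula used in the question.
    OptionalDouble sampleVariance() {
        return count < 2 ? OptionalDouble.empty() : OptionalDouble.of(m2 / (count - 1));
    }

    // Divides by n.
    OptionalDouble populationVariance() {
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(m2 / count);
    }
}
```

Feeding it 3, 4, 5 gives a mean of 4.0 and a sample variance of 1.0. The plain running-totals version is simpler and is often adequate when the values are not large relative to their spread.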
You could write
import java.util.OptionalDouble;

public class Statistics {
    private int sum0;
    private double sum1, sum2;

    public void datum(double x) {
        this.sum0++;
        this.sum1 += x;
        this.sum2 += x * x;
    }

    public int count() {
        return this.sum0;
    }

    public double sum() {
        return this.sum1;
    }

    public OptionalDouble mean() {
        if (this.count() == 0) {
            return OptionalDouble.empty();
        } else {
            return OptionalDouble.of(this.sum() / this.count());
        }
    }

    // population variance: divides by n, not n - 1
    public OptionalDouble variance() {
        if (this.count() == 0) {
            return OptionalDouble.empty();
        } else {
            return OptionalDouble.of(
                (this.sum2 - this.sum() * this.sum() / this.count())
                / //////////////////////////////////////////////// /
                                  this.count()
            );
        }
    }

    public OptionalDouble stdDev() {
        if (this.count() == 0) {
            return OptionalDouble.empty();
        } else {
            return OptionalDouble.of(Math.sqrt(this.variance().getAsDouble()));
        }
    }
}

The median is trickier to calculate, as you would have to keep a list of all of the values.
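To illustrate what keeping all of the values might look like, here is a rough sketch of a separate accumulator for the median. The class name MedianAccumulator is hypothetical, and sorting a copy on every call is just the simplest option, assuming (as the question does) that the data fits in memory.

```java
import java.util.Arrays;
import java.util.OptionalDouble;

// Hypothetical accumulator (name and API are assumptions, not from the answer).
final class MedianAccumulator {
    private double[] values = new double[16];
    private int size;

    void datum(double x) {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow the backing array
        }
        values[size++] = x;
    }

    OptionalDouble median() {
        if (size == 0) {
            return OptionalDouble.empty();
        }
        // Sort a copy so the stored insertion order is left untouched.
        double[] sorted = Arrays.copyOf(values, size);
        Arrays.sort(sorted);
        int mid = size / 2;
        double result = (size % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
        return OptionalDouble.of(result);
    }
}
```

Unlike the mean and variance, this needs memory proportional to the number of values, which is why it does not fit the constant-space running-totals design.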
Note that DoubleStream already gives you count(), sum(), and average().
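As a sketch of how that could also address the int[] and collection cases raised in the question, each input type can be adapted to a DoubleStream and summarized with the JDK's DoubleSummaryStatistics. The class name StreamStatsDemo and the summarize helpers are hypothetical names for illustration. Note that DoubleSummaryStatistics covers count, sum, average, min, and max, but not variance or median.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.DoubleSummaryStatistics;

// Hypothetical helper class (names are assumptions, not from the answer).
final class StreamStatsDemo {

    static DoubleSummaryStatistics summarize(double[] nums) {
        return Arrays.stream(nums).summaryStatistics();
    }

    static DoubleSummaryStatistics summarize(int[] nums) {
        // IntStream is widened to a DoubleStream before summarizing.
        return Arrays.stream(nums).asDoubleStream().summaryStatistics();
    }

    static DoubleSummaryStatistics summarize(Collection<? extends Number> nums) {
        return nums.stream().mapToDouble(Number::doubleValue).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics stats = summarize(new int[] {3, 4, 5});
        System.out.println(stats.getCount());
        System.out.println(stats.getSum());
        System.out.println(stats.getAverage());
    }
}
```

The three overloads differ only in how they obtain the stream, so the copy-paste the question worries about shrinks to one line per input type.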
Context
StackExchange Code Review Q#86882, answer score: 2