patternjavaMinor
Simple and efficient data structure for standard deviation queries in Java
Viewed 0 times
simpledeviationefficientjavastandardstructureforandqueriesdata
Problem
I have this data structure to which the user can add number or remove them, both in constant time. Also, there is a \$O(1)\$ operation for querying the standard deviation of the current contents of the structure:
StandardDeviationNumberSet.java
```
package net.coderodde.stat;
import java.util.HashMap;
import java.util.Map;
public class StandardDeviationNumberSet {
private double sum;
private double squareSum;
private int numberOfElements;
private final Map countMap = new HashMap<>();
public void add(Number number) {
double numberValue = number.doubleValue();
Integer count = countMap.get(numberValue);
if (count != null) {
countMap.put(numberValue, count + 1);
} else {
countMap.put(numberValue, 1);
}
sum += numberValue;
squareSum += numberValue * numberValue;
numberOfElements++;
}
public double remove(Number number) {
double numberValue = number.doubleValue();
Integer count = countMap.get(numberValue);
if (count != null) {
if (count > 1) {
countMap.put(numberValue, count - 1);
} else {
countMap.remove(numberValue);
}
sum -= numberValue;
squareSum -= numberValue * numberValue;
numberOfElements--;
return numberValue;
} else {
return Double.NaN;
}
}
public double getStandardDeviation() {
checkSetHasAtleastTwoElements();
double step1 = squareSum - sum * sum / numberOfElements;
double step2 = step1 / (numberOfElements - 1);
return Math.sqrt(step2);
}
private void checkSetHasAtleastTwoElements() {
if (numberOfElements < 2) {
throw new IllegalStateException(
"Computing the standard deviation requires at least two " +
"elements.");
}
}
public static void main(St
StandardDeviationNumberSet.java
```
package net.coderodde.stat;
import java.util.HashMap;
import java.util.Map;
public class StandardDeviationNumberSet {
private double sum;
private double squareSum;
private int numberOfElements;
private final Map countMap = new HashMap<>();
public void add(Number number) {
double numberValue = number.doubleValue();
Integer count = countMap.get(numberValue);
if (count != null) {
countMap.put(numberValue, count + 1);
} else {
countMap.put(numberValue, 1);
}
sum += numberValue;
squareSum += numberValue * numberValue;
numberOfElements++;
}
public double remove(Number number) {
double numberValue = number.doubleValue();
Integer count = countMap.get(numberValue);
if (count != null) {
if (count > 1) {
countMap.put(numberValue, count - 1);
} else {
countMap.remove(numberValue);
}
sum -= numberValue;
squareSum -= numberValue * numberValue;
numberOfElements--;
return numberValue;
} else {
return Double.NaN;
}
}
public double getStandardDeviation() {
checkSetHasAtleastTwoElements();
double step1 = squareSum - sum * sum / numberOfElements;
double step2 = step1 / (numberOfElements - 1);
return Math.sqrt(step2);
}
private void checkSetHasAtleastTwoElements() {
if (numberOfElements < 2) {
throw new IllegalStateException(
"Computing the standard deviation requires at least two " +
"elements.");
}
}
public static void main(St
Solution
Overall I don't see any major problems with the code. There are some things I would change though:
Write unit tests.
Whenever I deal with mathematical classes, I think that it is obligatory to provide unit tests to verify correct computation and computational precision.
Add JavaDoc
Please add a JavaDoc to clarify whether it computes sample or population statistics.
Add comments
The use
Use
Since Java 8, which we have to assume is wide spread by this point, you can replace:
with
Consider handling
Currently your code will silently break if any of the inputs contain
Return type of
As far as I can tell
Change API to use
Currently your methods like
Further more calling
So I would change:
to:
and similarily for
I would prefer to use the primitive
Add methods for querying the count and mean as well
You have all the data to get the mean value and sample count as well and these are commonly used in conjunction with the standard deviation so to me it makes sense to provide these methods as well as they are trivial.
Edit/Addendum
I would consider leaving the validation of removed values to the user. By having the
Write unit tests.
Whenever I deal with mathematical classes, I think that it is obligatory to provide unit tests to verify correct computation and computational precision.
Add JavaDoc
Please add a JavaDoc to clarify whether it computes sample or population statistics.
Add comments
The use
numberValue instead of number in the look ups to countMap has a significant but subtle difference that is easy to miss if you're not an expert. I would add comments to clarify this (you are correctly avoiding Integer(3) and Double(3) getting different counts).Use
map.getOrDefaultSince Java 8, which we have to assume is wide spread by this point, you can replace:
Integer count = countMap.get(numberValue);
if (count != null) {
countMap.put(numberValue, count + 1);
} else {
countMap.put(numberValue, 1);
}with
countMap.put(numberValue, countMap.getOrDefault(numberValue, 0) + 1);Consider handling
NaN and Inf inputsCurrently your code will silently break if any of the inputs contain
NaN or Infinity. I would consider testing for these inputs and throwing if they are encountered. Return type of
remove is weirdAs far as I can tell
remove will return the input value or NaN if the input value was not removed. To me this feels very strange, I would much rather it'd return true if the statistics changed and false otherwise.Change API to use
double instead of NumberCurrently your methods like
public void add(Number number) first unbox the number into a double then rebox it into a Double. So calling add with a Double causes unnecessary boxing and unboxing. Further more calling
add with any of the primitive types (int,long,float etc) causes first a boxing to Number, then an unboxing to double then a re-boxing to Double (to use in countMap.get). This is a lot of unnecessary boxing and unboxing and will also generate a compiler warning on many systems.So I would change:
public void add(Number number) {
double numberValue = number.doubleValue();
Integer count = countMap.get(numberValue);
if (count != null) {
countMap.put(numberValue, count + 1);
} else {
countMap.put(numberValue, 1);
}
sum += numberValue;
squareSum += numberValue * numberValue;
numberOfElements++;
}to:
public void add(double number) {
// JIT will use CSE to consolidate the auto boxings of number.
countMap.put(number, countMap.getOrDefault(number, 0) + 1);
sum += number;
squareSum += number* number;
numberOfElements++;
}
public void add(Number number){
add(number.doubleValue());
}and similarily for
remove.I would prefer to use the primitive
double in the API because more often than not the user will do some computations to end up with the input value and will thus have a primitive double that they want to pass in. Other primitive types will automatically be type promoted to double. In the odd case that the user actually has a Number of some sort, we add an overload to handle it. Add methods for querying the count and mean as well
You have all the data to get the mean value and sample count as well and these are commonly used in conjunction with the standard deviation so to me it makes sense to provide these methods as well as they are trivial.
Edit/Addendum
I would consider leaving the validation of removed values to the user. By having the
countMap you add \$O\left(n\right)\$ memory requirement even if the user doesn't need the remove ability. And in many cases where you do need a remove (like a sliding window) the removal is guaranteed to be valid by the definition of the window.Code Snippets
Integer count = countMap.get(numberValue);
if (count != null) {
countMap.put(numberValue, count + 1);
} else {
countMap.put(numberValue, 1);
}countMap.put(numberValue, countMap.getOrDefault(numberValue, 0) + 1);public void add(Number number) {
double numberValue = number.doubleValue();
Integer count = countMap.get(numberValue);
if (count != null) {
countMap.put(numberValue, count + 1);
} else {
countMap.put(numberValue, 1);
}
sum += numberValue;
squareSum += numberValue * numberValue;
numberOfElements++;
}public void add(double number) {
// JIT will use CSE to consolidate the auto boxings of number.
countMap.put(number, countMap.getOrDefault(number, 0) + 1);
sum += number;
squareSum += number* number;
numberOfElements++;
}
public void add(Number number){
add(number.doubleValue());
}Context
StackExchange Code Review Q#158917, answer score: 4
Revisions (0)
No revisions yet.