HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Class Design: Calculating statistics from weighted values

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
weighteddesigncalculatingstatisticsvaluesfromclass

Problem

I was wondering if this is the best possible design for my situation. Note that this is a simpler version of the problem I am trying to tackle.

I have a class Stage that holds data as two vectors: weights w_ and values x_. I'm interested in calculating certain statistics like the weighted mean / variance etc. So I have a method Stage::ReportStatistic.

The Statistic class is abstract and goes as a pointer argument to Stage::ReportStatistic. The Statistic class has a method Statistic::Value that takes two vectors and calculates the statistic in any particular implementation, such as my StatisticMean::Value.

I find that my Statistic::Value method needs references to private members of the Stage class. Is there a way to avoid the signature Statistic::Value(std::vector const &, std::vector const &)? If I changed the representation of data in stage to std::vector > instead, it would break Statistic::Value. Is it asking for too much to come up with a design that could avoid this problem?

The code is complete and can be compiled using something like g++ -std=c++11 .cpp.

```
#include
#include
#include
#include
#include
#include

// BEGIN statistic.h
namespace project
{
// Abstract class that takes in data and spits out a statistic through its Value() method
class Statistic
{
public:
virtual double Value(std::vector const &, std::vector const &) = 0;
};
}
// END statistic.h

//BEGIN statisticmean.h
namespace project
{
// An implementation of statistic whose Value() method computes the weighted mean of the data
class StatisticMean: public Statistic
{
public:
class StatisticMeanBuilder;
StatisticMean() {}
double Value(std::vector const &x, std::vector const &w);
};

class StatisticMean::StatisticMeanBuilder
{
public:
StatisticMeanBuilder() {}
std::unique_ptr Build()
{
std::un

Solution

Hmmm....let's consider what the standard library provides to do the same job you've done. Your weighted mean ends up being the same algorithm as what the standard library calls std::inner_product. Using the standard library roughly as (I think) it was intended to be used, we could end up with code something like this:

#define do_throw(f, l, s) throw std::logic_error("Error in " #f " at line " #l ": " s)
#define thrower(f, l, s) do_throw(f, l, s)

#define assure(b, s) ((b) || (thrower(__FILE__, __LINE__, s), 1))

int main()
{
    static const int N = 10000;

    std::random_device rd;
    std::mt19937_64 g(rd());

    std::uniform_real_distribution d(0.0, 1.0);
    std::normal_distribution du(0.0, 1.0);

    std::vector weights;
    std::vector values;

    try {
        std::generate_n(std::back_inserter(weights), N, [&] { return d(g); });
        std::generate_n(std::back_inserter(values), N, [&] { return du(g); });

        assure(weights.size() != 0, "Size == 0");
        assure(weights.size() == values.size(), "Size mismatch");

        std::cout << inner_product(weights.begin(), weights.end(),
            values.begin(), 0.0);
    }
    catch (std::exception &e) {
        std::cerr << e.what() << "\n";
        return 0;
    }
}


It doesn't seem to me that what you've invented adds enough new to that to justify itself.

If I were going to write something different from that, I'd probably start from the data. I think the idea of having a vector of pairs is a reasonable enough one that it's worth at least considering what code using that could look like. I'd think of something like this:

int main()
{
    static const int N = 3;

    std::random_device rd;
    std::mt19937_64 g(rd());

    std::uniform_real_distribution d(0.0, 1.0);
    std::normal_distribution du(0.0, 1.0);

    std::vector> vals;

    try {
        std::generate_n(std::back_inserter(vals), 
                        N, 
                        [&] { return std::make_pair(d(g), du(g)); });

        std::cout  const &b) { 
                return a + b.first * b.second; 
            });

    }
    catch (std::exception &e) {
        std::cerr << e.what() << "\n";
        return 0;
    }
}


The usual hope with an inheritance-based hierarchy is that it allows substantially improved flexibility and/or ease of implementation, at the expense of only minimal run-time overhead.

In this case, you seem to have achieved the minimal run-time overhead (only one virtual call per entire pair of arrays), but I don't see where you've accomplished much (if anything) in the way of adding flexibility or made implementation any easier at all (rather the opposite: you've actually added some overhead to implementation as well).

At least as I see it, the problem here is fairly simple: inheritance almost always adds a little bit of overhead:

  • You (obviously) have to inherit from the right base class.



  • You have to manipulate things via pointers.



In this case, the sum total of "knowledge" embodied in the base class is "we doing something with a couple of arrays". That doesn't impart enough useful information to the derived class that it simplifies implementation enough to make up for the normal overhead of using derivation to start with, so it ends up as a net loss. Worse, although we probably could embody a little more into the base class, I'm pretty sure even when we move as much there as we can, it still ends up a net loss.

Code Snippets

#define do_throw(f, l, s) throw std::logic_error("Error in " #f " at line " #l ": " s)
#define thrower(f, l, s) do_throw(f, l, s)

#define assure(b, s) ((b) || (thrower(__FILE__, __LINE__, s), 1))

int main()
{
    static const int N = 10000;

    std::random_device rd;
    std::mt19937_64 g(rd());

    std::uniform_real_distribution<double> d(0.0, 1.0);
    std::normal_distribution<double> du(0.0, 1.0);

    std::vector<double> weights;
    std::vector<double> values;

    try {
        std::generate_n(std::back_inserter(weights), N, [&] { return d(g); });
        std::generate_n(std::back_inserter(values), N, [&] { return du(g); });

        assure(weights.size() != 0, "Size == 0");
        assure(weights.size() == values.size(), "Size mismatch");

        std::cout << inner_product(weights.begin(), weights.end(),
            values.begin(), 0.0);
    }
    catch (std::exception &e) {
        std::cerr << e.what() << "\n";
        return 0;
    }
}
int main()
{
    static const int N = 3;

    std::random_device rd;
    std::mt19937_64 g(rd());

    std::uniform_real_distribution<double> d(0.0, 1.0);
    std::normal_distribution<double> du(0.0, 1.0);

    std::vector<std::pair<double, double>> vals;

    try {
        std::generate_n(std::back_inserter(vals), 
                        N, 
                        [&] { return std::make_pair(d(g), du(g)); });

        std::cout << std::accumulate(vals.begin(), vals.end(),
            0.0,
            [](double a, std::pair<double, double> const &b) { 
                return a + b.first * b.second; 
            });

    }
    catch (std::exception &e) {
        std::cerr << e.what() << "\n";
        return 0;
    }
}

Context

StackExchange Code Review Q#136137, answer score: 3

Revisions (0)

No revisions yet.