HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Standard deviation in one pass in C++ - follow-up

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
deviationpassstandardonefollow

Problem

(See the previous and initial iteration.)

This time, I use tag dispatching in order to make sure that non-random access iterators are not put into a std::distance, which would take \$\Theta(n)\$ time to complete, but instead, computes the distance while making the only pass over the data. See what I have now:

coderodde_sd.h

#ifndef CODERODDE_SD_H
#define CODERODDE_SD_H

#include 
#include 
#include 
#include 

#define DEBUG

#ifdef DEBUG
#include 
using std::cout;
#endif

namespace net {

    namespace coderodde {

        namespace stat {

            template
            double sd(RandomAccessIter begin, 
                      RandomAccessIter end, 
                      std::random_access_iterator_tag)
            {
                #ifdef DEBUG
                    cout 
            double sd(ForwardIter begin, 
                      ForwardIter end, 
                      std::forward_iterator_tag) 
            {
                #ifdef DEBUG
                    cout 
            double sd(Iter begin, Iter end) 
            {
                return sd(begin,
                          end, 
                   typename std::iterator_traits
                               ::iterator_category());
            }

        } /* net::coderodde::stat */

    } /* net::coderodde */

} /* net */

#endif  /* CODERODDE_SD_H */


main.cpp:

```
#include
#include
#include
#include "coderodde_sd.h"

using net::coderodde::stat::sd;
using std::cout;
using std::list;
using std::vector;

int main(int argc, char** argv) {
double bad_array[]{1.0};

try
{
sd(bad_array, bad_array);
}
catch (std::runtime_error& error)
{
cout my_list = { 1, 5, 2, 4, 3 };
vector my_vector { 3, 4, 2, 5, 1 };
int my_array[] = { 2, 4, 5, 3, 1 };

cout << "Standard deviation (list): "
<< sd(my_list.begin(), my_list.end())
<< "\n";

cout << "Standard deviation (vector): "
<< sd(my_vector.begin(), my_vector.end())

Solution

Your two algorithms are almost identical, and parts of it can be factored out into their own functions. For example,

void throw_if_too_small(size_t distance) 
{
    if (distance 
double compute_sd(Iter begin, Iter end)
{
    double x = 0.0;
    double x_squared = 0.0;

    for (auto it = begin; it != end; ++it) 
    {
        x += *it;
        x_squared += (*it) * (*it);
    }

    return std::sqrt((x_squared - (x * x) / distance) / 
                     (distance - 1)
                    );
}


Then you have to make the ForwardIter version a bit uglier (or you can put extra code into compute_sd, but I think I'd prefer changing the ForwardIter version.

template
double sd(RandomAccessIter begin, 
          RandomAccessIter end, 
          std::random_access_iterator_tag)
{
    #ifdef DEBUG
        cout 
double sd(ForwardIter begin, 
          ForwardIter end, 
          std::forward_iterator_tag) 
{
    #ifdef DEBUG
        cout << "[DEBUG] ForwardIter version of sd is running. ";     
    #endif 

    size_t distance = 0;

    for (auto iter = begin; iter != end || distance < 2; ++iter) 
    {
        ++distance;
    }

    throw_if_too_small(distance);

    return compute_sd(begin, end);
}


There are a few ways you could handle the ForwardIter version - This is probably the most obvious solution. I think a better one is to push the tag identification down a layer, into a distance_at_least_two function, and then specialize there.

template 
bool distance_at_least_two(ForwardIter begin, ForwardIter end, std::forward_iterator_tag)
{
    size_t distance = 0;

    for (auto iter = begin; iter != end || distance 
bool distance_at_least_two(RandomAccessIter begin, RandomAccessIter end, std::random_access_iterator_tag)
{
    return std::distance(begin, end) >= 2;
}


and then the calling function can change into this

template
double sd(Iter begin, Iter end) 
{
    if (distance_at_least_two(begin, end, std::iterator_traits::iterator_category()))
    {
        return compute_sd(begin, end);
    }
    else
    {
        std::stringstream ss;
        ss << "The standard deviation cannot be computed for "
              "less than two elements. The input sequence has "
           << distance 
           << (distance == 1 ? " element." : " elements.");

        throw std::runtime_error(ss.str());
    }
}


And you then don't need two separate sd specializations. You could probably also play with std::enable_if to avoid explicitly passing the iterator tag.

I don't like the chunk of debug code there - that sort of logging/debugging seems unnecessary. You could also DRY it out and hide the macro ugliness too.

void debug(std::string name)
{
    #ifdef DEBUG
        cout << "[DEBUG " << name << "version of sd is running. ";
    #endif
}


You could also have it take an iterator tag as the parameter, and then overload the function and just hardcode the name. That smells worse to me, but just a thought.

I mentioned it before so I won't explain it again, but with a little extra work you could write a more functional solution, which are (imo) usually easier to read than explicit for loops.

Code Snippets

void throw_if_too_small(size_t distance) 
{
    if (distance < 2)
    {
        std::stringstream ss;
        ss << "The standard deviation cannot be computed for "
              "less than two elements. The input sequence has "
           << distance 
           << (distance == 1 ? " element." : " elements.");

        throw std::runtime_error(ss.str());
    }
}

template <typename Iter>
double compute_sd(Iter begin, Iter end)
{
    double x = 0.0;
    double x_squared = 0.0;

    for (auto it = begin; it != end; ++it) 
    {
        x += *it;
        x_squared += (*it) * (*it);
    }

    return std::sqrt((x_squared - (x * x) / distance) / 
                     (distance - 1)
                    );
}
template<typename RandomAccessIter>
double sd(RandomAccessIter begin, 
          RandomAccessIter end, 
          std::random_access_iterator_tag)
{
    #ifdef DEBUG
        cout << "[DEBUG] RandomAccessIter version of sd is "
                "running. ";
    #endif

    auto distance = std::distance(begin, end);
    throw_if_too_small(distance);

    return compute_sd(begin, end);
}

template<typename ForwardIter>
double sd(ForwardIter begin, 
          ForwardIter end, 
          std::forward_iterator_tag) 
{
    #ifdef DEBUG
        cout << "[DEBUG] ForwardIter version of sd is running. ";     
    #endif 

    size_t distance = 0;

    for (auto iter = begin; iter != end || distance < 2; ++iter) 
    {
        ++distance;
    }

    throw_if_too_small(distance);

    return compute_sd(begin, end);
}
template <typename ForwardIter>
bool distance_at_least_two(ForwardIter begin, ForwardIter end, std::forward_iterator_tag)
{
    size_t distance = 0;

    for (auto iter = begin; iter != end || distance < 2; ++iter) 
    {
        ++distance;
    }

    return distance == 2;
}

template <typename RandomAccessIter>
bool distance_at_least_two(RandomAccessIter begin, RandomAccessIter end, std::random_access_iterator_tag)
{
    return std::distance(begin, end) >= 2;
}
template<typename Iter>
double sd(Iter begin, Iter end) 
{
    if (distance_at_least_two(begin, end, std::iterator_traits<Iter>::iterator_category()))
    {
        return compute_sd(begin, end);
    }
    else
    {
        std::stringstream ss;
        ss << "The standard deviation cannot be computed for "
              "less than two elements. The input sequence has "
           << distance 
           << (distance == 1 ? " element." : " elements.");

        throw std::runtime_error(ss.str());
    }
}
void debug(std::string name)
{
    #ifdef DEBUG
        cout << "[DEBUG " << name << "version of sd is running. ";
    #endif
}

Context

StackExchange Code Review Q#134871, answer score: 4

Revisions (0)

No revisions yet.