HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

C++ and STL - Machine Learning Problem

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
problemstllearningmachineand

Problem

I would like to get some general comments on style and use of STL in particular. This is some code I wrote to do machine learning classification (logistic regression). Any suggestions would be very appreciated!

A record containing an individual training example:

/*
 * A class representing a single record to be used for training or evaluation
 * of a classifier. This record contains a vector of predictors which are
 * observations used to predict an outcome, together with a target outcome for
 * that observation. The machine learning algorithm's job is to learn to
 * generalize this predictor to target outcome relationship.
 */
#include "Record.h"
using namespace std;

/**
 * Construct a record from the provided predictor Vector and target outcome.
 */
Record::Record(const Vector& predictor, double target) :
    predictor_(predictor), target_(target) {
}

/**
 * Construct a record by parsing the provided string. The string must contain
 * a set of space seperated values, with the target outcome in column kTargetCol
 * and the predictor vector starting on column kPredictorCol. Note that each
 * element of the predictor vector is preceeded by an index and a colon. So
 * the string looks has the following form:
 *   1: 2: ...
 */
Record::Record(const string& line) {
  Init(line);
}

void Record::Init(const string& line) {
  stringstream line_stream(line);
  string token;
  for (int i = 0; getline(line_stream, token, ' '); ++i) {
    if (i == kTargetCol) {
      istringstream istream(token);
      double target;
      istream >> target;
      target_ = target;
    }
    if (i >= kPredictorCol) {
      istringstream istream(token.substr(token.find(':') + 1));
      double element;
      istream >> element;
      predictor_.push_back(element);
    }
  }
}

/** Convert a record into a string suitable for display */
string Record::String() const {
  ostringstream sstream;
  sstream << target_;
  return sstream.str() + ", " + predictor_.String();
}


A collection of

Solution

Edit (based on the large number of additions) I am re-writing this:
General Notes:

You should add header guards to all your *.h files.
Vector.h

Don't like your use of string conversion methods. Very Java like and not C++ like at all. Prefer to write stream operators which are much more versatile.

string String() const;

  // Prefer

  friend std::ostream& operator<<(std::ostream stream, Vector const& data)
  {
      return stream;
  }


Hide your implementation details by using local typedefs. Also it is more traditional to just call the method to get a const iterator begin() rather than const_begin() etc.

vector::const_iterator const_begin() const;
  vector::iterator begin();

  // Prefer
  typedef std::vector       Container;
  typedef Container::iterator       iterator;
  typedef Container::const_iterator const_iterator;

  iterator       begin();
  const_iterator begin() const;


This works but it also allows some expressions that don't do much;

double operator[](int index) const;

  // Now the compiler will allow:

  Vector  x;
  x[5] = 6;   // Will compile but not do anything.
              // I would prefer this fails to compile so that the
              // user of the class will know that they have done
              // something wrong immediately.

  // Thus I would prefer this declaration:
  double const& operator[](std::size_t index) const;


Not totally convinced that setting up these operators will make using the class easier to read.

Vector operator+(const Vector& operand) const;
  Vector operator-(double scalar) const;
  Vector operator-(const Vector& operand) const;
  Vector operator*(double scalar) const;
  Vector operator/(double scalar) const;
  Vector operator/(const Vector& operand) const;


Surprised this compiles at all here:

private:
  vector elements;


The header file should include all the header files it need to make the required types available. There is no #include in this file. Also you do not prefix vector with std:: so how does it know where it is defined.

There are a couple of explanations to both these problems none of them good.

NEVER put using namespace std; in a header file.

NEVER make a header file depend on being included in the correct order.
Vector.cpp

Vector::Vector() : elements(vector()) {}
// Easier to write:
Vector::Vector() : elements() {}

Vector::Vector(int size) : elements(vector(size, 0)) {}    
// Easier to write
Vector::Vector(int size) : elements(size) {} // Value defaults to 0


I would use the correct type. here.

int Vector::size() const {return elements.size();}
// I would use:
std::size_t Vector::size() const {return elements.size();}


As described above I would return by const reference.

double Vector::operator[](int index) const {
  return elements[index];
}


Here I would not convert to a string.

I would a stream operator (then use some algorithms inside to make it cleaner).

/** Convert a Vector into a string suitable for display. */
string Vector::String() const {

friend std::ostream& operator(stream," "));
   
   stream << "] ";
   return stream;
}


I will add more when I get some time.
Old review.

Rather than write a to_string() method:

string Record::String() const {
  ostringstream sstream;
  sstream << target_;
  return sstream.str() + ", " + predictor_.String();
}


Write a stream operator.

std::ostream& operator<<(std::stream& stream, Record const& data)
{
     return stream << data.target_ << ", " << data.predictor_;
}


If you really just want a string use lexical cast:

Record  record(/* Initialization*/);

// lexical_cast use the stream operator.
std::string data = boost::lexical_cast(record);


I don't see the need for a file-stream member that is kept open for the life of the object. Open it use it then let it fall out of scope:

RecordList::RecordList(const string& filename) {
  input_file_.open(filename.c_str(), ios::in);
  Init();
}

void RecordList::Init() {
  for (string line; getline(input_file_, line);) {
    Record record(line);
    records_.push_back(record);
  }
}

RecordList::~RecordList() {
  input_file_.close();
}

I would just do:

RecordList::RecordList(const string& filename) {
  Init(filename);
}

void RecordList::Init(std::string const& filename) {
  std::ifstream  input_file(filename.c_str());
  for (string line; getline(input_file, line);) {
    Record record(line);
    records_.push_back(record);
  }
}

RecordList::~RecordList() {}


Here you are exposing an implementation detail:

vector::const_iterator RecordList::const_begin() const {


The user of your class should not need to know that you are using a vector internal to your class

``
class RecordList
{
public:
typedef std::vector Container;
typedef Container::iterator iterator;
typedef Container::const_iterator const_iterator;

const_iterator RecordList::const_begin() const;
};

Code Snippets

string String() const;

  // Prefer

  friend std::ostream& operator<<(std::ostream stream, Vector const& data)
  {
      return stream;
  }
vector<double>::const_iterator const_begin() const;
  vector<double>::iterator begin();

  // Prefer
  typedef std::vector<double>       Container;
  typedef Container::iterator       iterator;
  typedef Container::const_iterator const_iterator;

  iterator       begin();
  const_iterator begin() const;
double operator[](int index) const;

  // Now the compiler will allow:

  Vector  x;
  x[5] = 6;   // Will compile but not do anything.
              // I would prefer this fails to compile so that the
              // user of the class will know that they have done
              // something wrong immediately.

  // Thus I would prefer this declaration:
  double const& operator[](std::size_t index) const;
Vector operator+(const Vector& operand) const;
  Vector operator-(double scalar) const;
  Vector operator-(const Vector& operand) const;
  Vector operator*(double scalar) const;
  Vector operator/(double scalar) const;
  Vector operator/(const Vector& operand) const;
private:
  vector<double> elements;

Context

StackExchange Code Review Q#18002, answer score: 6

Revisions (0)

No revisions yet.