patterncppModerate
Proper practice using enums in header files
Viewed 0 times
headerpracticefilesusingproperenums
Problem
Long time coder, but very very new to
My first question is around exactly how I should construct the header file. Specifically curious about the right way to handle the
C++. I'm trying to put together my first real-sized project and figuring out what I need to change about my script-kiddie style of C++ to get there. Below is my working code.#include
#include
#include
#include
enum class Metric { Cosine, Euclidean};
double cosine(std::vector A, std::vector B, bool similarity){
double mul = 0.0;
double d_a = 0.0;
double d_b = 0.0;
std::vector::iterator A_iter = A.begin();
std::vector::iterator B_iter = B.begin();
for ( ; A_iter != A.end(); A_iter++ , B_iter++ ) {
mul += *A_iter * *B_iter;
d_a += *A_iter * *A_iter;
d_b += *B_iter * *B_iter;
}
if (d_a == 0.0f || d_b == 0.0f) {
throw std::logic_error("Vectors must have a length greater than zero");
}
double cos_similarity = mul / sqrt(d_a * d_b);
if (similarity == false) {
return acos(cos_similarity) / M_PI;
} else {
return cos_similarity;
}
}
double euclidean(std::vector A, std::vectorB, bool similarity) {
double sq_dist = 0.0;
std::vector::iterator A_iter = A.begin();
std::vector::iterator B_iter = B.begin();
for ( ; A_iter != A.end(); A_iter++, B_iter++) {
sq_dist += pow(*A_iter - *B_iter, 2);
}
if (similarity == false) {
return sqrt(sq_dist);
} else {
return 1.0/sqrt(sq_dist);
}
}
double distance(std::vector A, std::vector B, Metric metric, bool similarity = false) {
if (A.size() != B.size()) {
throw std::logic_error("A and B must be the same size!");
}
if (A.size() A, std::vector B, Metric metric) {
return distance(A, B, metric, true);
}My first question is around exactly how I should construct the header file. Specifically curious about the right way to handle the
enum. Secondly, any general feedback, things that I could clean up, cut out, etc... would be greatly appreciated.Solution
Prefer C++ headers
If you
#include "similarity.h"
#include
#include
namespace {
double cosine_distance(const std::vector& a, const std::vector& b)
{
double mul = 0.0;
double d_a = 0.0;
double d_b = 0.0;
auto ia = a.begin();
auto ib = b.begin();
for (; ia != a.end(); ia++, ib++) {
mul += ia *ib;
d_a += ia *ia;
d_b += ib *ib;
}
if (d_a == 0.0 || d_b == 0.0) {
throw std::invalid_argument("Vectors must have a magnitude greater than zero");
}
If you
#include instead of `, the mathematical functions will be safely and unambiguously in the std namespace.
Pass read-only parameters by const reference
Instead of copying the input vectors, it's usually more efficient to pass const references to them:
double cosine(const std::vector& A, const std::vector& B, bool similarity);
double euclidean(const std::vector& A, const std::vector& B, bool similarity);
double distance(const std::vector& A, const std::vector& B,
Metric metric, bool similarity = false);
Use auto to deduce types
Instead of writing
std::vector::iterator A_iter = A.begin();
std::vector::iterator B_iter = B.begin();
We can ask the compiler to deduce the correct type:
auto A_iter = A.begin();
auto B_iter = B.begin();
This also makes it easier if you ever want to change the type of A and B (perhaps even to a template argument type). In fact, by changing to const references, we already changed the type (to std::vector::const_iterator), but auto isolates us from that.
Use the right kind of exception
You could be more specific, and use std::invalid_argument for the error type here:
if (d_a == 0.0 || d_b == 0.0) {
throw std::invalid_argument("Vectors must have a magnitude greater than zero");
}
I've changed the wording above, because length usually means the number of elements when we're using standard collections.
I also changed the zero constant to be a double, the same as d_a and d_b, since it will get promoted for the comparison anyway. Only use float when you really need the reduced precision.
Don't compare booleans against false
This is hard to read:
if (similarity == false) {
// negative case
} else {
// positive case
}
You can re-write it as
if (similarity) {
// positive case
} else {
// negative case
}
This will normally produce identical code, but the cognitive overhead is reduced.
Avoid non-standard extensions
Although many implementations define a constant called M_PI, this isn't specified by any standard. Thankfully it's easy to make your own value, using the knowledge that tan(¼π) is 1:
static const auto pi = 4*std::atan(1);
Avoid std::pow(x, 2)
To square a number, it's simpler and more efficient to multiply it by itself. pow() is very general (handling non-integer powers), and usually works by multiplication of logarithms.
Even though std::pow() has overloads for integer powers, I find that for squaring, it's easier to read a simple multiplication rather than a function call. It's worth introducing a temporary to avoid that:
//sq_dist += pow(*A_iter - *B_iter, 2);
auto dist = *A_iter - *B_iter;
sq_dist += dist * dist;
Don't fall off the end of non-void functions
Here, we're assuming that we're always passed a valid enum constant:
switch (metric) {
case Metric::Cosine:
return cosine(A, B, similarity);
case Metric::Euclidean:
return euclidean(A, B, similarity);
}
// implicit return
Robust code checks that we don't flow out of the switch:
switch (metric) {
case Metric::Cosine:
return cosine(A, B, similarity);
case Metric::Euclidean:
return euclidean(A, B, similarity);
}
throw std::invalid_argument("Invalid metric");
Invalid enum values shouldn't happen, but it costs little to be defensive against a mistaken cast.
Design issues
It's not clear whether cosine() and euclidean() are intended to be part of the public interface. If they are not, then give them internal linkage (e.g. by use of the anonymous namespace, or with the static keyword), and consider moving the similarity parameter out of them.
I prefer to avoid booleans that change the meaning of functions like that; see if you agree when you see my full example.
Here's a full version with the above issues all addressed (and assuming that cosine() and euclidean() are intended to be internal detail).
Declarations
This is what I'd put in the header file (let's call it similarity.h):
#ifndef SIMILARITY_H
#define SIMILARITY_H
#include
enum class Metric { Cosine, Euclidean };
double distance(const std::vector&, const std::vector&, Metric);
double similarity(const std::vector&, const std::vector&, Metric);
#endif
Implementation
Having included the header, here's the C++ file:
``#include "similarity.h"
#include
#include
namespace {
double cosine_distance(const std::vector& a, const std::vector& b)
{
double mul = 0.0;
double d_a = 0.0;
double d_b = 0.0;
auto ia = a.begin();
auto ib = b.begin();
for (; ia != a.end(); ia++, ib++) {
mul += ia *ib;
d_a += ia *ia;
d_b += ib *ib;
}
if (d_a == 0.0 || d_b == 0.0) {
throw std::invalid_argument("Vectors must have a magnitude greater than zero");
}
Code Snippets
double cosine(const std::vector<double>& A, const std::vector<double>& B, bool similarity);
double euclidean(const std::vector<double>& A, const std::vector<double>& B, bool similarity);
double distance(const std::vector<double>& A, const std::vector<double>& B,
Metric metric, bool similarity = false);std::vector<double>::iterator A_iter = A.begin();
std::vector<double>::iterator B_iter = B.begin();auto A_iter = A.begin();
auto B_iter = B.begin();if (d_a == 0.0 || d_b == 0.0) {
throw std::invalid_argument("Vectors must have a magnitude greater than zero");
}if (similarity == false) {
// negative case
} else {
// positive case
}Context
StackExchange Code Review Q#159059, answer score: 11
Revisions (0)
No revisions yet.