patterncppMinor

Predictive recursive descent parser for math expressions

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

codereview cpp stackoverflow c++parsing math-expression-eval boost

predictiveparserexpressionsrecursivemathfordescent

Problem

I've been learning about language theory and parsing, and decided to write my first parser: a LL(1) recursive descent parser. But actually, it does a little more than just expressions; it can also parse variable definitions with the "define" keyword. It parses into an AST and then the other portion of the program evaluates the AST, but I'm not posting that half (right now at least).

tree.h:

```
#include
#include
#include
#include

#include

template struct t_unary_op;
template struct t_binary_op;
template struct t_nary_op;

template
struct t_negate : public t_unary_op
{
template
t_negate(Args&&... args): t_unary_op(std::forward(args)...) {}
};

template
struct t_add : public t_binary_op
{
template
t_add(Args&&... args): t_binary_op(std::forward(args)...) {}
};

template
struct t_subtract : public t_binary_op
{
template
t_subtract(Args&&... args): t_binary_op(std::forward(args)...) {}
};

template
struct t_multiply : public t_binary_op
{
template
t_multiply(Args&&... args): t_binary_op(std::forward(args)...) {}
};

template
struct t_divide : public t_binary_op
{
template
t_divide(Args&&... args): t_binary_op(std::forward(args)...) {}
};

template
struct t_exponentiate : public t_binary_op
{
template
t_exponentiate(Args&&... args): t_binary_op(std::forward(args)...) {}
};

template
struct t_var_occurrance
{
std::string name;

t_var_occurrance(std::string _name): name(std::move(_name)) {}
};

template
struct t_func_invocation;

template
struct t_arg_placeholder
{
unsigned int index;

t_arg_placeholder(unsigned int _index): index(_index) {}
};

template
using t_expression = boost::variant>,
boost::recursive_wrapper>,
boost::recursive_wrapper>,
boost::recursive_wrapper>,
boost::recursive_wrapper>,
boost::r

Solution

You've posted a lot of code, so this review focuses primarily on lexer.h, which you've indicated may have the most to improve.

Code Style

Use include guards

Include guards ensure that you don't get symbol-redefinition errors when a header is included multiple times. You should wrap your headers with a unique macro definition:

#ifndef MY_PARSER_TREE
#define MY_PARSER_TREE
// header contents go here
#endif

#include organization

It is a good idea to keep your standard library #include directives sorted alphabetically. This makes it easy to spot duplicates, or to add new dependencies in the right position.

Iterator categories

If you have an interface which takes iterators, you should at least specify the category of iterator which your algorithm requires. For instance, if skip_spaces() requires input iterators, you should name the iterator type as such:

template  // or InputIt for short

This is the convention followed by the standard library. If you want more safety than mere convention, there are metaprogramming solutions in the standard library (like std::enable_if).

Logic

Don't use goto

While there may be (very, very rare) legitimate uses of goto, it should almost never be used. It makes your program hard to reason about. If you're tempted to use it, take a step back and re-think your design. This is related to my next point:

Reconsider use of switch

I hardly ever use switch statements, usually because there is a better abstraction that serves the same purpose. Your switch statement appears to be handling three distinct cases: digit, lower-case letter, and operator. As a first step, I would create functions that encapsulate the identification and handling of each case:

const auto character = *first;
if (is_operator(character))
{
    // handle it...
}
else if (islower(character))
{
    // handle it...
}
else // etc.

Predicate functions

Creating functions to identify a character is clearer and less error-prone. It's easy to forget to type a letter when you create a case for each lower-case letter. There are two standard-library functions islower() and isdigit() which cover most of the ground in your switch.

Your operators can be represented as a std::set, and then testing is straightforward:

bool is_operator(char c)
{
    const std::set operators = {'+', '-', '*', '/', '^', '(', ')', '='};
    return operators.find(c) != operators.end();
}

If you want to be more flexible, you could read these operator symbols from a file, but the principle is the same.

Reinventing the wheel

Some of your code replicates functionality that already exists in the standard library. skip_spaces can be replaced by find_if_not:

first = std::find_if_not(first, last, isspace);

And your construction of temp under the id label:

string temp; // earlier
do
{
    temp.push_back(*first++);
} while(first != last && islower(*first));

can similarly be clarified using a few algorithms and the string constructor taking iterators:

const auto nonLowercase = std::find_if_not(first, last, islower); // much clearer
const string temp{first, nonLowercase};
first = nonLowercase;

Unnecessary constructor

The single-argument token constructor is only called once, with an Invalid tag. This behavior is identical to the default constructor, so you could remove the constructor and change the single usage to call the default constructor. This will tighten up the interface for token. It doesn't make sense to create a token with no data that isn't Invalid, right?

Code Snippets

#ifndef MY_PARSER_TREE
#define MY_PARSER_TREE
// header contents go here
#endif

template <typename InputIterator> // or InputIt for short

const auto character = *first;
if (is_operator(character))
{
    // handle it...
}
else if (islower(character))
{
    // handle it...
}
else // etc.

bool is_operator(char c)
{
    const std::set<char> operators = {'+', '-', '*', '/', '^', '(', ')', '='};
    return operators.find(c) != operators.end();
}

first = std::find_if_not(first, last, isspace);

Context

StackExchange Code Review Q#57951, answer score: 4

Revisions (0)

No revisions yet.