patterncppMinor
Predictive recursive descent parser for math expressions
Viewed 0 times
predictiveparserexpressionsrecursivemathfordescent
Problem
I've been learning about language theory and parsing, and decided to write my first parser: a LL(1) recursive descent parser. But actually, it does a little more than just expressions; it can also parse variable definitions with the "define" keyword. It parses into an AST and then the other portion of the program evaluates the AST, but I'm not posting that half (right now at least).
tree.h:
```
#include
#include
#include
#include
#include
template struct t_unary_op;
template struct t_binary_op;
template struct t_nary_op;
template
struct t_negate : public t_unary_op
{
template
t_negate(Args&&... args): t_unary_op(std::forward(args)...) {}
};
template
struct t_add : public t_binary_op
{
template
t_add(Args&&... args): t_binary_op(std::forward(args)...) {}
};
template
struct t_subtract : public t_binary_op
{
template
t_subtract(Args&&... args): t_binary_op(std::forward(args)...) {}
};
template
struct t_multiply : public t_binary_op
{
template
t_multiply(Args&&... args): t_binary_op(std::forward(args)...) {}
};
template
struct t_divide : public t_binary_op
{
template
t_divide(Args&&... args): t_binary_op(std::forward(args)...) {}
};
template
struct t_exponentiate : public t_binary_op
{
template
t_exponentiate(Args&&... args): t_binary_op(std::forward(args)...) {}
};
template
struct t_var_occurrance
{
std::string name;
t_var_occurrance(std::string _name): name(std::move(_name)) {}
};
template
struct t_func_invocation;
template
struct t_arg_placeholder
{
unsigned int index;
t_arg_placeholder(unsigned int _index): index(_index) {}
};
template
using t_expression = boost::variant>,
boost::recursive_wrapper>,
boost::recursive_wrapper>,
boost::recursive_wrapper>,
boost::recursive_wrapper>,
boost::r
tree.h:
```
#include
#include
#include
#include
#include
template struct t_unary_op;
template struct t_binary_op;
template struct t_nary_op;
template
struct t_negate : public t_unary_op
{
template
t_negate(Args&&... args): t_unary_op(std::forward(args)...) {}
};
template
struct t_add : public t_binary_op
{
template
t_add(Args&&... args): t_binary_op(std::forward(args)...) {}
};
template
struct t_subtract : public t_binary_op
{
template
t_subtract(Args&&... args): t_binary_op(std::forward(args)...) {}
};
template
struct t_multiply : public t_binary_op
{
template
t_multiply(Args&&... args): t_binary_op(std::forward(args)...) {}
};
template
struct t_divide : public t_binary_op
{
template
t_divide(Args&&... args): t_binary_op(std::forward(args)...) {}
};
template
struct t_exponentiate : public t_binary_op
{
template
t_exponentiate(Args&&... args): t_binary_op(std::forward(args)...) {}
};
template
struct t_var_occurrance
{
std::string name;
t_var_occurrance(std::string _name): name(std::move(_name)) {}
};
template
struct t_func_invocation;
template
struct t_arg_placeholder
{
unsigned int index;
t_arg_placeholder(unsigned int _index): index(_index) {}
};
template
using t_expression = boost::variant>,
boost::recursive_wrapper>,
boost::recursive_wrapper>,
boost::recursive_wrapper>,
boost::recursive_wrapper>,
boost::r
Solution
You've posted a lot of code, so this review focuses primarily on
Code Style
Use include guards
Include guards ensure that you don't get symbol-redefinition errors when a header is included multiple times. You should wrap your headers with a unique macro definition:
It is a good idea to keep your standard library
Iterator categories
If you have an interface which takes iterators, you should at least specify the category of iterator which your algorithm requires. For instance, if
This is the convention followed by the standard library. If you want more safety than mere convention, there are metaprogramming solutions in the standard library (like
Logic
Don't use
While there may be (very, very rare) legitimate uses of
Reconsider use of
I hardly ever use
Predicate functions
Creating functions to identify a character is clearer and less error-prone. It's easy to forget to type a letter when you create a
Your operators can be represented as a
If you want to be more flexible, you could read these operator symbols from a file, but the principle is the same.
Reinventing the wheel
Some of your code replicates functionality that already exists in the standard library.
And your construction of
can similarly be clarified using a few algorithms and the
Unnecessary constructor
The single-argument
lexer.h, which you've indicated may have the most to improve.Code Style
Use include guards
Include guards ensure that you don't get symbol-redefinition errors when a header is included multiple times. You should wrap your headers with a unique macro definition:
#ifndef MY_PARSER_TREE
#define MY_PARSER_TREE
// header contents go here
#endif#include organizationIt is a good idea to keep your standard library
#include directives sorted alphabetically. This makes it easy to spot duplicates, or to add new dependencies in the right position.Iterator categories
If you have an interface which takes iterators, you should at least specify the category of iterator which your algorithm requires. For instance, if
skip_spaces() requires input iterators, you should name the iterator type as such:template // or InputIt for shortThis is the convention followed by the standard library. If you want more safety than mere convention, there are metaprogramming solutions in the standard library (like
std::enable_if).Logic
Don't use
gotoWhile there may be (very, very rare) legitimate uses of
goto, it should almost never be used. It makes your program hard to reason about. If you're tempted to use it, take a step back and re-think your design. This is related to my next point:Reconsider use of
switchI hardly ever use
switch statements, usually because there is a better abstraction that serves the same purpose. Your switch statement appears to be handling three distinct cases: digit, lower-case letter, and operator. As a first step, I would create functions that encapsulate the identification and handling of each case:const auto character = *first;
if (is_operator(character))
{
// handle it...
}
else if (islower(character))
{
// handle it...
}
else // etc.Predicate functions
Creating functions to identify a character is clearer and less error-prone. It's easy to forget to type a letter when you create a
case for each lower-case letter. There are two standard-library functions islower() and isdigit() which cover most of the ground in your switch.Your operators can be represented as a
std::set, and then testing is straightforward:bool is_operator(char c)
{
const std::set operators = {'+', '-', '*', '/', '^', '(', ')', '='};
return operators.find(c) != operators.end();
}If you want to be more flexible, you could read these operator symbols from a file, but the principle is the same.
Reinventing the wheel
Some of your code replicates functionality that already exists in the standard library.
skip_spaces can be replaced by find_if_not:first = std::find_if_not(first, last, isspace);And your construction of
temp under the id label:string temp; // earlier
do
{
temp.push_back(*first++);
} while(first != last && islower(*first));can similarly be clarified using a few algorithms and the
string constructor taking iterators:const auto nonLowercase = std::find_if_not(first, last, islower); // much clearer
const string temp{first, nonLowercase};
first = nonLowercase;Unnecessary constructor
The single-argument
token constructor is only called once, with an Invalid tag. This behavior is identical to the default constructor, so you could remove the constructor and change the single usage to call the default constructor. This will tighten up the interface for token. It doesn't make sense to create a token with no data that isn't Invalid, right?Code Snippets
#ifndef MY_PARSER_TREE
#define MY_PARSER_TREE
// header contents go here
#endiftemplate <typename InputIterator> // or InputIt for shortconst auto character = *first;
if (is_operator(character))
{
// handle it...
}
else if (islower(character))
{
// handle it...
}
else // etc.bool is_operator(char c)
{
const std::set<char> operators = {'+', '-', '*', '/', '^', '(', ')', '='};
return operators.find(c) != operators.end();
}first = std::find_if_not(first, last, isspace);Context
StackExchange Code Review Q#57951, answer score: 4
Revisions (0)
No revisions yet.