patterncppMinor
A header-only linear-time C++11 PEG parser generator supporting left-recursion and grammar ambiguity
Viewed 0 times
leftlinearheaderrecursionparsertimegeneratorambiguitygrammarsupporting
Problem
I've rewritten my original parser generator to a header-only library which uses templates and functionals for better type safety and clarity. The generated parser creates an abstract syntax tree which can be evaluated effectively using functionals and a visitor pattern. The parser memorizes intermediate steps resulting in guaranteed linear parsing time (squared in worst-case if the grammar includes left-recursion). My goal is to create a general-purpose C++ parser generator with focus on usability. So far documentation is missing but I believe the usage should become more or less clear from the following example.
This code creates a simple calculator using left-recursion and C++11 functionals:
This code creates the same without left-recursion and using a visitor pattern:
```
#include
#include
#include
#include "parser/parser.h"
using namespace std;
using namespace lars;
class math_visitor{
double value;
unordered_map variables;
public:
double get_value(){
return value;
}
double get_value(expression e){
e.accept(this);
return get_value();
}
void visit_number(expression e){
value = stod(e.string());
}
void visit_set_variable(expression e){
variables[e[0].string()] = get_value(e[1]);
}
void visit_variable(expression e){
value = variables[e[0].string(
This code creates a simple calculator using left-recursion and C++11 functionals:
#include
#include
#include
#include "parser/parser.h"
using namespace std;
using namespace lars;
int main(int argc, char ** argv){
parsing_expression_grammar_builder g;
using expression = expression;
unordered_map variables;
g["Expression"] ";
getline(cin,str);
if(str == "q" || str == "quit")break;
try {
auto e = p.parse(str);
cout ::error e){
cout << " ";
for(auto i UNUSED :range(e.begin_position().character))cout << " ";
for(auto i UNUSED :range(e.length()))cout << "~";
cout << "^\n";
cout << e.error_message() << " while parsing " << e.rule_name() << endl;
}
}
return 0;
}This code creates the same without left-recursion and using a visitor pattern:
```
#include
#include
#include
#include "parser/parser.h"
using namespace std;
using namespace lars;
class math_visitor{
double value;
unordered_map variables;
public:
double get_value(){
return value;
}
double get_value(expression e){
e.accept(this);
return get_value();
}
void visit_number(expression e){
value = stod(e.string());
}
void visit_set_variable(expression e){
variables[e[0].string()] = get_value(e[1]);
}
void visit_variable(expression e){
value = variables[e[0].string(
Solution
You have "data" embedded in code in the function
You may also want to find a suitable name for the expression \$ ((x-1)/2)*2+1 \$. Currently, it is not really readable, but it feels like an operation common enough in the wild to need a proper name.
Instead of having a special case for exponent, wouldn't it be better to have a
To answer your question: while I don't have a full understanding of lexers and parsers, what your code does is clear to me. The only thing I had a problem with at first was the
visit_left_binary_operator_list. You could use an std::unordered_map to help separate the data and the actual algorithm:void visit_left_binary_operator_list (expression e){
static const std::unordered_map binary_ops = {
{ '+', [](double x, double y) { return x + y; } },
{ '-', [](double x, double y) { return x - y; } },
{ '*', [](double x, double y) { return x * y; } },
{ '/', [](double x, double y) { return x / y; } }
};
double lhs = get_value(e[0]);
for(auto i:range((e.size()-1)/2)*2+1){
auto op = binary_ops.find(e[i].character());
if (op == binary_ops.end()) {
throw "undefined operator";
}
double rhs = get_value(e[i+1]);
lhs = op->second(lhs, rhs);
}
value = lhs;
}You may also want to find a suitable name for the expression \$ ((x-1)/2)*2+1 \$. Currently, it is not really readable, but it feels like an operation common enough in the wild to need a proper name.
Instead of having a special case for exponent, wouldn't it be better to have a
visit_right_binary_operator_list as well? It would be a first step to some more generic handling of operators. Also, it would be more generic if operators could be std::string instances instead of char instances to handle multi-character operators.To answer your question: while I don't have a full understanding of lexers and parsers, what your code does is clear to me. The only thing I had a problem with at first was the
value()/get_value() whose naming can be a little bit ambiguous. But once you know visitors, regex and lambdas, you quickly have a good understanding of what is happening.Code Snippets
void visit_left_binary_operator_list (expression<math_visitor> e){
static const std::unordered_map<char, double(*)(double, double)> binary_ops = {
{ '+', [](double x, double y) { return x + y; } },
{ '-', [](double x, double y) { return x - y; } },
{ '*', [](double x, double y) { return x * y; } },
{ '/', [](double x, double y) { return x / y; } }
};
double lhs = get_value(e[0]);
for(auto i:range((e.size()-1)/2)*2+1){
auto op = binary_ops.find(e[i].character());
if (op == binary_ops.end()) {
throw "undefined operator";
}
double rhs = get_value(e[i+1]);
lhs = op->second(lhs, rhs);
}
value = lhs;
}Context
StackExchange Code Review Q#72043, answer score: 8
Revisions (0)
No revisions yet.