HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

A header-only linear-time C++11 PEG parser generator supporting left-recursion and grammar ambiguity

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
leftlinearheaderrecursionparsertimegeneratorambiguitygrammarsupporting

Problem

I've rewritten my original parser generator to a header-only library which uses templates and functionals for better type safety and clarity. The generated parser creates an abstract syntax tree which can be evaluated effectively using functionals and a visitor pattern. The parser memorizes intermediate steps resulting in guaranteed linear parsing time (squared in worst-case if the grammar includes left-recursion). My goal is to create a general-purpose C++ parser generator with focus on usability. So far documentation is missing but I believe the usage should become more or less clear from the following example.

This code creates a simple calculator using left-recursion and C++11 functionals:

#include 
#include 
#include 

#include "parser/parser.h"

using namespace std;
using namespace lars;

int main(int argc, char ** argv){
  parsing_expression_grammar_builder g;
  using expression = expression;

  unordered_map variables;

  g["Expression"]  ";
    getline(cin,str);
    if(str == "q" || str == "quit")break;
    try {
      auto e = p.parse(str);
      cout ::error e){
      cout << "  ";
      for(auto i UNUSED :range(e.begin_position().character))cout << " ";
      for(auto i UNUSED :range(e.length()))cout << "~";
      cout << "^\n";
      cout << e.error_message() << " while parsing " << e.rule_name() << endl;
    }
  }

  return 0;
}


This code creates the same without left-recursion and using a visitor pattern:

```
#include
#include
#include

#include "parser/parser.h"

using namespace std;
using namespace lars;

class math_visitor{

double value;
unordered_map variables;

public:

double get_value(){
return value;
}

double get_value(expression e){
e.accept(this);
return get_value();
}

void visit_number(expression e){
value = stod(e.string());
}

void visit_set_variable(expression e){
variables[e[0].string()] = get_value(e[1]);
}

void visit_variable(expression e){
value = variables[e[0].string(

Solution

You have "data" embedded in code in the function visit_left_binary_operator_list. You could use an std::unordered_map to help separate the data and the actual algorithm:

void visit_left_binary_operator_list (expression e){

  static const std::unordered_map binary_ops = {
    { '+', [](double x, double y) { return x + y; } },
    { '-', [](double x, double y) { return x - y; } },
    { '*', [](double x, double y) { return x * y; } },
    { '/', [](double x, double y) { return x / y; } }
  };

  double lhs = get_value(e[0]);

  for(auto i:range((e.size()-1)/2)*2+1){
    auto op = binary_ops.find(e[i].character());
    if (op == binary_ops.end()) {
      throw "undefined operator";
    }

    double rhs = get_value(e[i+1]);
    lhs = op->second(lhs, rhs);
  }

  value = lhs;
}


You may also want to find a suitable name for the expression \$ ((x-1)/2)*2+1 \$. Currently, it is not really readable, but it feels like an operation common enough in the wild to need a proper name.

Instead of having a special case for exponent, wouldn't it be better to have a visit_right_binary_operator_list as well? It would be a first step to some more generic handling of operators. Also, it would be more generic if operators could be std::string instances instead of char instances to handle multi-character operators.

To answer your question: while I don't have a full understanding of lexers and parsers, what your code does is clear to me. The only thing I had a problem with at first was the value()/get_value() whose naming can be a little bit ambiguous. But once you know visitors, regex and lambdas, you quickly have a good understanding of what is happening.

Code Snippets

void visit_left_binary_operator_list (expression<math_visitor> e){

  static const std::unordered_map<char, double(*)(double, double)> binary_ops = {
    { '+', [](double x, double y) { return x + y; } },
    { '-', [](double x, double y) { return x - y; } },
    { '*', [](double x, double y) { return x * y; } },
    { '/', [](double x, double y) { return x / y; } }
  };

  double lhs = get_value(e[0]);

  for(auto i:range((e.size()-1)/2)*2+1){
    auto op = binary_ops.find(e[i].character());
    if (op == binary_ops.end()) {
      throw "undefined operator";
    }

    double rhs = get_value(e[i+1]);
    lhs = op->second(lhs, rhs);
  }

  value = lhs;
}

Context

StackExchange Code Review Q#72043, answer score: 8

Revisions (0)

No revisions yet.