HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppModerate

Lexer+Parser code for my "Reedoo" programming language

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
parserprogrammingreedoolexerlanguageforcode

Problem

I've been working on my own programming language, as a hobby for the past couple of months now, called Reedoo. It is implemented in pure C++ 11 and uses only the libraries that come with C++; no external ones. I've never studied Computer Science, but I've done about 2 weeks of software development so far in school, but that's it.

Since I've never studied Computer Science, I've never taken a compiler class, so I'd like some feedback to know whether I'm using techniques that will result in a "useable" language.

The language is currently really, really small. It has a "print" statement, variables, expression evaluation, comments and if statements.

Here's a link to the code on GitHub.

Here's a simple program demonstrating all of the features so far:

if "Hello World" == "Hello World" and "Reedoo" == "reedoo" {
    print "This part of the block shouldn't run"
} else {
    print "This should be displayed in the console."
    print 10 + 2 * (5 + 4)    # Example of expression evaluation and a comment
}


How it works, currently:

Reedoo has a hand-written lexical scanner that scans and input .rdo file and identifies the tokens. Here's the tokens from that program above:

if
cond:string:"Hello World" eqeq string:"Hello World" and string:"Reedoo" eqeq string:"reedoo" 
opencb
print
string:"This part of the block shouldn't run"
sc
closecb
else
cond:string:"Hello World" eqeq string:"Hello World" and string:"Reedoo" eqeq string:"reedoo" 
opencb
print
string:"This should be displayed in the console."
sc
print
expr:(10+2*(5+4))
sc
closecb
sc


Each token is on its own line. The "sc" token stands for semi-colon, the lexer injects semi-colons in place of new lines. When the lexer finds a condition, like in an if statement, it adds the parts of the condition together until it finds the open curly bracket token.

The parser

The parser then takes the tokens and one-by-one adds them together until it matches one of the patterns in the parser. The parser is also ha

Solution

C style

Well, the first thing I have to say is that your code is very C-like. The first action here would be to move all those loose functions and globals into two C++ classes, Lexer and Parser.

Another C vice of your code is to declare variables on top of a function.
lex() is champion at that, with 18 variables piled at the top of the function.
Absolutely don't do that in C++. Declare a variable in its point of use, restricting scope to where the var is needed.

Counters in for loops, especially, should be declared inside the loop:

int i;
for (i = 0; i < prog.size(); ++i) // poor practice
----
for (int i = 0; i < prog.size(); ++i) // much better


And yet another old school C habit is to use the /**/ (multi-line comments) everywhere. Why not use the much more practical // for single liners? The single line comment is even valid in C now.

#define IO_ERROR "[IO ERROR] "
#define SYNTAX_ERROR "[SYNTAX ERROR] "
#define ASSIGN_ERROR "[ASSIGN ERROR] "


These macros are not very nice either. You should prefer a const std::string
or const char[]. For variables/constants that are file scoped, it is a good
practice to declare them inside an unnamed namespace, effectively making the constants file-scoped:

namespace 
{
    const std::string IO_ERROR     { "[IO ERROR] "     };
    const std::string SYNTAX_ERROR { "[SYNTAX ERROR] " };
    const std::string ASSIGN_ERROR { "[ASSIGN ERROR] " };
}


The other globals you have should be made members of a class, but if they were
to remain as globals, then they should also be wrapped with an unnamed namespace.

using namespace std

This was already mentioned and I second for that. Don't use namespace std
because that defeats the purpose of a namespace, which is to allow identical
names to coexist without clashes.

Possible bug?

bool rdo_is_reserved(string tok) {
  int i;
  for (i = 0; i < 9;i++) {
    if (tok == reserved[i])
      return true;
    else
      return false;
  }
  return false;
}


This function is iterating from 0 to 8, however, the reserved array
has 14 elements. Is this intentional or a bug? But this function is useless anyway, since the Standard Library provides std::find(), which is meant for this kind of linear searches and conveys intent much more clearly.

Commented lines leftover

if (n != "") {
  //is_expr = 1;
  //lex_tokens.push_back(reserved[7] + ":" + n);
}


Please clean up lines that are commented out. This is only liable to
cause distraction and confuse the reader. Also, it conveys no important info since
you can't know if the commented out code is correct or not.

Large functions

The lex() function is a massive beast. I wouldn't want to be the poor programmer
tasked with the job of modifying it or fixing a bug caused by it. This function needs an urgent refactoring. It should be broken down into several helper methods.

Calling exit()

lex() also calls exit(1) in its body. This is a bit harsh, don't you think? Terminating the program like this, without cleanup or error message. It should instead throw an exception. This is C++ and we have much more modern ways of handling errors, plus with an exception the caller has the chance of recovering from the error if possible.

goto and the lost label

You have a label inside the parse() function, TOP:, which I would expect be
paired with a goto statement somewhere. However, I couldn't find a single goto inside the function. So what is that label even doing there?

Since you never did CS school, you might not be aware of the status of goto... Lets just say that it is quite controversial. I for one really don't think goto still has a place in modern C++, like I said before, the language has exceptions and automatic resource management, so goto is pretty outdated.

couts

Your code is also littered with cout calls, most are commented out.
You shouldn't be using it directly inside functions like you did. Most users
won't want the verbose output on every execution of the program. If you need them
for debugging, a better approach is to wrap those calls into a macro that can be disabled at compile time, e.g.:

#define DEBUG_LOG(msg) do { std::cout << msg << std::end; } while(0)


Or use a better, more mature, log library. Do a search and you will find a ton of good options.

Code Snippets

int i;
for (i = 0; i < prog.size(); ++i) // poor practice
----
for (int i = 0; i < prog.size(); ++i) // much better
#define IO_ERROR "[IO ERROR] "
#define SYNTAX_ERROR "[SYNTAX ERROR] "
#define ASSIGN_ERROR "[ASSIGN ERROR] "
namespace 
{
    const std::string IO_ERROR     { "[IO ERROR] "     };
    const std::string SYNTAX_ERROR { "[SYNTAX ERROR] " };
    const std::string ASSIGN_ERROR { "[ASSIGN ERROR] " };
}
bool rdo_is_reserved(string tok) {
  int i;
  for (i = 0; i < 9;i++) {
    if (tok == reserved[i])
      return true;
    else
      return false;
  }
  return false;
}
if (n != "") {
  //is_expr = 1;
  //lex_tokens.push_back(reserved[7] + ":" + n);
}

Context

StackExchange Code Review Q#62965, answer score: 19

Revisions (0)

No revisions yet.