patterncppModerate
Lexer+Parser code for my "Reedoo" programming language
Viewed 0 times
parserprogrammingreedoolexerlanguageforcode
Problem
I've been working on my own programming language, as a hobby for the past couple of months now, called Reedoo. It is implemented in pure C++ 11 and uses only the libraries that come with C++; no external ones. I've never studied Computer Science, but I've done about 2 weeks of software development so far in school, but that's it.
Since I've never studied Computer Science, I've never taken a compiler class, so I'd like some feedback to know whether I'm using techniques that will result in a "useable" language.
The language is currently really, really small. It has a "print" statement, variables, expression evaluation, comments and
Here's a link to the code on GitHub.
Here's a simple program demonstrating all of the features so far:
How it works, currently:
Reedoo has a hand-written lexical scanner that scans and input .rdo file and identifies the tokens. Here's the tokens from that program above:
Each token is on its own line. The "sc" token stands for semi-colon, the lexer injects semi-colons in place of new lines. When the lexer finds a condition, like in an
The parser
The parser then takes the tokens and one-by-one adds them together until it matches one of the patterns in the parser. The parser is also ha
Since I've never studied Computer Science, I've never taken a compiler class, so I'd like some feedback to know whether I'm using techniques that will result in a "useable" language.
The language is currently really, really small. It has a "print" statement, variables, expression evaluation, comments and
if statements.Here's a link to the code on GitHub.
Here's a simple program demonstrating all of the features so far:
if "Hello World" == "Hello World" and "Reedoo" == "reedoo" {
print "This part of the block shouldn't run"
} else {
print "This should be displayed in the console."
print 10 + 2 * (5 + 4) # Example of expression evaluation and a comment
}How it works, currently:
Reedoo has a hand-written lexical scanner that scans and input .rdo file and identifies the tokens. Here's the tokens from that program above:
if
cond:string:"Hello World" eqeq string:"Hello World" and string:"Reedoo" eqeq string:"reedoo"
opencb
print
string:"This part of the block shouldn't run"
sc
closecb
else
cond:string:"Hello World" eqeq string:"Hello World" and string:"Reedoo" eqeq string:"reedoo"
opencb
print
string:"This should be displayed in the console."
sc
print
expr:(10+2*(5+4))
sc
closecb
scEach token is on its own line. The "sc" token stands for semi-colon, the lexer injects semi-colons in place of new lines. When the lexer finds a condition, like in an
if statement, it adds the parts of the condition together until it finds the open curly bracket token.The parser
The parser then takes the tokens and one-by-one adds them together until it matches one of the patterns in the parser. The parser is also ha
Solution
C style
Well, the first thing I have to say is that your code is very C-like. The first action here would be to move all those loose functions and globals into two C++ classes,
Another C vice of your code is to declare variables on top of a function.
Absolutely don't do that in C++. Declare a variable in its point of use, restricting scope to where the var is needed.
Counters in for loops, especially, should be declared inside the loop:
And yet another old school C habit is to use the
These macros are not very nice either. You should prefer a
or
practice to declare them inside an unnamed namespace, effectively making the constants file-scoped:
The other globals you have should be made members of a class, but if they were
to remain as globals, then they should also be wrapped with an unnamed namespace.
This was already mentioned and I second for that. Don't use namespace std
because that defeats the purpose of a namespace, which is to allow identical
names to coexist without clashes.
Possible bug?
This function is iterating from 0 to 8, however, the
has 14 elements. Is this intentional or a bug? But this function is useless anyway, since the Standard Library provides
Commented lines leftover
Please clean up lines that are commented out. This is only liable to
cause distraction and confuse the reader. Also, it conveys no important info since
you can't know if the commented out code is correct or not.
Large functions
The
tasked with the job of modifying it or fixing a bug caused by it. This function needs an urgent refactoring. It should be broken down into several helper methods.
Calling
You have a label inside the
paired with a
Since you never did CS school, you might not be aware of the status of
Your code is also littered with
You shouldn't be using it directly inside functions like you did. Most users
won't want the verbose output on every execution of the program. If you need them
for debugging, a better approach is to wrap those calls into a macro that can be disabled at compile time, e.g.:
Or use a better, more mature, log library. Do a search and you will find a ton of good options.
Well, the first thing I have to say is that your code is very C-like. The first action here would be to move all those loose functions and globals into two C++ classes,
Lexer and Parser.Another C vice of your code is to declare variables on top of a function.
lex() is champion at that, with 18 variables piled at the top of the function.Absolutely don't do that in C++. Declare a variable in its point of use, restricting scope to where the var is needed.
Counters in for loops, especially, should be declared inside the loop:
int i;
for (i = 0; i < prog.size(); ++i) // poor practice
----
for (int i = 0; i < prog.size(); ++i) // much betterAnd yet another old school C habit is to use the
/**/ (multi-line comments) everywhere. Why not use the much more practical // for single liners? The single line comment is even valid in C now.#define IO_ERROR "[IO ERROR] "
#define SYNTAX_ERROR "[SYNTAX ERROR] "
#define ASSIGN_ERROR "[ASSIGN ERROR] "These macros are not very nice either. You should prefer a
const std::stringor
const char[]. For variables/constants that are file scoped, it is a goodpractice to declare them inside an unnamed namespace, effectively making the constants file-scoped:
namespace
{
const std::string IO_ERROR { "[IO ERROR] " };
const std::string SYNTAX_ERROR { "[SYNTAX ERROR] " };
const std::string ASSIGN_ERROR { "[ASSIGN ERROR] " };
}The other globals you have should be made members of a class, but if they were
to remain as globals, then they should also be wrapped with an unnamed namespace.
using namespace stdThis was already mentioned and I second for that. Don't use namespace std
because that defeats the purpose of a namespace, which is to allow identical
names to coexist without clashes.
Possible bug?
bool rdo_is_reserved(string tok) {
int i;
for (i = 0; i < 9;i++) {
if (tok == reserved[i])
return true;
else
return false;
}
return false;
}This function is iterating from 0 to 8, however, the
reserved arrayhas 14 elements. Is this intentional or a bug? But this function is useless anyway, since the Standard Library provides
std::find(), which is meant for this kind of linear searches and conveys intent much more clearly.Commented lines leftover
if (n != "") {
//is_expr = 1;
//lex_tokens.push_back(reserved[7] + ":" + n);
}Please clean up lines that are commented out. This is only liable to
cause distraction and confuse the reader. Also, it conveys no important info since
you can't know if the commented out code is correct or not.
Large functions
The
lex() function is a massive beast. I wouldn't want to be the poor programmertasked with the job of modifying it or fixing a bug caused by it. This function needs an urgent refactoring. It should be broken down into several helper methods.
Calling
exit()lex() also calls exit(1) in its body. This is a bit harsh, don't you think? Terminating the program like this, without cleanup or error message. It should instead throw an exception. This is C++ and we have much more modern ways of handling errors, plus with an exception the caller has the chance of recovering from the error if possible.goto and the lost labelYou have a label inside the
parse() function, TOP:, which I would expect be paired with a
goto statement somewhere. However, I couldn't find a single goto inside the function. So what is that label even doing there?Since you never did CS school, you might not be aware of the status of
goto... Lets just say that it is quite controversial. I for one really don't think goto still has a place in modern C++, like I said before, the language has exceptions and automatic resource management, so goto is pretty outdated.coutsYour code is also littered with
cout calls, most are commented out.You shouldn't be using it directly inside functions like you did. Most users
won't want the verbose output on every execution of the program. If you need them
for debugging, a better approach is to wrap those calls into a macro that can be disabled at compile time, e.g.:
#define DEBUG_LOG(msg) do { std::cout << msg << std::end; } while(0)Or use a better, more mature, log library. Do a search and you will find a ton of good options.
Code Snippets
int i;
for (i = 0; i < prog.size(); ++i) // poor practice
----
for (int i = 0; i < prog.size(); ++i) // much better#define IO_ERROR "[IO ERROR] "
#define SYNTAX_ERROR "[SYNTAX ERROR] "
#define ASSIGN_ERROR "[ASSIGN ERROR] "namespace
{
const std::string IO_ERROR { "[IO ERROR] " };
const std::string SYNTAX_ERROR { "[SYNTAX ERROR] " };
const std::string ASSIGN_ERROR { "[ASSIGN ERROR] " };
}bool rdo_is_reserved(string tok) {
int i;
for (i = 0; i < 9;i++) {
if (tok == reserved[i])
return true;
else
return false;
}
return false;
}if (n != "") {
//is_expr = 1;
//lex_tokens.push_back(reserved[7] + ":" + n);
}Context
StackExchange Code Review Q#62965, answer score: 19
Revisions (0)
No revisions yet.