Lexer code in C++
Problem
I've got a lexer written in C++ (Visual Studio 2010, so it uses lambdas and a few other C++0x features). This is my first lexing experience; the only other source-code interpretation I've done was trivial and didn't separate parsing from lexing. The grammar closely resembles C++ without templates, and the lexer should output the appropriate tokens.
Concerns:
- Unicode support: I absolutely want this lexer to fully support foreign languages.
- The lexer currently transmits nothing but the token type, and I'd like to know how extensively I'd have to rewrite it to carry more. For example, if I want to actually output an AST, I'll need to keep the identifiers instead of just deciding that they are identifiers and discarding them.
- It's been a long time since I wrote non-trivial C++, and I'd like to know how my comments are.
- I've used tools like ANTLR and flex, and they generated massive lexers, so I'm concerned that I'm missing a trick here.
Omissions I'm already aware of:
- No string or character literals; I haven't implemented them yet.
- No const, volatile, or cast keywords.
That's pretty much it, I think.
```
class LexedFile {
public:
    enum Token {
        // RESERVED WORDS (and identifier)
        // Taken care of by next() lambda.
        // They're inserted in order (except identifier) so shouldn't be too hard to verify
        Namespace,
        Identifier,
        For,
        While,
        Do,
        Switch,
        Case,
        Default,
        Try,
        Catch,
        Auto,
        Type,
        Break,
        Continue,
        Return,
        Static,
        Sizeof,
        Decltype,
        If,
        Else,
        // LOGICAL OPERATORS
        LogicalAnd,
        LogicalOr,
        LogicalNot,
        GreaterThan,
        GreaterThanOrEqual,
        LessThan,
        LessThanOrEqual,
        EqualComparison,
        NotEqualComparison,
        // BINARY OPERATORS
        AND, // Re-used for addres
```
Solution
Well, duplicated code is the first order of business. For a check such as `current >= L'0' && current <= L'9'`, you might want to extract it into isDigit or something similar. The same goes for isWhitespace and so on; even if a check is used in only one spot, giving it a name makes the code a lot more readable.
In fact, you might want to take various other parts and extract them into functions as well.
I'm not a fan of while(true) loops. I think they could probably be written better.
I think having more primitive functions to help you navigate the text, such as skipTill, would be useful.
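As a rough illustration of these suggestions, here is a minimal sketch of what such helpers could look like. Only the digit range check and the names isDigit, isWhitespace and skipTill come from the discussion above; skipWhile, readDigits, the signatures, and the choice of std::wstring with an index are assumptions made for the example, not the original lexer's design. Each loop states its exit condition in the loop header, which is one way to avoid while(true).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers: names and signatures are illustrative assumptions,
// not taken from the original lexer.

// True for the ASCII decimal digits only, matching the original range check.
inline bool isDigit(wchar_t c) {
    return c >= L'0' && c <= L'9';
}

// True for the common ASCII whitespace characters (assumed set).
inline bool isWhitespace(wchar_t c) {
    return c == L' ' || c == L'\t' || c == L'\n' || c == L'\r';
}

// Advances from pos while pred holds; returns the first index where pred
// fails, or input.size() if the end of the input is reached.
template <typename Pred>
std::size_t skipWhile(const std::wstring& input, std::size_t pos, Pred pred) {
    while (pos < input.size() && pred(input[pos])) {
        ++pos;
    }
    return pos;
}

// Advances from pos until the character stop is found; returns its index,
// or input.size() if stop does not occur.
inline std::size_t skipTill(const std::wstring& input, std::size_t pos, wchar_t stop) {
    while (pos < input.size() && input[pos] != stop) {
        ++pos;
    }
    return pos;
}

// Reads a run of digits starting at pos, moves pos past it, and returns the
// digits as text. Assumes pos <= input.size().
inline std::wstring readDigits(const std::wstring& input, std::size_t& pos) {
    const std::size_t start = pos;
    pos = skipWhile(input, pos, isDigit);
    return input.substr(start, pos - start);
}
```

With helpers like these, the body of the lexer can read as a sequence of named steps (skip whitespace, read digits, skip to the end of a line comment) instead of repeated character comparisons. Note that these checks are ASCII-only, so full Unicode support would need a different classification.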
Code Snippets
current >= L'0' && current <= L'9'
Context
StackExchange Code Review Q#3573, answer score: 4