Lexer code in C++

Submitted by: @import:stackexchange-codereview

Problem

I've got a lexer written in C++ (Visual Studio 2010, so including lambdas and a few other C++0x tricks). This is my first experience writing a lexer; the only other source-code interpretation I have done was trivial and didn't separate parsing from lexing. The grammar closely resembles C++ without templates, and the lexer should output the appropriate tokens.

Concerns:

  • Unicode support: I absolutely want this lexer to support foreign languages fully.

  • I currently don't transmit anything other than the token type, and I'd like to know how extensive a rewrite it would take to include more. For example, if I want to actually output an AST, I'll need to keep the identifiers instead of just recognising that they are identifiers and discarding their text.

  • It's been a long time since I wrote non-trivial C++, and I'd like to know how my comments are.

  • I've used tools like ANTLR and flex, and they generated massive lexers, so I'm concerned that I'm missing a trick here.

Things that I already know are missing:

  • No string or character literals: I haven't implemented them yet.

  • No const, volatile, or cast keywords.

That's pretty much it, I think.

```
class LexedFile {
public:
enum Token {
// RESERVED WORDS (and identifier)
// Taken care of by next() lambda.
// They're inserted in order (except identifier) so shouldn't be too hard to verify
Namespace,
Identifier,
For,
While,
Do,
Switch,
Case,
Default,
Try,
Catch,
Auto,
Type,
Break,
Continue,
Return,
Static,
Sizeof,
Decltype,
If,
Else,

// LOGICAL OPERATORS
LogicalAnd,
LogicalOr,
LogicalNot,
GreaterThan,
GreaterThanOrEqual,
LessThan,
LessThanOrEqual,
EqualComparison,
NotEqualComparison,

// BINARY OPERATORS
AND, // Re-used for addres
```

Solution

Well, duplicated code is the first order of business:

current >= L'0' && current <= L'9'


You might want to extract this into an isDigit function or something similar; the same goes for isWhitespace, and so on. Even if a check is used in only one spot, a named function makes the code a lot more readable.
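
A minimal sketch of that extraction, assuming the lexer works on wide characters as the snippet above suggests. The function bodies are illustrative and not taken from the original lexer, and the whitespace set is an assumption (ASCII only):

```cpp
// Hypothetical helpers; names follow the suggestion above.

// True only for the ASCII decimal digits '0' through '9'.
inline bool isDigit(wchar_t c) {
    return c >= L'0' && c <= L'9';
}

// True for the common ASCII whitespace characters.
// Assumption: Unicode whitespace is out of scope for this sketch.
inline bool isWhitespace(wchar_t c) {
    return c == L' ' || c == L'\t' || c == L'\n' || c == L'\r';
}
```

A call site such as `if (isDigit(current))` then reads as intent rather than as a range comparison, and the definition of "digit" lives in one place if it ever needs to change.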

In fact, you might want to extract various other parts into functions as well.

I'm not a fan of while(true) loops. I think they could probably be written more clearly, for example with the exit condition stated in the loop header.
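
As a rough illustration of one alternative, here is a loop whose termination condition sits in the loop header instead of in a `break` inside a `while(true)` body. The function `skipDigits` and its signature are hypothetical and not part of the original lexer:

```cpp
#include <cstddef>
#include <string>

// Hypothetical example: consume a run of decimal digits starting at pos.
// Returns the index one past the last digit (or pos if there is none).
// Both exit conditions (end of input, non-digit) are visible in the header.
inline std::size_t skipDigits(const std::wstring& text, std::size_t pos) {
    while (pos < text.size() && text[pos] >= L'0' && text[pos] <= L'9') {
        ++pos;
    }
    return pos;
}
```

With the condition up front, a reader can see when the loop stops without scanning the body for `break` or `return` statements.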

I think more primitive functions for navigating the text, such as skipTill, would also be useful.
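
One possible shape for such primitives, sketched under the assumption that the input is held in a `std::wstring` and the position is a plain index. Only the name `skipTill` comes from the suggestion above; `skipWhile` and both signatures are made up for illustration:

```cpp
#include <cstddef>
#include <string>

// Advance from pos until pred(character) is true or the input ends.
// Returns the index of the first matching character, or text.size().
template <typename Pred>
std::size_t skipTill(const std::wstring& text, std::size_t pos, Pred pred) {
    while (pos < text.size() && !pred(text[pos])) {
        ++pos;
    }
    return pos;
}

// Advance from pos while pred(character) is true.
// Returns the index of the first non-matching character, or text.size().
template <typename Pred>
std::size_t skipWhile(const std::wstring& text, std::size_t pos, Pred pred) {
    while (pos < text.size() && pred(text[pos])) {
        ++pos;
    }
    return pos;
}
```

Skipping a line comment, for instance, could then be written as `pos = skipTill(text, pos, [](wchar_t c) { return c == L'\n'; });`, which keeps the scanning loop out of the main lexing function.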

Context

StackExchange Code Review Q#3573, answer score: 4
