Lexer for a language I'm working on
Problem
I've recently started working on my own programming language, and I just wrapped up its lexer. I'm too young to take any official training in C, compiler construction, or computer science, so I have mixed feelings about the quality of my code. It seems rather sluggish when printing, but I haven't actually measured how long it takes from start to finish. I'm using Visual Studio 2015 and, I believe, C11 (if I'm not, VS is stupid).
Here is an example of the grammar:
```
int (#nested comment # double nest?? # #
int meme) {
double v_a-r = 10.2123 + 200 * 1.;
#symbol testing; also, comment!!!
{} [] () - + * / -= += *= /= ^ %
return var;
}
```
I included the token header instead of the lexer header because the lexer type should be fairly obvious, and I didn't want to scare anyone off with more code than there already is.

Token.h:
```
#include "token.h"
#include "lexer.h"
#include <stdio.h>
#include <stdlib.h>
token_t* token_new(lexer_t* lexer, tk_type type) {
token_t* token = malloc(sizeof(token_t));
token->line = lexer->line;
token->pos = lexer->pos;
token->type = type;
return token;
}
void token_print(token_t* token) {
printf("\ntype: %i", token->line);
printf("\tline: %i", token->line);
printf("\tpos: %i", token->pos);
if (token->type == _int)
printf("\tint val: %i", token->num);
else if (token->type == _dbl)
printf("\tflt val: %d", token->flt);
else
printf("\tstr val: %s", token->str);
}
void token_free(token_t* token) {
if (token->str != NULL)
free(token->str);
free(token->str);
}
```
Lexer.c:
```
#define _CRT_SECURE_NO_WARNINGS
#define DEBUG 1
#include "error.h"
#include "token.h"
#include
#include
#include
#include
#include
static const keyword_t keywords[] = {
    // Primitive data types
    {"int", _int},
    {"double", _dbl},
    {"enum", _enum},
    {"void", _void},
    {"char", _char},
    {"string", _str},
    {"bool", _bool},
    {"const", _const},
    {
```
Solution
It is very hard to review a lexer without a formal definition of the language (honestly, I have only a very vague understanding of how the nested comments are supposed to be structured). However, even without such a definition, certain things are surely bugs. For example, in
```
case '%':
    token = token_new(lexer, _mod);
    token->str = "%";
    lexer_adv(lexer, 1);
    break;
case '^':
    token = token_new(lexer, _mod);
    token->str = "^";
    lexer_adv(lexer, 1);
    break;
```
only the first case should create a `_mod` token. The check

```
case '>':
    if (lexer_look(lexer, 1) == '<') {
```
also looks extremely suspicious: `><` is not a plausible operator, so presumably a check for `=` (as in `>=`) or another `>` was meant.
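Mistakes like the `^` case are the copy-paste kind that a lookup table makes harder to commit, because each spelling is paired with its token type exactly once. A minimal sketch, assuming hypothetical `op_kind` values (the real token types live in the `token.h` that was not posted) and a NUL-terminated source buffer; the operator set is taken from the grammar example:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the real enum lives in the missing token.h. */
typedef enum {
    OP_NONE, OP_SUB_EQ, OP_ADD_EQ, OP_MUL_EQ, OP_DIV_EQ,
    OP_SUB, OP_ADD, OP_MUL, OP_DIV, OP_POW, OP_MOD,
    OP_LBRACE, OP_RBRACE, OP_LBRACKET, OP_RBRACKET, OP_LPAREN, OP_RPAREN
} op_kind;

typedef struct {
    const char* text;   /* spelling in the source */
    op_kind kind;       /* token type it maps to */
} op_entry;

/* Two-character operators come first so "-=" is not lexed as "-" then "=". */
static const op_entry operators[] = {
    {"-=", OP_SUB_EQ}, {"+=", OP_ADD_EQ}, {"*=", OP_MUL_EQ}, {"/=", OP_DIV_EQ},
    {"-", OP_SUB}, {"+", OP_ADD}, {"*", OP_MUL}, {"/", OP_DIV},
    {"^", OP_POW}, {"%", OP_MOD},
    {"{", OP_LBRACE}, {"}", OP_RBRACE}, {"[", OP_LBRACKET},
    {"]", OP_RBRACKET}, {"(", OP_LPAREN}, {")", OP_RPAREN},
};

/* Returns the kind of the operator at the start of src and stores its
   length in *len, or returns OP_NONE (with *len = 0) if nothing matches.
   strncmp stops at the NUL terminator, so a short src is safe. */
static op_kind match_operator(const char* src, size_t* len) {
    for (size_t i = 0; i < sizeof operators / sizeof operators[0]; i++) {
        size_t n = strlen(operators[i].text);
        if (strncmp(src, operators[i].text, n) == 0) {
            *len = n;
            return operators[i].kind;
        }
    }
    *len = 0;
    return OP_NONE;
}
```

The lexer step then becomes one call: if `match_operator` returns something other than `OP_NONE`, create a token of that kind and advance by `len`. Adding an operator means adding one table row, and the longest-first ordering is the only rule to keep in mind.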
In general, instead of a huge (and very error-prone) `switch` statement, it is recommended to extend the keywords table with operators and punctuation (make sure that long operators come before short ones) and loop over it the same way you do for keywords.

I presume `token.h` is really `token.c`. An actual `token.h` with a `token_t` definition is missing.

I don't see how `token_free` is called, but I expect problems. It blindly attempts to `free(token->str)` (twice, in fact, and it never frees `token` itself), even though some token strings have not been allocated but point to string literals.

At the same time, you may notice that the textual representation of a keyword, operator, or punctuation adds zero information to a token (it can be trivially recovered from the token type), so for those you can safely make `token->str` a null pointer.

You should get a warning for a non-void function returning without a value:

```
static char lexer_look(lexer_t* lexer, size_t ahead) {
    if (lexer->len < lexer->ptr + ahead)
        return;
    return lexer->src[lexer->ptr + ahead];
}
```
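A minimal sketch of the last two fixes, assuming hypothetical `lexer_t` and `token_t` layouts that contain only the fields the posted code touches (the real definitions were not posted). `lexer_look` returns `'\0'` past the end of the input instead of falling off without a value, and tokens whose text is implied by their type keep `str == NULL`, so `token_free` can call `free` unconditionally (`free(NULL)` is a no-op). `token_copy_text` is a hypothetical helper for identifier and literal tokens, the only ones that need their own string:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layouts: only the fields the posted code uses. */
typedef struct {
    const char* src;  /* source text */
    size_t len;       /* length of src */
    size_t ptr;       /* index of the current character */
    int line, pos;
} lexer_t;

typedef struct {
    int type, line, pos;
    char* str;        /* heap-owned text, or NULL when the type says it all */
} token_t;

/* Always returns a value: '\0' means "past the end of the input".
   Using >= also means src[len] is never read. */
static char lexer_look(const lexer_t* lexer, size_t ahead) {
    if (lexer->ptr + ahead >= lexer->len)
        return '\0';
    return lexer->src[lexer->ptr + ahead];
}

/* Hypothetical helper: copies n bytes of the source starting at start
   into a new NUL-terminated heap string owned by the token. */
static char* token_copy_text(const lexer_t* lexer, size_t start, size_t n) {
    char* s = malloc(n + 1);
    if (s != NULL) {
        memcpy(s, lexer->src + start, n);
        s[n] = '\0';
    }
    return s;
}

/* str is either NULL or malloc'd, so a single free() is always safe;
   the token itself is released as well. */
static void token_free(token_t* token) {
    if (token == NULL)
        return;
    free(token->str);
    free(token);
}
```

Returning `'\0'` works as an end-of-input sentinel as long as the source text itself cannot contain NUL bytes.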
Context
StackExchange Code Review Q#138106, answer score: 5