Parsing a basic scripting language
Problem
I'm writing a scripting language with ANTLR and C++. This is my first actual move from ANTLR grammars into the C++ API, so I'd like to know whether this is a good way to structure the grammar (I will be adding a tree parser or tree rewriting rules later).
```
grammar dyst;
options
{
language = C;
output = AST;
ASTLabelType=pANTLR3_BASE_TREE;
}
program : statement*;
statement : stopUsingNamespaceStm|usingNamespaceStm|namespaceDefineStm|functionStm|defineStm|assignStm|funcDefineStm|ifStm|whileStm|returnStm|breakStm|eventDefStm|eventCallStm|linkStm|classDefStm|exitStm|importStm|importOnceStm|directive;
namespaceDefineStm : 'namespace' ident '{' statement* '}';
usingNamespaceStm : 'using' 'namespace' ident (',' ident)* ';';
stopUsingNamespaceStm : 'stop' 'using' 'namespace' ident (',' ident)* ';';
directive : '@' directiveId argList? ';';
directiveId : ID (':' ID)*;
importOnceStm : 'import_once' expression ';';
importStm : 'import' expression ';';
exitStm : 'exit' expression? ';';
classDefStm : 'class' ident ('extends' ident (',' ident))? '{' (classSection|funcDefineStm|defineStm|eventDefStm) '}';
classSection : ('public'|'private'|'protected') ':';
linkStm : 'link' ident 'to' ident (',' ident)* ';';
eventCallStm : 'call' ident (',' argList)? ';';
eventDefStm : 'event' ident '(' paramList? ')' ';';
returnStm : 'return' expression ';';
breakStm : 'break' int ';';
ifStm : 'if' '(' expression ')' '{' statement* '}';
whileStm : 'while' '(' expression ')' '{' statement* '}';
defineStm : 'global'? 'def' ident ('=' expression)? ';';
assignStm : ident '=' expression ';';
funcDefineStm : 'function' ident '(' paramList? ')' ('handles' ident (',' ident))? '{' statement '}';
paramList : param (',' param)?;
param : ident ('=' expression)?;
functionStm : functionCall ';';
functionCall : ident '(' argList? ')';
argList : expression (',' expression)*;
//Expressions!
term : functionCall|value|'(' expression ')';
logic_not : ('!
```
Solution
The grammar itself is pretty unreadable "as is". A rule like:

```
statement : stopUsingNamespaceStm|usingNamespaceStm|namespaceDefineStm|functionStm|defineStm|assignStm|funcDefineStm|ifStm|whileStm|returnStm|breakStm|eventDefStm|eventCallStm|linkStm|classDefStm|exitStm|importStm|importOnceStm|directive;
```

would be far more readable when declared like this:

```
statement
  : stopUsingNamespaceStm
  | usingNamespaceStm
  | namespaceDefineStm
  | functionStm
  | defineStm
  | assignStm
  | funcDefineStm
  | ifStm
  | whileStm
  | returnStm
  | breakStm
  | eventDefStm
  | eventCallStm
  | linkStm
  | classDefStm
  | exitStm
  | importStm
  | importOnceStm
  | directive
  ;
```
You'll want to explicitly end the entry point of your parser, the rule program, with the end-of-file token; otherwise your parser might stop parsing prematurely. With EOF, you force the parser to read the entire token stream:

```
program
  : statement* EOF
  ;
```
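To make the EOF point concrete, here is a minimal hand-written sketch in C++. It is not ANTLR-generated code: the names `ParseResult`, `isIdentifier` and `parseProgram` are made up for illustration, tokens are plain strings, and a "statement" is assumed to be just an identifier followed by `;`. It only shows how a `statement*` loop can stop early and still report success when nothing forces it to reach the end of the input.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature parser, not ANTLR-generated code. Tokens are plain
// strings and a "statement" is just an identifier followed by ";".
struct ParseResult {
    std::size_t statements;  // statements recognised before the parser stopped
    bool ok;                 // whether the program rule matched
};

// True if tok is a non-empty run of letters.
bool isIdentifier(const std::string& tok) {
    if (tok.empty()) return false;
    for (std::size_t i = 0; i < tok.size(); ++i) {
        if (!std::isalpha(static_cast<unsigned char>(tok[i]))) return false;
    }
    return true;
}

// requireEof == false models   program : statement* ;
// requireEof == true  models   program : statement* EOF ;
ParseResult parseProgram(const std::vector<std::string>& tokens, bool requireEof) {
    std::size_t pos = 0;
    std::size_t count = 0;
    // statement* : keep consuming "identifier ;" pairs while they match.
    while (pos + 1 < tokens.size() &&
           isIdentifier(tokens[pos]) && tokens[pos + 1] == ";") {
        pos += 2;
        ++count;
    }
    ParseResult result;
    result.statements = count;
    // Without EOF the rule is satisfied wherever the loop stops; with EOF
    // it only succeeds if every token was consumed.
    result.ok = !requireEof || pos == tokens.size();
    return result;
}
```

With the token sequence `a ; ) b ;`, the variant without EOF recognises one statement and reports success, silently ignoring everything from the stray `)` onward. The variant with EOF rejects the same input.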
Make explicit tokens for keywords; don't mix them inside your parser rules. Instead of:

```
importStm
  : 'import' expression ';'
  ;
```

it's better to do:

```
importStm
  : Import expression ';'
  ;

Import
  : 'import'
  ;
```

This will make your life easier at a later (tree walking) stage. Without explicit lexer tokens, it is unclear when debugging which tokens are actually in your tree.
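As a rough illustration of why named tokens help at the tree-walking stage, here is a small C++ sketch. Everything in it is hypothetical: the enum values are invented (a generated lexer assigns its own numbers), `Anonymous23` merely stands in for whatever generated name an inline literal ends up with, and `describeNode` is not part of any ANTLR API.

```cpp
#include <string>

// Hypothetical token-type constants. A generated lexer would assign its own
// numbers; the values here are made up for illustration.
enum TokenType {
    Import      = 4,   // named by an explicit lexer rule:  Import : 'import' ;
    ImportOnce  = 5,   // named by an explicit lexer rule for 'import_once'
    Anonymous23 = 23   // stand-in for a literal left inline in a parser rule
};

// Hypothetical tree-walking helper: describes a node by its token type.
std::string describeNode(int tokenType) {
    switch (tokenType) {
        case Import:     return "import statement";
        case ImportOnce: return "import_once statement";
        default:         return "unknown token type";
    }
}
```

The cases for `Import` and `ImportOnce` read naturally in a walker or a debugger. A token that only exists as an inline literal has no meaningful name to switch on, so you end up looking up which literal a number belongs to.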
Your lexer rules:

```
STRING_DOUBLE : '"' .* '"';
STRING_SINGLE : '\'' .* '\'';
```

can never contain their own delimiter: a double-quoted string cannot contain a double quote, and a single-quoted string cannot contain a single quote. So it's impossible to write a string literal that has both a double and a single quote in it.
Better to do something like this:

```
STRING_DOUBLE
  : '"' ('\\' ('\\' | '"') | ~('\\' | '"'))* '"'
  ;
```

which will allow a double-quoted string to contain (backslash-escaped) double quotes as well.
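If it helps to see what the escaped-string rule accepts, here is a hand-written C++ sketch of the same matching logic. It is not how ANTLR implements the rule, and `matchDoubleQuoted` is a hypothetical helper name. It assumes, as the rule does, that the only valid escapes are a backslash followed by a backslash or by a double quote.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of ANTLR. It mirrors the rule
//   '"' ('\\' ('\\' | '"') | ~('\\' | '"'))* '"'
// and returns the length of the literal at the start of input (both quotes
// included), or std::string::npos if no literal matches there.
std::size_t matchDoubleQuoted(const std::string& input) {
    if (input.empty() || input[0] != '"') {
        return std::string::npos;          // no opening quote
    }
    std::size_t i = 1;
    while (i < input.size()) {
        const char c = input[i];
        if (c == '"') {
            return i + 1;                  // unescaped quote closes the literal
        }
        if (c == '\\') {
            // Only \\ and \" are valid escapes in this rule.
            if (i + 1 >= input.size() ||
                (input[i + 1] != '\\' && input[i + 1] != '"')) {
                return std::string::npos;
            }
            i += 2;                        // consume the escape pair
        } else {
            ++i;                           // any other character
        }
    }
    return std::string::npos;              // unterminated literal
}
```

An escaped quote inside the literal no longer ends it; only an unescaped double quote does. A backslash followed by anything other than a backslash or a double quote is rejected, just as the alternative in the rule would fail to match.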
That's all I saw at first glance. I didn't look very closely, so there may be more that can be improved.
Context
StackExchange Code Review Q#1487, answer score: 7