
Parsing a basic scripting language

Submitted by: @import:stackexchange-codereview · Mar 10, 2026



Problem

I'm writing a scripting language with ANTLR and C++. This is my first real move from writing ANTLR grammars to using the C++ API, so I'd like to know whether this is a good way to structure the grammar. (Later I will add a tree parser or tree-rewriting rules.)

```
grammar dyst;

options
{
language = C;
output = AST;
ASTLabelType=pANTLR3_BASE_TREE;
}

program : statement*;

statement : stopUsingNamespaceStm|usingNamespaceStm|namespaceDefineStm|functionStm|defineStm|assignStm|funcDefineStm|ifStm|whileStm|returnStm|breakStm|eventDefStm|eventCallStm|linkStm|classDefStm|exitStm|importStm|importOnceStm|directive;

namespaceDefineStm : 'namespace' ident '{' statement* '}';

usingNamespaceStm : 'using' 'namespace' ident (',' ident)* ';';

stopUsingNamespaceStm : 'stop' 'using' 'namespace' ident (',' ident)* ';';

directive : '@' directiveId argList? ';';

directiveId : ID (':' ID)*;

importOnceStm : 'import_once' expression ';';

importStm : 'import' expression ';';

exitStm : 'exit' expression? ';';

classDefStm : 'class' ident ('extends' ident (',' ident)*)? '{' (classSection|funcDefineStm|defineStm|eventDefStm)* '}';

classSection : ('public'|'private'|'protected') ':';

linkStm : 'link' ident 'to' ident (',' ident)* ';';

eventCallStm : 'call' ident (',' argList)? ';';

eventDefStm : 'event' ident '(' paramList? ')' ';';

returnStm : 'return' expression ';';

breakStm : 'break' int ';';

ifStm : 'if' '(' expression ')' '{' statement* '}';

whileStm : 'while' '(' expression ')' '{' statement* '}';

defineStm : 'global'? 'def' ident ('=' expression)? ';';

assignStm : ident '=' expression ';';

funcDefineStm : 'function' ident '(' paramList? ')' ('handles' ident (',' ident)*)? '{' statement* '}';

paramList : param (',' param)*;

param : ident ('=' expression)?;

functionStm : functionCall ';';

functionCall : ident '(' argList? ')';

argList : expression (',' expression)*;

//Expressions!
term : functionCall|value|'(' expression ')';

logic_not : ('!
```

Solution

- The grammar itself is pretty unreadable as is. A rule like:

```
statement : stopUsingNamespaceStm|usingNamespaceStm|namespaceDefineStm|functionStm|defineStm|assignStm|funcDefineStm|ifStm|whileStm|returnStm|breakStm|eventDefStm|eventCallStm|linkStm|classDefStm|exitStm|importStm|importOnceStm|directive;
```

would be far more readable when declared like this:

```
statement
  :  stopUsingNamespaceStm
  |  usingNamespaceStm
  |  namespaceDefineStm
  |  functionStm
  |  defineStm
  |  assignStm
  |  funcDefineStm
  |  ifStm
  |  whileStm
  |  returnStm
  |  breakStm
  |  eventDefStm
  |  eventCallStm
  |  linkStm
  |  classDefStm
  |  exitStm
  |  importStm
  |  importOnceStm
  |  directive
  ;
```

- You'll want to end your parser's entry rule, `program`, explicitly with the end-of-file token. Otherwise the `statement*` loop simply stops at the first token that cannot start a statement, and the parser returns without complaining about the remaining input. With `EOF`, you force the parser to consume the entire token stream.

```
program
  :  statement* EOF
  ;
```
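To illustrate why `EOF` matters, here is a minimal C++ sketch of a hand-written recursive-descent parser whose `program` loop behaves like `statement*` with and without a trailing `EOF`: it keeps parsing statements while the lookahead can start one, then returns. This is not ANTLR-generated code; the `Tok` kinds, the `Parser` class and the single simplified `statement` form are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical token kinds for a tiny subset of the language.
enum class Tok { Ident, Assign, Number, Semi, RBrace, Eof };

// Toy recursive-descent parser (an illustration, not ANTLR output) for:
//   program   : statement* EOF ;
//   statement : ident '=' NUMBER ';' ;   // expression simplified to a number
class Parser {
public:
    explicit Parser(std::vector<Tok> toks) : toks_(std::move(toks)) {}

    // requireEof == false behaves like  program : statement* ;
    // requireEof == true  behaves like  program : statement* EOF ;
    bool program(bool requireEof) {
        // statement*: keep looping while the lookahead can start a statement.
        while (peek() == Tok::Ident) {
            if (!statement()) return false;
        }
        // Without the EOF check the loop just exits and the rule "succeeds",
        // even though unconsumed tokens may remain.
        return !requireEof || peek() == Tok::Eof;
    }

    // Number of tokens consumed so far.
    std::size_t consumed() const { return pos_; }

private:
    // The token list is assumed to end with Tok::Eof, which is never consumed.
    Tok peek() const {
        assert(pos_ < toks_.size());
        return toks_[pos_];
    }

    bool match(Tok t) {
        if (peek() != t) return false;
        ++pos_;
        return true;
    }

    bool statement() {
        return match(Tok::Ident) && match(Tok::Assign) &&
               match(Tok::Number) && match(Tok::Semi);
    }

    std::vector<Tok> toks_;
    std::size_t pos_ = 0;
};
```

For the tokens of `x = 1; } y = 2;`, `program(false)` returns `true` with `consumed()` equal to 4, silently ignoring everything from the stray `}` onward, while `program(true)` returns `false`.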

- Define explicit lexer tokens for keywords instead of mixing literal strings into your parser rules.

Instead of:

```
importStm
  :  'import' expression ';'
  ;
```

it's better to do:

```
importStm
  :  Import expression ';'
  ;

Import
  :  'import'
  ;
```

This will make your life easier at the later tree-walking stage. Without explicit lexer tokens, ANTLR generates anonymous token types for the literals (with auto-generated names such as `T__25`), so when debugging it is hard to tell which tokens actually end up in your tree.
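As a rough analogy in a hand-written C++ lexer (not ANTLR output), the sketch below gives each keyword its own named token type, so debug listings name the token rather than showing an anonymous number. `TokenType`, `classifyWord`, `tokenName` and `dumpTokens` are hypothetical names, and only a few of the grammar's keywords are included.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical named token types, analogous to lexer rules such as
//   Import : 'import' ;
enum class TokenType { Import, ImportOnce, Using, Namespace, Identifier };

// Keyword table: each reserved word maps to its own named token type;
// any other word is an identifier.
TokenType classifyWord(const std::string& word) {
    static const std::unordered_map<std::string, TokenType> keywords = {
        {"import", TokenType::Import},
        {"import_once", TokenType::ImportOnce},
        {"using", TokenType::Using},
        {"namespace", TokenType::Namespace},
    };
    auto it = keywords.find(word);
    return it == keywords.end() ? TokenType::Identifier : it->second;
}

// Readable names for debug output, instead of bare numeric token types.
std::string tokenName(TokenType t) {
    switch (t) {
        case TokenType::Import:     return "Import";
        case TokenType::ImportOnce: return "ImportOnce";
        case TokenType::Using:      return "Using";
        case TokenType::Namespace:  return "Namespace";
        case TokenType::Identifier: return "Identifier";
    }
    return "Unknown";
}

// Debug listing such as "Using(using) Namespace(namespace) Identifier(core)".
std::string dumpTokens(const std::vector<std::string>& words) {
    std::string out;
    for (const std::string& w : words) {
        if (!out.empty()) out += ' ';
        out += tokenName(classifyWord(w)) + "(" + w + ")";
    }
    return out;
}
```

Here `dumpTokens({"using", "namespace", "core"})` returns `Using(using) Namespace(namespace) Identifier(core)`. Because whole words are looked up, `importer` stays an `Identifier`.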

- Your lexer rules:

```
STRING_DOUBLE : '"' .* '"';
STRING_SINGLE : '\'' .* '\'';
```

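can never contain a double quote (STRING_DOUBLE) or a single quote (STRING_SINGLE): ANTLR 3 treats `.*` as non-greedy, so each literal ends at the first matching quote. So it's impossible to have a string literal that contains both a double and a single quote.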

Better to do something like this:

```
STRING_DOUBLE
  :  '"' ('\\' ('\\' | '"') | ~('\\' | '"'))* '"'
  ;
```

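This allows a double-quoted string to contain double quotes, written as `\"`, as well as backslashes, written as `\\`. STRING_SINGLE can be rewritten the same way, with `'` in place of `"`.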
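The difference between the two rules can be sketched with two hand-written C++ scanners (illustrations, not ANTLR-generated code). `scanNaive` mirrors the original `'"' .* '"'` rule and stops at the first `"`. `scanEscaped` mirrors the suggested rule and accepts only the escapes `\\` and `\"`. The function names and the convention of returning 0 when no literal matches are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Both scanners start at s[start] and return the length of the matched
// string literal (quotes included), or 0 if no literal matches there.

// Mirrors  STRING_DOUBLE : '"' .* '"' ;
// The literal ends at the first '"' after the opening one.
std::size_t scanNaive(const std::string& s, std::size_t start) {
    if (start >= s.size() || s[start] != '"') return 0;
    for (std::size_t i = start + 1; i < s.size(); ++i) {
        if (s[i] == '"') return i - start + 1;
    }
    return 0;  // unterminated literal
}

// Mirrors  STRING_DOUBLE : '"' ('\\' ('\\' | '"') | ~('\\' | '"'))* '"' ;
std::size_t scanEscaped(const std::string& s, std::size_t start) {
    if (start >= s.size() || s[start] != '"') return 0;
    std::size_t i = start + 1;
    while (i < s.size()) {
        const char c = s[i];
        if (c == '"') return i - start + 1;  // closing quote
        if (c == '\\') {
            // Only \\ and \" are allowed escape sequences.
            if (i + 1 < s.size() && (s[i + 1] == '\\' || s[i + 1] == '"')) {
                i += 2;
                continue;
            }
            return 0;  // invalid escape or lone trailing backslash
        }
        ++i;  // any other character, including newlines
    }
    return 0;  // unterminated literal
}
```

On the input `"say \"hi\"" rest`, `scanNaive` matches only the 7 characters `"say \"`, whereas `scanEscaped` matches the full 12-character literal `"say \"hi\""`.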

That's all I saw at first glance. I didn't look very closely, so there may be more that can be improved.

Context

StackExchange Code Review Q#1487, answer score: 7
