patterncsharpMinor
Roman numerals with ANTLR
Viewed 0 times
withromanantlrnumerals
Problem
I've written a simple interpreter with ANTLR for evaluating Roman numerals. Here's the contents of the grammar file (Roman.g4):
The target language I'm using is C# and my interpreter is implemented as a visitor. To compile the ANTLR parser/lexer, first create the directory
Here's my visitor implementation:
```
public class RomanDecodeVisitor : AbstractParseTreeVisitor, IRomanVisitor
{
public int VisitRoot(RomanParser.RootContext context)
{
return RomanSum(from child in context.children select Visit(child));
}
public int VisitHundreds(RomanParser.HundredsContext context)
{
return RomanSum(from child in context.children select Visit(child));
}
public int VisitTens(RomanParser.TensContext context)
{
return RomanSum(from child in context.children select Visit(child));
}
public int VisitUnits(RomanParser.UnitsContext context)
{
return RomanSum(from child in context.children select Visit(child));
}
public int VisitOne(RomanParser.OneContext context)
{
return 1;
}
public int Vis
grammar Roman;
root : (oneThousand)* hundreds? tens? units?;
// --- I, II, III, IV, IX or V VI, VII, VIII
units : one ((one)* | five | ten) | five (one)*;
// --- X, XX, XXX, XL, XC or L, LX, LXX, LXXX
tens : ten ((ten)* | fifty | oneHundred) | fifty (ten)*;
// --- C, CC, CCC, CD, CM or D, DC, DCC, DCCC
hundreds : oneHundred ((oneHundred)* | fiveHundred | oneThousand) | fiveHundred (oneHundred)*;
// --- atomic definitions
one : 'I';
five : 'V';
ten : 'X';
fifty : 'L';
oneHundred : 'C';
fiveHundred : 'D';
oneThousand : 'M';
// --- skip over white spaces, tabs, newlines
WS : [ \t\r\n]+ -> skip ;The target language I'm using is C# and my interpreter is implemented as a visitor. To compile the ANTLR parser/lexer, first create the directory
Antlr at the same level as the grammar file, then run antlr4 Roman.g4 -Dlanguage=CSharp_v4_0 -visitor -o Antlr (or something equivalent if antlr4 isn't an alias). You should include the Antlr directory and the generated files in your project.Here's my visitor implementation:
```
public class RomanDecodeVisitor : AbstractParseTreeVisitor, IRomanVisitor
{
public int VisitRoot(RomanParser.RootContext context)
{
return RomanSum(from child in context.children select Visit(child));
}
public int VisitHundreds(RomanParser.HundredsContext context)
{
return RomanSum(from child in context.children select Visit(child));
}
public int VisitTens(RomanParser.TensContext context)
{
return RomanSum(from child in context.children select Visit(child));
}
public int VisitUnits(RomanParser.UnitsContext context)
{
return RomanSum(from child in context.children select Visit(child));
}
public int VisitOne(RomanParser.OneContext context)
{
return 1;
}
public int Vis
Solution
I'm not sure this counts as a Code Review answer, but I wanted to at least share what I found out after nearly a day of fighting with this...
As I think you know, your grammar allows strings like
This is definitively a bug, and a damn hard one to squish. Part of the problem is that Roman Numerals are not a context free language. (i.e. We want to add in this case, but subtract in that case.)
I think you were right about counting the tokens, but I took an approach of doing it in the lexer, which I'm not sure was a good idea. It might have been easier to do at the Parsing stage, but it seemed to make sense to me that this be a Syntax error. Anyway. Here's what I did.
I derived a subclass from the Antlr generated
Obviously, we'll need to change your test class to use this customized Lexer instead.
Now, you notice that
As for an actual review then... I dislike that each visitor method is a copy/paste of each other visitor method. I'd go one step further and create this method overload.
But at this point, it becomes obvious that we're just walking the tree, which makes me wonder if it wouldn't be best to just implement a
As I think you know, your grammar allows strings like
"IIII" and "XXXX". This is definitively a bug, and a damn hard one to squish. Part of the problem is that Roman Numerals are not a context free language. (i.e. We want to add in this case, but subtract in that case.)
I think you were right about counting the tokens, but I took an approach of doing it in the lexer, which I'm not sure was a good idea. It might have been easier to do at the Parsing stage, but it seemed to make sense to me that this be a Syntax error. Anyway. Here's what I did.
I derived a subclass from the Antlr generated
RomanLexer. This class counts the number of consecutive tokens and alerts any ErrorListeners that there was a SyntaxError.public class DerivedRomanLexer : RomanLanguage.RomanLexer
{
public DerivedRomanLexer(ICharStream input)
:base(input)
{ }
private IToken _lastToken;
private int _lastTokenCount;
public override IToken NextToken()
{
var token = base.NextToken();
if (_lastToken != null)
{
if (_lastToken.Type == token.Type)
{
_lastTokenCount++;
}
else
{
_lastTokenCount = 0;
}
if (_lastTokenCount > 2)
{
foreach (BaseErrorListener listener in this.ErrorListeners)
{
listener.SyntaxError(this, token, token.Line, token.StartIndex, "Too many tokens. There can be a max of 3.", new RecognitionException(this, this._input));
}
}
}
_lastToken = token;
return token;
}
}Obviously, we'll need to change your test class to use this customized Lexer instead.
// setup the lexer
var inputStream = new AntlrInputStream(text);
//var lexer = new RomanLexer(inputStream);
var lexer = new DerivedRomanLexer(inputStream);
lexer.RemoveErrorListeners();
lexer.AddErrorListener(new ThrowExceptionErrorListener());
// get the tree and and traverse
var tokenStream = new CommonTokenStream(lexer);
IParseTree tree = new RomanParser(tokenStream).root();
Console.WriteLine(tree.ToStringTree());
int actual = tree.Accept(new RomanDecodeVisitor());Now, you notice that
ErrorListener there? It's there to force the Antlr to throw an exception if a SyntaxError happens. The implementation is right here.class ThrowExceptionErrorListener : BaseErrorListener, IAntlrErrorListener
{
//BaseErrorListener implementation
public override void SyntaxError(IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
{
throw new InvalidOperationException("Syntax Error: " + msg, e);
}
//IAntlrErrorListener implementation
public void SyntaxError(IRecognizer recognizer, int offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
{
throw new InvalidOperationException("Syntax Error: " + msg, e);
}
}As for an actual review then... I dislike that each visitor method is a copy/paste of each other visitor method. I'd go one step further and create this method overload.
private int RomanSum(Antlr4.Runtime.ParserRuleContext context)
{
return RomanSum(from child in context.children select Visit(child));
}But at this point, it becomes obvious that we're just walking the tree, which makes me wonder if it wouldn't be best to just implement a
Listener and just walk the tree. Of course, if we use a Listener and a TreeWalker, then the custom Lexer code might as well go away and just have the Listener count the tokens as we walk.Code Snippets
public class DerivedRomanLexer : RomanLanguage.RomanLexer
{
public DerivedRomanLexer(ICharStream input)
:base(input)
{ }
private IToken _lastToken;
private int _lastTokenCount;
public override IToken NextToken()
{
var token = base.NextToken();
if (_lastToken != null)
{
if (_lastToken.Type == token.Type)
{
_lastTokenCount++;
}
else
{
_lastTokenCount = 0;
}
if (_lastTokenCount > 2)
{
foreach (BaseErrorListener listener in this.ErrorListeners)
{
listener.SyntaxError(this, token, token.Line, token.StartIndex, "Too many tokens. There can be a max of 3.", new RecognitionException(this, this._input));
}
}
}
_lastToken = token;
return token;
}
}// setup the lexer
var inputStream = new AntlrInputStream(text);
//var lexer = new RomanLexer(inputStream);
var lexer = new DerivedRomanLexer(inputStream);
lexer.RemoveErrorListeners();
lexer.AddErrorListener(new ThrowExceptionErrorListener());
// get the tree and and traverse
var tokenStream = new CommonTokenStream(lexer);
IParseTree tree = new RomanParser(tokenStream).root();
Console.WriteLine(tree.ToStringTree());
int actual = tree.Accept(new RomanDecodeVisitor());class ThrowExceptionErrorListener : BaseErrorListener, IAntlrErrorListener<int>
{
//BaseErrorListener implementation
public override void SyntaxError(IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
{
throw new InvalidOperationException("Syntax Error: " + msg, e);
}
//IAntlrErrorListener<int> implementation
public void SyntaxError(IRecognizer recognizer, int offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
{
throw new InvalidOperationException("Syntax Error: " + msg, e);
}
}private int RomanSum(Antlr4.Runtime.ParserRuleContext context)
{
return RomanSum(from child in context.children select Visit(child));
}Context
StackExchange Code Review Q#117711, answer score: 7
Revisions (0)
No revisions yet.