patterncsharpMinor

Roman numerals with ANTLR

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

withromanantlrnumerals

Problem

I've written a simple interpreter with ANTLR for evaluating Roman numerals. Here's the contents of the grammar file (Roman.g4):

grammar Roman;

root  : (oneThousand)* hundreds? tens? units?;

// --- I, II, III, IV, IX or V VI, VII, VIII
units : one ((one)* | five  | ten) | five (one)*; 

// --- X, XX, XXX, XL, XC or L, LX, LXX, LXXX
tens  : ten ((ten)* | fifty | oneHundred) | fifty (ten)*;

// --- C, CC, CCC, CD, CM or D, DC, DCC, DCCC 
hundreds : oneHundred ((oneHundred)* | fiveHundred | oneThousand) | fiveHundred (oneHundred)*; 

// --- atomic definitions
one         : 'I';
five        : 'V';
ten         : 'X';
fifty       : 'L';
oneHundred  : 'C';
fiveHundred : 'D';
oneThousand : 'M';

// --- skip over white spaces, tabs, newlines
WS : [ \t\r\n]+ -> skip ;

The target language I'm using is C# and my interpreter is implemented as a visitor. To compile the ANTLR parser/lexer, first create the directory Antlr at the same level as the grammar file, then run antlr4 Roman.g4 -Dlanguage=CSharp_v4_0 -visitor -o Antlr (or something equivalent if antlr4 isn't an alias). You should include the Antlr directory and the generated files in your project.

Here's my visitor implementation:

```
public class RomanDecodeVisitor : AbstractParseTreeVisitor, IRomanVisitor
{
public int VisitRoot(RomanParser.RootContext context)
{
return RomanSum(from child in context.children select Visit(child));
}

public int VisitHundreds(RomanParser.HundredsContext context)
{
return RomanSum(from child in context.children select Visit(child));
}

public int VisitTens(RomanParser.TensContext context)
{
return RomanSum(from child in context.children select Visit(child));
}

public int VisitUnits(RomanParser.UnitsContext context)
{
return RomanSum(from child in context.children select Visit(child));
}

public int VisitOne(RomanParser.OneContext context)
{
return 1;
}

public int Vis

Solution

I'm not sure this counts as a Code Review answer, but I wanted to at least share what I found out after nearly a day of fighting with this...

As I think you know, your grammar allows strings like "IIII" and "XXXX".
This is definitively a bug, and a damn hard one to squish. Part of the problem is that Roman Numerals are not a context free language. (i.e. We want to add in this case, but subtract in that case.)

I think you were right about counting the tokens, but I took an approach of doing it in the lexer, which I'm not sure was a good idea. It might have been easier to do at the Parsing stage, but it seemed to make sense to me that this be a Syntax error. Anyway. Here's what I did.

I derived a subclass from the Antlr generated RomanLexer. This class counts the number of consecutive tokens and alerts any ErrorListeners that there was a SyntaxError.

public class DerivedRomanLexer : RomanLanguage.RomanLexer
{
    public DerivedRomanLexer(ICharStream input) 
        :base(input)
    { }

    private IToken _lastToken;
    private int _lastTokenCount;

    public override IToken NextToken()
    {
        var token = base.NextToken();

        if (_lastToken != null)
        {
            if (_lastToken.Type == token.Type)
            {
                _lastTokenCount++;
            }
            else
            {
                _lastTokenCount = 0;
            }

            if (_lastTokenCount > 2)
            {
                foreach (BaseErrorListener listener in this.ErrorListeners)
                {
                    listener.SyntaxError(this, token, token.Line, token.StartIndex, "Too many tokens. There can be a max of 3.", new RecognitionException(this, this._input));
                }
            }
        }

        _lastToken = token;
        return token;
    }
}

Obviously, we'll need to change your test class to use this customized Lexer instead.

// setup the lexer
        var inputStream = new AntlrInputStream(text);

        //var lexer = new RomanLexer(inputStream);
        var lexer = new DerivedRomanLexer(inputStream);
        lexer.RemoveErrorListeners();
        lexer.AddErrorListener(new ThrowExceptionErrorListener());

        // get the tree and and traverse
        var tokenStream = new CommonTokenStream(lexer);

        IParseTree tree = new RomanParser(tokenStream).root();

        Console.WriteLine(tree.ToStringTree());

        int actual = tree.Accept(new RomanDecodeVisitor());

Now, you notice that ErrorListener there? It's there to force the Antlr to throw an exception if a SyntaxError happens. The implementation is right here.

class ThrowExceptionErrorListener : BaseErrorListener, IAntlrErrorListener
{
    //BaseErrorListener implementation
    public override void SyntaxError(IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
    {
        throw new InvalidOperationException("Syntax Error: " + msg, e);
    }

    //IAntlrErrorListener implementation
    public void SyntaxError(IRecognizer recognizer, int offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
    {
        throw new InvalidOperationException("Syntax Error: " + msg, e);
    }
}

As for an actual review then... I dislike that each visitor method is a copy/paste of each other visitor method. I'd go one step further and create this method overload.

private int RomanSum(Antlr4.Runtime.ParserRuleContext context)
{
    return RomanSum(from child in context.children select Visit(child));
}

But at this point, it becomes obvious that we're just walking the tree, which makes me wonder if it wouldn't be best to just implement a Listener and just walk the tree. Of course, if we use a Listener and a TreeWalker, then the custom Lexer code might as well go away and just have the Listener count the tokens as we walk.

Code Snippets

public class DerivedRomanLexer : RomanLanguage.RomanLexer
{
    public DerivedRomanLexer(ICharStream input) 
        :base(input)
    { }

    private IToken _lastToken;
    private int _lastTokenCount;

    public override IToken NextToken()
    {
        var token = base.NextToken();

        if (_lastToken != null)
        {
            if (_lastToken.Type == token.Type)
            {
                _lastTokenCount++;
            }
            else
            {
                _lastTokenCount = 0;
            }

            if (_lastTokenCount > 2)
            {
                foreach (BaseErrorListener listener in this.ErrorListeners)
                {
                    listener.SyntaxError(this, token, token.Line, token.StartIndex, "Too many tokens. There can be a max of 3.", new RecognitionException(this, this._input));
                }
            }
        }

        _lastToken = token;
        return token;
    }
}

// setup the lexer
        var inputStream = new AntlrInputStream(text);

        //var lexer = new RomanLexer(inputStream);
        var lexer = new DerivedRomanLexer(inputStream);
        lexer.RemoveErrorListeners();
        lexer.AddErrorListener(new ThrowExceptionErrorListener());

        // get the tree and and traverse
        var tokenStream = new CommonTokenStream(lexer);

        IParseTree tree = new RomanParser(tokenStream).root();

        Console.WriteLine(tree.ToStringTree());

        int actual = tree.Accept(new RomanDecodeVisitor());

class ThrowExceptionErrorListener : BaseErrorListener, IAntlrErrorListener<int>
{
    //BaseErrorListener implementation
    public override void SyntaxError(IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
    {
        throw new InvalidOperationException("Syntax Error: " + msg, e);
    }

    //IAntlrErrorListener<int> implementation
    public void SyntaxError(IRecognizer recognizer, int offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
    {
        throw new InvalidOperationException("Syntax Error: " + msg, e);
    }
}

private int RomanSum(Antlr4.Runtime.ParserRuleContext context)
{
    return RomanSum(from child in context.children select Visit(child));
}

Context

StackExchange Code Review Q#117711, answer score: 7

Revisions (0)

No revisions yet.