Tokenizer building blocks: tokens and spans

Submitted by: @import:stackexchange-codereview

Problem

In a completely overkill BrainFuck lexer/parser I've presented the lexer, parser, interpreter and syntax tree classes. With this post I'd like to go over the lower-level Token and Span mechanics.

Each token has a Type property that returns a TokenType enum value:

namespace BrainFuck.Tokens
{
    public enum TokenType
    {
        Trivia,
        MoveLeft,
        MoveRight,
        BeginLoop,
        EndLoop,
        Increment,
        Decrement,
        Input,
        Output,
    }
}
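
As a rough illustration of how these enum values relate to source characters, here is a minimal sketch of a character classifier. The `TokenClassifier` class and its `Classify` method are hypothetical names introduced for this example and are not taken from the post's lexer; the character assignments follow the standard BrainFuck instruction set, and anything else is assumed to be trivia.

```csharp
using System.Collections.Generic;

// Repeated from the post so the sketch compiles on its own.
public enum TokenType
{
    Trivia, MoveLeft, MoveRight, BeginLoop, EndLoop,
    Increment, Decrement, Input, Output,
}

// Hypothetical helper; the name and shape are assumptions, not the post's lexer.
public static class TokenClassifier
{
    // The eight standard BrainFuck instruction characters.
    private static readonly IReadOnlyDictionary<char, TokenType> Instructions =
        new Dictionary<char, TokenType>
        {
            ['<'] = TokenType.MoveLeft,
            ['>'] = TokenType.MoveRight,
            ['['] = TokenType.BeginLoop,
            [']'] = TokenType.EndLoop,
            ['+'] = TokenType.Increment,
            ['-'] = TokenType.Decrement,
            [','] = TokenType.Input,
            ['.'] = TokenType.Output,
        };

    // Any character that is not an instruction is classified as trivia.
    public static TokenType Classify(char c) =>
        Instructions.TryGetValue(c, out var type) ? type : TokenType.Trivia;
}
```

A lookup table like this keeps the character-to-type mapping in one place, which is the same idea the `Tokens` dictionary in the `Token` class applies in the opposite direction.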


A Token represents one or more characters in the BrainFuck source code input. The BF lexer makes TriviaToken the only token that actually spans more than a single character, but any token can in principle be represented by multiple characters. Here's the Token class:

```
using System;
using System.Collections.Generic;

namespace BrainFuck.Tokens
{
    /// <summary>
    /// A base class for all language tokens.
    /// </summary>
    public abstract class Token : IEquatable<Token>, IComparable<Token>
    {
        private static readonly IDictionary<TokenType, string> Tokens =
            new Dictionary<TokenType, string>
            {
                [TokenType.MoveLeft] = MoveLeftToken.Token,
                [TokenType.MoveRight] = MoveRightToken.Token,
                [TokenType.BeginLoop] = BeginLoopToken.Token,
                [TokenType.EndLoop] = EndLoopToken.Token,
                [TokenType.Increment] = IncrementToken.Token,
                [TokenType.Decrement] = DecrementToken.Token,
                [TokenType.Input] = InputToken.Token,
                [TokenType.Output] = OutputToken.Token,
            };

        protected Token(Span position, int index, TokenType type)
            : this(position, index, Tokens[type])
        {
            Type = type;
        }

        protected Token(Span position, int index, string text)
        {
            Index = index;
            Type = TokenType.Trivia;
            Position = position;
            Text = text;
        }

        /// <summary>
        /// The type of token.
        /// </summary>
        public Tok
```

Solution

Let's take a look at three of your token classes:

/// <summary>
/// A language token representing a "Move Left" instruction.
/// </summary>
public sealed class MoveLeftToken : Token
{
    public static string Token => "<";
    public MoveLeftToken(Span position, int index) : base(position, index, TokenType.MoveLeft) { }
}

/// <summary>
/// A language token representing a "Move Right" instruction.
/// </summary>
public sealed class MoveRightToken : Token
{
    public static string Token => ">";
    public MoveRightToken(Span position, int index) : base(position, index, TokenType.MoveRight) { }
}

/// <summary>
/// A language token representing a "Begin Loop" instruction.
/// </summary>
public sealed class BeginLoopToken : Token
{
    public static string Token => "[";
    public BeginLoopToken(Span position, int index) : base(position, index, TokenType.BeginLoop) { }
}


Now let me ask you: Did you copy-paste any code while writing this? You did, didn't you?

I don't see why you need 8 different classes (one for each BF instruction). How about 8 different objects instead? Or possibly 8 different factory methods? You are not using any OOP features in these token classes. So... do they really deserve to be classes?

Forgive my Java, but may I suggest something like this instead?

public class Tokens {
    public static final String TOKEN_INCREMENT = "+";

    public static Token increment(Span position, int index) {
        return new Token(position, index, TOKEN_INCREMENT);
    }
}


As far as I can see, you would not lose any functionality by going this way instead.
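
Since the question is in C#, the same idea can be sketched there too. This is a minimal sketch under stated assumptions, not the original author's code: `Span` and `Token` below are simplified stand-ins (the real `Span` is not shown in this entry, and the real `Token` is abstract), and the `Tokens` factory class with its method and constant names is hypothetical. Because the post's string-based constructor sets `Type` to `TokenType.Trivia`, the sketch passes the `TokenType` explicitly so the type is preserved.

```csharp
// Repeated from the post so the sketch compiles on its own.
public enum TokenType
{
    Trivia, MoveLeft, MoveRight, BeginLoop, EndLoop,
    Increment, Decrement, Input, Output,
}

// Simplified stand-in for the post's Span type; its members are assumptions.
public struct Span
{
    public Span(int start, int end) { Start = start; End = end; }
    public int Start { get; }
    public int End { get; }
}

// Simplified, non-abstract stand-in for the post's Token class.
public sealed class Token
{
    public Token(Span position, int index, TokenType type, string text)
    {
        Position = position;
        Index = index;
        Type = type;
        Text = text;
    }

    public Span Position { get; }
    public int Index { get; }
    public TokenType Type { get; }
    public string Text { get; }
}

// Hypothetical factory class: one method per instruction instead of one class each.
public static class Tokens
{
    public const string IncrementText = "+";
    public const string MoveLeftText = "<";

    public static Token Increment(Span position, int index) =>
        new Token(position, index, TokenType.Increment, IncrementText);

    public static Token MoveLeft(Span position, int index) =>
        new Token(position, index, TokenType.MoveLeft, MoveLeftText);
}
```

With this shape, `Tokens.Increment(new Span(0, 1), 0)` yields a token whose `Type` is `TokenType.Increment` and whose `Text` is `"+"`, and the remaining six instructions would follow the same two-line pattern.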

Code Snippets

/// <summary>
/// A language token representing a "Move Left" instruction.
/// </summary>
public sealed class MoveLeftToken : Token
{
    public static string Token => "<";
    public MoveLeftToken(Span position, int index) : base(position, index, TokenType.MoveLeft) { }
}

/// <summary>
/// A language token representing a "Move Right" instruction.
/// </summary>
public sealed class MoveRightToken : Token
{
    public static string Token => ">";
    public MoveRightToken(Span position, int index) : base(position, index, TokenType.MoveRight) { }
}

/// <summary>
/// A language token representing a "Begin Loop" instruction.
/// </summary>
public sealed class BeginLoopToken : Token
{
    public static string Token => "[";
    public BeginLoopToken(Span position, int index) : base(position, index, TokenType.BeginLoop) { }
}

public class Tokens {
    public static final String TOKEN_INCREMENT = "+";

    public static Token increment(Span position, int index) {
        return new Token(position, index, TOKEN_INCREMENT);
    }
}

Context

StackExchange Code Review Q#145110, answer score: 4
