Tokenizer building blocks: tokens and spans
Problem
In a completely overkill BrainFuck lexer/parser I've presented the lexer, parser, interpreter and syntax tree classes. With this post I'd like to go over the lower-level Token and Span mechanics.

Each token has a Type property that returns a TokenType enum value:

```
namespace BrainFuck.Tokens
{
    public enum TokenType
    {
        Trivia,
        MoveLeft,
        MoveRight,
        BeginLoop,
        EndLoop,
        Increment,
        Decrement,
        Input,
        Output,
    }
}
```

A Token represents one or more characters in the BrainFuck source code input. The BF lexer makes the TriviaToken the only token that can actually span more than a single character, but all tokens have the possibility of being represented with multiple characters. Here's the Token class:

```
using System;
using System.Collections.Generic;

namespace BrainFuck.Tokens
{
    /// <summary>
    /// A base class for all language tokens.
    /// </summary>
    public abstract class Token : IEquatable<Token>, IComparable<Token>
    {
        private static readonly IDictionary<TokenType, string> Tokens =
            new Dictionary<TokenType, string>
            {
                [TokenType.MoveLeft] = MoveLeftToken.Token,
                [TokenType.MoveRight] = MoveRightToken.Token,
                [TokenType.BeginLoop] = BeginLoopToken.Token,
                [TokenType.EndLoop] = EndLoopToken.Token,
                [TokenType.Increment] = IncrementToken.Token,
                [TokenType.Decrement] = DecrementToken.Token,
                [TokenType.Input] = InputToken.Token,
                [TokenType.Output] = OutputToken.Token,
            };

        protected Token(Span position, int index, TokenType type)
            : this(position, index, Tokens[type])
        {
            Type = type;
        }

        protected Token(Span position, int index, string text)
        {
            Index = index;
            Type = TokenType.Trivia;
            Position = position;
            Text = text;
        }

        /// <summary>
        /// The type of token.
        /// </summary>
        public Tok
```
Solution
Let's take a look at three of your token classes:

```
/// <summary>
/// A language token representing a "Move Left" instruction.
/// </summary>
public sealed class MoveLeftToken : Token
{
    public static string Token => "<";
    public MoveLeftToken(Span position, int index) : base(position, index, TokenType.MoveLeft) { }
}

/// <summary>
/// A language token representing a "Move Right" instruction.
/// </summary>
public sealed class MoveRightToken : Token
{
    public static string Token => ">";
    public MoveRightToken(Span position, int index) : base(position, index, TokenType.MoveRight) { }
}

/// <summary>
/// A language token representing a "Begin Loop" instruction.
/// </summary>
public sealed class BeginLoopToken : Token
{
    public static string Token => "[";
    public BeginLoopToken(Span position, int index) : base(position, index, TokenType.BeginLoop) { }
}
```

Now let me ask you: Did you copy-paste any code while writing this? You did, didn't you?

I don't see a reason why you need 8 different classes (one for each BF instruction). How about 8 different objects instead? Or possibly 8 different factory methods. You are not using any OOP aspects for these token classes. So... do they really deserve to be classes?

Forgive my Java, but may I suggest something like this instead?

```
public class Tokens {
    public static final String TOKEN_INCREMENT = "+";

    public static Token increment(Span position, int index) {
        return new Token(position, index, TOKEN_INCREMENT);
    }
}
```

As far as I can see, there is no functionality that you would lose out on if you would go this way instead.
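To make the factory idea a bit more concrete, here is a minimal Java sketch of how all eight instructions could share one concrete token class. It rests on some assumptions: `TokenSketch`, the simplified `Span`, the Java-style enum constants, and the `of`/`trivia` factory names are all hypothetical and not taken from the reviewed code. The instruction text also lives on the enum constant, which is one possible design rather than the only one.

```java
// A sketch only: every name below is hypothetical, not part of the reviewed code.
final class TokenSketch {

    // Minimal stand-in for the question's Span type (start inclusive, end exclusive).
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Each instruction constant carries the text its dedicated subclass used to hold.
    enum TokenType {
        MOVE_LEFT("<"), MOVE_RIGHT(">"),
        BEGIN_LOOP("["), END_LOOP("]"),
        INCREMENT("+"), DECREMENT("-"),
        INPUT(","), OUTPUT("."),
        TRIVIA(null); // trivia text is supplied per token, not per type

        final String text;

        TokenType(String text) {
            this.text = text;
        }
    }

    // A single concrete token class replaces the eight near-identical subclasses.
    static final class Token {
        final Span position;
        final int index;
        final TokenType type;
        final String text;

        private Token(Span position, int index, TokenType type, String text) {
            this.position = position;
            this.index = index;
            this.type = type;
            this.text = text;
        }

        // Factory for instruction tokens: the text comes from the enum constant.
        static Token of(Span position, int index, TokenType type) {
            if (type == TokenType.TRIVIA) {
                throw new IllegalArgumentException("use trivia(...) for trivia tokens");
            }
            return new Token(position, index, type, type.text);
        }

        // Factory for trivia tokens, which carry arbitrary text.
        static Token trivia(Span position, int index, String text) {
            return new Token(position, index, TokenType.TRIVIA, text);
        }
    }

    public static void main(String[] args) {
        Token plus = Token.of(new Span(0, 1), 0, TokenType.INCREMENT);
        Token note = Token.trivia(new Span(1, 8), 1, "comment");
        System.out.println(plus.type + " " + plus.text); // prints "INCREMENT +"
        System.out.println(note.type + " " + note.text); // prints "TRIVIA comment"
    }
}
```

With this shape, adding a new instruction would mean adding one enum constant instead of a new class, and code that needs to branch on the kind of token can still switch on `type`.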
Code Snippets
```
/// <summary>
/// A language token representing a "Move Left" instruction.
/// </summary>
public sealed class MoveLeftToken : Token
{
    public static string Token => "<";
    public MoveLeftToken(Span position, int index) : base(position, index, TokenType.MoveLeft) { }
}

/// <summary>
/// A language token representing a "Move Right" instruction.
/// </summary>
public sealed class MoveRightToken : Token
{
    public static string Token => ">";
    public MoveRightToken(Span position, int index) : base(position, index, TokenType.MoveRight) { }
}

/// <summary>
/// A language token representing a "Begin Loop" instruction.
/// </summary>
public sealed class BeginLoopToken : Token
{
    public static string Token => "[";
    public BeginLoopToken(Span position, int index) : base(position, index, TokenType.BeginLoop) { }
}
```

```
public class Tokens {
    public static final String TOKEN_INCREMENT = "+";

    public static Token increment(Span position, int index) {
        return new Token(position, index, TOKEN_INCREMENT);
    }
}
```

Context
StackExchange Code Review Q#145110, answer score: 4