Open Source Tokenizer

Submitted by: @import:stackexchange-codereview · Mar 10, 2026

Problem

I'm developing an open source annotation processor for Android (ngAndroid). Part of the library is a small compiled language, based on a subset of Java, that can be used for data binding, events, etc. I plan to extend the functionality of this language, but my Tokenizer is getting out of hand.

The Tokenizer parses expressions such as getTimeString(note.time); modelName.boolValue ? functionName(m.parameter , q.secondParameter ) : modelName.stringValue; and (3 + (2)) - 10/5.

```
import java.util.LinkedList;
import java.util.Queue;

public class Tokenizer {

    // States of the tokenizer's state machine.
    private enum State {
        BEGIN,
        END,
        MODEL_PERIOD,
        FUNCTION_PARAMETER_START,
        CLOSE_PARENTHESIS,
        TERNARY_QUESTION,
        TERNARY_COLON,
        STRING_START,
        STRING_END,
        IN_STRING,
        IN_NUMBER_CONSTANT,
        KNOT_EQUALS,
        EQUALS_START,
        FUNCTION_PARAMETER_DELIMINATOR,
        EQUALS,
        KNOT_EQUALS_START,
        MODEL_NAME_END,
        MODEL_FIELD_END,
        FUNCTION_NAME_END,
        NUMBER_CONSTANT_END,
        IN_CHAR_SEQUENCE,
        IN_MODEL_FIELD,
        KNOT_VALUE,
        OPERATOR,
        IN_FLOAT,
        FLOAT_END,
        DOUBLE_END,
        INTEGER_END,
        LONG_END,
        NESTED_EXPRESSION,
        IN_STRING_SLASH,
        STRING_SLASH_END,
        WHITESPACE,
        FLOAT_F_END
    }

    private int index, readIndex;
    private String script;
    private Queue<Token> tokens;
    private State state;

    public Tokenizer(String script) {
        this.script = script;
    }

    // Tokenizes lazily: the script is scanned on the first call only.
    public Queue<Token> getTokens() {
        if (tokens == null) {
            generateTokens();
        }
        return tokens;
    }

    private void generateTokens() {
        tokens = new LinkedList<>();
        index = 0;
        readIndex = 0;

        // Run the state machine until it reaches END.
        state = State.BEGIN;
        while (state != State.END) {
            state = nextState();
        }
        // Any input left unread at this point is emitted as RUBBISH.
        if (readIndex != script.length()) {
            emit(TokenType.RUBBISH);
        }

        tokens.add(new Token(TokenType.EOF, null));
    }

    // The excerpt ends here; nextState() and emit(...) are not shown.
}
```

Solution

Instead of this:

c != 'l' && c != 'L' && c != 'f' && c != 'F' && c != 'd' && c != 'D'

I would write:

"lLfFdD".indexOf(c) == -1
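To check that the two forms agree, here is a minimal sketch. The class and method names (SuffixCheckSketch, isNotSuffixVerbose, isNotSuffix) are made up for illustration and do not come from the original Tokenizer; only the two boolean expressions are taken from above.

```java
final class SuffixCheckSketch {
    // The literal suffix characters from the original condition.
    private static final String NUMERIC_SUFFIXES = "lLfFdD";

    // Original form: one comparison per character.
    static boolean isNotSuffixVerbose(char c) {
        return c != 'l' && c != 'L' && c != 'f' && c != 'F' && c != 'd' && c != 'D';
    }

    // Suggested form: String.indexOf returns -1 when c does not occur in the string.
    static boolean isNotSuffix(char c) {
        return NUMERIC_SUFFIXES.indexOf(c) == -1;
    }

    public static void main(String[] args) {
        // Print the result of both forms side by side for a few sample characters.
        for (char c : "lLfFdD9x".toCharArray()) {
            System.out.println(c + " -> " + isNotSuffixVerbose(c) + " / " + isNotSuffix(c));
        }
    }
}
```

Pulling the characters into a named constant is optional, but it gives the set a name and keeps it in one place if more suffixes are ever needed.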

In the many case statements, instead of storing the value in the result variable and returning it at the end, how about returning as soon as the value is ready? The code becomes shorter, because a case that returns no longer needs a break statement.
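Since the switch blocks in question are not shown here, the following is only a sketch of the idea on a made-up example. The Kind enum and both classify methods are hypothetical and are not part of the original Tokenizer.

```java
final class EarlyReturnSketch {
    // Hypothetical token kinds, for illustration only.
    enum Kind { OPEN_PAREN, CLOSE_PAREN, TERNARY_QUESTION, TERNARY_COLON, OTHER }

    // Before: every case assigns to a local variable and then breaks.
    static Kind classifyWithResult(char c) {
        Kind result;
        switch (c) {
            case '(':
                result = Kind.OPEN_PAREN;
                break;
            case ')':
                result = Kind.CLOSE_PAREN;
                break;
            case '?':
                result = Kind.TERNARY_QUESTION;
                break;
            case ':':
                result = Kind.TERNARY_COLON;
                break;
            default:
                result = Kind.OTHER;
                break;
        }
        return result;
    }

    // After: every case returns directly, so no local variable and no breaks.
    static Kind classify(char c) {
        switch (c) {
            case '(': return Kind.OPEN_PAREN;
            case ')': return Kind.CLOSE_PAREN;
            case '?': return Kind.TERNARY_QUESTION;
            case ':': return Kind.TERNARY_COLON;
            default:  return Kind.OTHER;
        }
    }

    public static void main(String[] args) {
        // Both versions classify each character the same way.
        for (char c : "()?:x".toCharArray()) {
            System.out.println(c + " -> " + classifyWithResult(c) + " / " + classify(c));
        }
    }
}
```

A side benefit of the second form is that a forgotten break can no longer cause an accidental fall-through into the next case.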

It's recommended to use braces even with single-line if statements.
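One common argument for this is that an unbraced if guards only the next statement, whatever the indentation suggests. The sketch below shows the pitfall on a made-up example; the class and method names are hypothetical and unrelated to the Tokenizer.

```java
final class BracesSketch {
    // Without braces, only the first statement belongs to the if.
    static int unbraced(boolean flag) {
        int count = 0;
        if (flag)
            count++;
            count += 10; // indented like the if body, but runs unconditionally
        return count;
    }

    // With braces, both statements are guarded by the condition.
    static int braced(boolean flag) {
        int count = 0;
        if (flag) {
            count++;
            count += 10;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(unbraced(false)); // prints 10
        System.out.println(braced(false));   // prints 0
    }
}
```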

Context

StackExchange Code Review Q#93740, answer score: 4
