Open Source Tokenizer
Problem
I'm developing an open source annotation processor for Android (ngAndroid). Part of the library is a small compiled language, a subset of Java, that can be used for data binding, events, etc. I plan to extend the functionality of this language, but my Tokenizer is getting out of hand.
The Tokenizer parses strings such as `getTimeString(note.time)`, `modelName.boolValue ? functionName(m.parameter , q.secondParameter ) : modelName.stringValue`, `(3 + (2)) - 10/5`, etc.

```java
public class Tokenizer {

    private enum State {
        BEGIN,
        END,
        MODEL_PERIOD,
        FUNCTION_PARAMETER_START,
        CLOSE_PARENTHESIS,
        TERNARY_QUESTION,
        TERNARY_COLON,
        STRING_START,
        STRING_END,
        IN_STRING,
        IN_NUMBER_CONSTANT,
        KNOT_EQUALS,
        EQUALS_START,
        FUNCTION_PARAMETER_DELIMINATOR,
        EQUALS,
        KNOT_EQUALS_START,
        MODEL_NAME_END,
        MODEL_FIELD_END,
        FUNCTION_NAME_END,
        NUMBER_CONSTANT_END,
        IN_CHAR_SEQUENCE,
        IN_MODEL_FIELD,
        KNOT_VALUE,
        OPERATOR,
        IN_FLOAT,
        FLOAT_END,
        DOUBLE_END,
        INTEGER_END,
        LONG_END,
        NESTED_EXPRESSION,
        IN_STRING_SLASH,
        STRING_SLASH_END,
        WHITESPACE,
        FLOAT_F_END
    }

    private int index, readIndex;
    private String script;
    private Queue tokens;
    private State state;

    public Tokenizer(String script) {
        this.script = script;
    }

    public Queue getTokens() {
        if (tokens == null) {
            generateTokens();
        }
        return tokens;
    }

    private void generateTokens() {
        tokens = new LinkedList();
        index = 0;
        readIndex = 0;
        state = State.BEGIN;
        while (state != State.END) {
            state = nextState();
        }
        if (readIndex != script.length()) {
            emit(TokenType.RUBBISH);
        }
        tokens.add(new Token(TokenType.EOF, null));
    }

    // ... (rest of the class not shown)
}
```
Solution
Instead of this:

`c != 'l' && c != 'L' && c != 'f' && c != 'F' && c != 'd' && c != 'D'`

I would write:

`"lLfFdD".indexOf(c) == -1`

In the many `case` statements, instead of storing the value in the `result` variable and returning it at the end, how about returning earlier, as soon as the value to return is ready? The code becomes shorter, because you no longer need the `break` statements.

It's recommended to use braces even with single-line `if` statements.

Code Snippets
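As a rough illustration of the first two suggestions in the answer, the sketch below checks that the `indexOf` form agrees with the chained comparisons, and contrasts a `switch` that fills a `result` variable with one that returns early. The class, method, and enum names (`ReviewSketch`, `Kind`, `classifyWithResult`, and so on) are made up for this example and are not taken from ngAndroid's Tokenizer, whose `switch` bodies are not shown in the question.

```java
// Hypothetical sketch: only the two boolean conditions come from the answer.
final class ReviewSketch {

    // Form used in the question: six explicit comparisons.
    static boolean notSuffixVerbose(char c) {
        return c != 'l' && c != 'L' && c != 'f' && c != 'F' && c != 'd' && c != 'D';
    }

    // Form suggested in the answer: one lookup in a string of suffix characters.
    static boolean notSuffixCompact(char c) {
        return "lLfFdD".indexOf(c) == -1;
    }

    // Made-up token kinds, standing in for whatever the real switch produces.
    enum Kind { OPEN_PAREN, CLOSE_PAREN, COMMA, OTHER }

    // "Before" style: assign to a result variable, break, return at the end.
    static Kind classifyWithResult(char c) {
        Kind result;
        switch (c) {
            case '(':
                result = Kind.OPEN_PAREN;
                break;
            case ')':
                result = Kind.CLOSE_PAREN;
                break;
            case ',':
                result = Kind.COMMA;
                break;
            default:
                result = Kind.OTHER;
                break;
        }
        return result;
    }

    // "After" style: return as soon as the value is known, so no break is needed.
    static Kind classifyEarlyReturn(char c) {
        switch (c) {
            case '(':
                return Kind.OPEN_PAREN;
            case ')':
                return Kind.CLOSE_PAREN;
            case ',':
                return Kind.COMMA;
            default:
                return Kind.OTHER;
        }
    }

    public static void main(String[] args) {
        // Compare both pairs of methods over the ASCII range.
        for (char c = 0; c < 128; c++) {
            if (notSuffixVerbose(c) != notSuffixCompact(c)) {
                throw new AssertionError("suffix check differs for code " + (int) c);
            }
            if (classifyWithResult(c) != classifyEarlyReturn(c)) {
                throw new AssertionError("classification differs for code " + (int) c);
            }
        }
        System.out.println("Both pairs agree on all ASCII characters");
    }
}
```

The early-return version drops the local variable and every `break`, which is where the length saving mentioned in the answer comes from. It only applies when nothing has to run after the `switch` before returning.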
`c != 'l' && c != 'L' && c != 'f' && c != 'F' && c != 'd' && c != 'D'`

`"lLfFdD".indexOf(c) == -1`

Context
StackExchange Code Review Q#93740, answer score: 4