patternjavaMinor
Implementing a fast 'split' function on a stream of characters in Java
Viewed 0 times
faststreamfunctionjavasplitcharactersimplementing
Problem
I was writing a piece of code (a custom
Characters are read from a stream and then passed to
The assumption is that each character comes in a stream and look-ahead isn't allowed - the break needs to happen after the last character of a match is passed in.
I wrote two implementations below
The first is a simple, dumb, keep a buffer and compare it each iteration, the second is a simple state machine and the latter is slightly faster.
EntityProcessor for Solr, but that isn't too relevant) designed to break lines of input based on a string delimiter (toMatch below).Characters are read from a stream and then passed to
addChar.The assumption is that each character comes in a stream and look-ahead isn't allowed - the break needs to happen after the last character of a match is passed in.
I wrote two implementations below
addChar and addChar2.The first is a simple, dumb, keep a buffer and compare it each iteration, the second is a simple state machine and the latter is slightly faster.
import java.util.Iterator;
import java.util.LinkedList;
public class CharStreamMatcher {
protected char[] toMatch = null;
public MagicCharStreamMatcher(char[] match){
toMatch = match;
for(int i=0;i li = new LinkedList();
public boolean addChar2(char c){
li.addLast(c);
li.removeFirst();
int i=0;
for(Character ch : li){
if(ch.charValue() != toMatch[i++])
return false;
}
return true;
}
private class MutableInteger{
public int i;
public MutableInteger(int i){
this.i = i;
}
}
LinkedList correspondences = new LinkedList();
public boolean addChar(char c){
boolean result = false;
if(c == toMatch[0])
correspondences.add(new MutableInteger(-1));
Iterator it = correspondences.iterator();
while(it.hasNext()){
MutableInteger mi = it.next();
mi.i++;
// check the match
if(c != toMatch[mi.i]){
it.remove();
}
// are we done?
else if(mi.i == toMatch.length-1){
result = true;
it.remove();
}
}
return result;
}
}Solution
There are more-sophisticated algorithms you can use, depending on the input and the search string. Even keeping the "one character at a time" interface though, there are optimizations to the implementation. Someone linked to KMP in a comment to your question - that's a good way to go if you can do some initial setup and buffer an arbitrary amount of the incoming stream.
I think you're on the right track with a state machine, but the implementation in addChar2 looks kind of heavyweight, with the MutableInteger class and a linked list, and an iterator-driven loop, and all. If you want this to be fast, you'll likely want to use primitive types (ints and arrays of ints) rather than OO wrappers around the primitive types.
I think you're on the right track with a state machine, but the implementation in addChar2 looks kind of heavyweight, with the MutableInteger class and a linked list, and an iterator-driven loop, and all. If you want this to be fast, you'll likely want to use primitive types (ints and arrays of ints) rather than OO wrappers around the primitive types.
Context
StackExchange Code Review Q#827, answer score: 4
Revisions (0)
No revisions yet.