HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Implementing a fast 'split' function on a stream of characters in Java

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
faststreamfunctionjavasplitcharactersimplementing

Problem

I was writing a piece of code (a custom EntityProcessor for Solr, but that isn't too relevant) designed to break lines of input based on a string delimiter (toMatch below).

Characters are read from a stream and then passed to addChar.

The assumption is that each character comes in a stream and look-ahead isn't allowed - the break needs to happen after the last character of a match is passed in.

I wrote two implementations below addChar and addChar2.

The first is a simple, dumb, keep a buffer and compare it each iteration, the second is a simple state machine and the latter is slightly faster.

import java.util.Iterator;
import java.util.LinkedList;

public class CharStreamMatcher {
    protected char[] toMatch = null;

    public MagicCharStreamMatcher(char[] match){
        toMatch = match;
        for(int i=0;i li = new LinkedList();
    public boolean addChar2(char c){
        li.addLast(c);
        li.removeFirst();
        int i=0;
        for(Character ch : li){
            if(ch.charValue() != toMatch[i++])
                return false;
        }
        return true;
    }

    private class MutableInteger{
        public int i;
        public MutableInteger(int i){
            this.i = i;
        }
    }

    LinkedList correspondences = new LinkedList();
    public boolean addChar(char c){
        boolean result = false; 

        if(c == toMatch[0])
            correspondences.add(new MutableInteger(-1));

        Iterator it = correspondences.iterator();
        while(it.hasNext()){
            MutableInteger mi = it.next();
            mi.i++;

            // check the match
            if(c != toMatch[mi.i]){
                it.remove();
            }
            // are we done? 
            else if(mi.i == toMatch.length-1){
                result = true;
                it.remove();
            }
        }

        return result;
    }   
}

Solution

There are more-sophisticated algorithms you can use, depending on the input and the search string. Even keeping the "one character at a time" interface though, there are optimizations to the implementation. Someone linked to KMP in a comment to your question - that's a good way to go if you can do some initial setup and buffer an arbitrary amount of the incoming stream.

I think you're on the right track with a state machine, but the implementation in addChar2 looks kind of heavyweight, with the MutableInteger class and a linked list, and an iterator-driven loop, and all. If you want this to be fast, you'll likely want to use primitive types (ints and arrays of ints) rather than OO wrappers around the primitive types.

Context

StackExchange Code Review Q#827, answer score: 4

Revisions (0)

No revisions yet.