patternjavaMinor
Check whether a list of list of string fits a regex efficiently
Viewed 0 times
efficientlyregexfitslistcheckwhetherstring
Problem
I have the following structure for my object:
Word (Object)
Symbol (Object)
SymbolChoice (Object)
The background of this is that this is processed OCR output and I want to force a word to match a certain regex. If it cannot find a match, it should return an empty
Example:
When forcing this word to abide regex
I have a solution, however it is not fast enough, more details after the method:
```
public List> forceFitRegex(final String regex) {
Objects.requireNonNull(regex, "regex");
if (getSymbols().size() >();
}
int combinations = 1;
for (Symbol symbol : getSymbols()) {
combinations *= symbol.getSymbolChoices().size();
}
System.out.println("Combinations: " + combinations);
List> listOfSymbolChoices = new ArrayList>();
for (SymbolChoice symbolChoice : getSymbols().get(0).getSymbolChoices()) {
List symbolChoices = new ArrayList();
symbolChoices.add(symbolChoice);
listOfSymbolChoices.add(symbolChoices);
}
int i = 1;
while (i > listOfSymbolChoicesCopy = new ArrayList>(listOfSymbolChoices);
for (SymbolChoice symbolChoice : getSymbols().get(i).getSymbolChoices()) {
for (List symbolChoicesCopy : listOfSymbolChoicesCopy) {
List symbolChoices = new ArrayList(symbolChoicesCopy);
symbolChoices.add(symbolChoice);
listOfSymbolChoices.add(symbolChoices);
}
}
i++;
}
Pattern pattern = Pattern.compile(regex);
ListIterator> symbolChoicesListIterator = listOfSymbolChoices.listIterator
Word (Object)
Word(String)
List
Symbol (Object)
Symbol(String)
List
SymbolChoice (Object)
Symbol(String)
The background of this is that this is processed OCR output and I want to force a word to match a certain regex. If it cannot find a match, it should return an empty
List, otherwise it should return all List that match the specified regex.Example:
- Found word:
"1o0"
- Symbols:
"1","o"and"0"
- Choices of
"1":"1"
- Choices of
"o":"o","0"
- Choices of
"0":"0"
When forcing this word to abide regex
"\\d{3}", it has to return [["1", "0", "0"]], here I am denoting the SymbolChoice objects as strings.I have a solution, however it is not fast enough, more details after the method:
```
public List> forceFitRegex(final String regex) {
Objects.requireNonNull(regex, "regex");
if (getSymbols().size() >();
}
int combinations = 1;
for (Symbol symbol : getSymbols()) {
combinations *= symbol.getSymbolChoices().size();
}
System.out.println("Combinations: " + combinations);
List> listOfSymbolChoices = new ArrayList>();
for (SymbolChoice symbolChoice : getSymbols().get(0).getSymbolChoices()) {
List symbolChoices = new ArrayList();
symbolChoices.add(symbolChoice);
listOfSymbolChoices.add(symbolChoices);
}
int i = 1;
while (i > listOfSymbolChoicesCopy = new ArrayList>(listOfSymbolChoices);
for (SymbolChoice symbolChoice : getSymbols().get(i).getSymbolChoices()) {
for (List symbolChoicesCopy : listOfSymbolChoicesCopy) {
List symbolChoices = new ArrayList(symbolChoicesCopy);
symbolChoices.add(symbolChoice);
listOfSymbolChoices.add(symbolChoices);
}
}
i++;
}
Pattern pattern = Pattern.compile(regex);
ListIterator> symbolChoicesListIterator = listOfSymbolChoices.listIterator
Solution
Bonus points if someone can calculate the asymptotic time my algorithm takes.
Too much. It's exponential, as you're generating the whole candidate set. Even the improvement by rolfl doesn't change it.
You could do much better by analyzing the
There's a method which might do this for you:
Returns true if the end of input was hit by the search engine in the last match operation performed by this matcher.
When this method returns true, then it is possible that more input would have changed the result of the last search.
So you could test you incomplete candidates against the
Too much. It's exponential, as you're generating the whole candidate set. Even the improvement by rolfl doesn't change it.
You could do much better by analyzing the
regex. For example "\\d{3}" will never match anything starting with a letter and you could cut off many possibilities. However, doing this in general is pretty complicated.There's a method which might do this for you:
hitEnd. I'm not sure if it does the right thing, but I guess so:Returns true if the end of input was hit by the search engine in the last match operation performed by this matcher.
When this method returns true, then it is possible that more input would have changed the result of the last search.
So you could test you incomplete candidates against the
regex and possibly save yourself the work with expanding them.Context
StackExchange Code Review Q#53981, answer score: 3
Revisions (0)
No revisions yet.