TrieSet with Wildcard Search

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Problem

What I'm ultimately trying to build is a tool that I feed documents into. I then mark x documents as desired and y documents as undesired.
The tool should then analyse and compare the documents and suggest search strings that find many of the desired documents and few of the undesired ones.

What the code at hand is supposed to do:

Save words in a TrieSet

Conduct wildcard searches on that TrieSet

Limitations:

It is not supposed to work with entries containing non-letter characters, i.e. [^a-zA-Z]

What I would like to know:

What should I have done better? What could I have done worse?

How would you estimate or compare my code's performance?

What about interfaces like Collection, Iterable, Serializable? Should I have implemented those?

Are there sections that can lead to runtime errors?

How could I have distributed the code better, for example over more classes?

Anything else I did not even think of right now.

The TrieSet (including a main used for some testing):

```
package linguist.model.datastructure;

import java.util.ArrayList;
import java.util.HashSet;

public class TrieSet {
    private final Node root;
    private int countOfUniqueKeys = 0; // The number of unique entries
    private int countOfAdds = 0;       // The number of entries

    private void countUniques() {
        countOfUniqueKeys += 1;
    }

    private void countAdds() {
        countOfAdds += 1;
    }

    TrieSet() {
        root = new Node("", this);
    }

    /**
     * @param newEntry Word that is added
     */
    public void add(String newEntry) {
        newEntry = ParseStrings.normalizeAddString(newEntry);
        root.add(newEntry, 0);
    }

    /**
     * @param searchstring The searchstring
     * @return A HashSet of hit nodes as result of the search operation
     */
    public HashSet getNodes(String searchstring) {
        searchstring = ParseStrings.normalizeLookupString(searchstring);
        return root.getNodes(searchstring, 0);
    }

    /**
     * @author S761
     *
     */
    private final class Node {
        // ...
```

Solution

I can give you some feedback on your regexes. In Java regexes there are cases where you cannot avoid the "evil escaped escape" (a doubled backslash), but in your case most of them can be refactored out.

First of all, you don't need any backslashes in these three:

string=string.replaceAll("[^a-z0-9\\{\\}\\*\\?]", "");
string=string.replaceAll("[\\{\\}]", "");
string=string.replaceAll("[\\*#]+", "");

You are inside a [] char class, so most of the normal metacharacters lose their special meaning. You could use:

string=string.replaceAll("[^a-z0-9{}*?]", "");
string=string.replaceAll("[{}]", "");
string=string.replaceAll("[*#]+", "");
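If you want to convince yourself that dropping the backslashes changes nothing, a quick comparison helps. This is only a minimal sketch: the class name `CharClassEscapes`, the method names and the sample strings are made up for illustration and are not part of your code.

```java
public class CharClassEscapes {

    // Original form: every metacharacter is escaped inside the character class.
    public static String keepAllowedEscaped(String s) {
        return s.replaceAll("[^a-z0-9\\{\\}\\*\\?]", "");
    }

    // Simplified form: inside [] the characters { } * ? are already literal.
    public static String keepAllowedPlain(String s) {
        return s.replaceAll("[^a-z0-9{}*?]", "");
    }

    public static void main(String[] args) {
        // Hypothetical inputs, chosen to contain allowed and disallowed characters.
        String[] samples = {"he{l}lo*w?rld!", "a-b_c d", "**{}??", "ABC123abc"};
        for (String sample : samples) {
            String escaped = keepAllowedEscaped(sample);
            String plain = keepAllowedPlain(sample);
            System.out.println(sample + " -> " + plain
                    + " (same as escaped form: " + escaped.equals(plain) + ")");
        }
    }
}
```

Both methods should return the same string for every input, since the two patterns describe the same set of characters.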

Personally, I would try to avoid some of these backslashes:

string=string.replaceAll("(\\*\\d*)*\\*(?!\\d)(\\*\\d*)*", "*");

You could just use a char class, which I think is more readable:

string=string.replaceAll("([*]\\d*)*[*](?!\\d)([*]\\d*)*", "*");
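The same kind of check works for this pattern. Again, this is just a sketch under my own assumptions: `CollapseStars`, its method names and the sample inputs are invented here, and I am only comparing the two spellings of the regex, not claiming anything about how the rest of your parser treats the result.

```java
public class CollapseStars {

    // Original spelling: a literal '*' written as an escaped escape.
    public static String collapseEscaped(String s) {
        return s.replaceAll("(\\*\\d*)*\\*(?!\\d)(\\*\\d*)*", "*");
    }

    // Alternative spelling: a literal '*' written as a one-character class.
    public static String collapseCharClass(String s) {
        return s.replaceAll("([*]\\d*)*[*](?!\\d)([*]\\d*)*", "*");
    }

    public static void main(String[] args) {
        // Hypothetical search strings with runs of '*' and '*<digits>'.
        String[] samples = {"a**b", "a***b", "a*2*b", "*3", "abc"};
        for (String sample : samples) {
            String escaped = collapseEscaped(sample);
            String charClass = collapseCharClass(sample);
            System.out.println(sample + " -> " + charClass
                    + " (same as escaped form: " + escaped.equals(charClass) + ")");
        }
    }
}
```

The `\\d` escapes have to stay either way, because there is no shorter way to write the digit class; only the escaped stars can be replaced.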

The only other comment I have is about, well, your comments.

I notice that you have some comments in English, and some comments like this one:

//überflüssige '*' zu einem '*' zusammenführen

You should probably stick to one language. (I suspect you translated the comments and just missed this one; it says "merge redundant '*' into a single '*'".)

The other thing I suggest is that you remove some of the commented-out code, and also remove some of the excessive blank lines (like the ones towards the end of the first class).

Context

StackExchange Code Review Q#119542, answer score: 3
