TrieSet with Wildcard Search
Problem
What I'm ultimately trying to build is a tool that I feed a set of documents. I then mark x documents as desired and y documents as undesired.
The tool should then analyse and compare the documents and suggest search strings that find many of the desired documents and few of the undesired ones.
What the code at hand is supposed to do:
- Save words in a TrieSet
- Conduct wildcard searches on that TrieSet
Limitations:
- Is not supposed to work with entries containing [^a-Z]
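For orientation, the intended behaviour can be sketched roughly as follows. This is not the code under review, only a minimal illustration. It assumes that `?` matches exactly one character and `*` matches any run of characters (including none), and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a trie-backed set with wildcard lookup.
// Assumption: '?' = exactly one character, '*' = zero or more characters.
final class WildcardTrieSketch {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    // Stores a word by walking (and creating) one node per character.
    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Returns all stored words matching the pattern, in sorted order.
    Set<String> search(String pattern) {
        Set<String> hits = new TreeSet<>();
        collect(root, pattern, 0, new StringBuilder(), hits);
        return hits;
    }

    private void collect(Node node, String pattern, int pos,
                         StringBuilder prefix, Set<String> hits) {
        if (pos == pattern.length()) {
            if (node.isWord) {
                hits.add(prefix.toString());
            }
            return;
        }
        char p = pattern.charAt(pos);
        if (p == '*') {
            // Either '*' matches nothing here ...
            collect(node, pattern, pos + 1, prefix, hits);
            // ... or it consumes one character and stays active.
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                prefix.append(e.getKey());
                collect(e.getValue(), pattern, pos, prefix, hits);
                prefix.setLength(prefix.length() - 1);
            }
        } else if (p == '?') {
            // '?' consumes exactly one character, whatever it is.
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                prefix.append(e.getKey());
                collect(e.getValue(), pattern, pos + 1, prefix, hits);
                prefix.setLength(prefix.length() - 1);
            }
        } else {
            // A literal character must match a child edge exactly.
            Node child = node.children.get(p);
            if (child != null) {
                prefix.append(p);
                collect(child, pattern, pos + 1, prefix, hits);
                prefix.setLength(prefix.length() - 1);
            }
        }
    }

    public static void main(String[] args) {
        WildcardTrieSketch trie = new WildcardTrieSketch();
        for (String w : new String[] {"car", "cart", "cat", "dog"}) {
            trie.add(w);
        }
        System.out.println(trie.search("ca?")); // prints [car, cat]
        System.out.println(trie.search("c*t")); // prints [cart, cat]
    }
}
```

The sketch returns the matching words rather than nodes, which keeps the example short; the `*` branch may visit the same node more than once, so the results go into a set.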
What I would like to know:
- What should I have done better? What could I have done worse?
- How would you estimate or compare my code's performance?
- What about interfaces like Collection, Iterable, or Serializable? Should I have implemented those?
- Are there sections that can lead to runtime errors?
- How could I have distributed the code better, e.g. over more classes?
- Anything else I did not even think of right now.
The TrieSet (including a main used for some testing):
```
package linguist.model.datastructure;

import java.util.ArrayList;
import java.util.HashSet;

public class TrieSet {
    private final Node root;
    private int countOfUniqueKeys = 0; //The number of unique entries
    private int countOfAdds = 0; //The number of entries

    private void countUniques(){
        countOfUniqueKeys+=1;
    }

    private void countAdds(){
        countOfAdds+=1;
    }

    TrieSet(){
        root=new Node("",this);
    }

    /**
     * @param newEntry Word that is added
     */
    public void add(String newEntry){
        newEntry=ParseStrings.normalizeAddString(newEntry);
        root.add(newEntry,0);
    }

    /**
     * @param searchstring The searchstring
     * @return A HashSet of hit nodes as result of the search operation
     */
    public HashSet getNodes(String searchstring){
        searchstring=ParseStrings.normalizeLookupString(searchstring);
        return root.getNodes(searchstring, 0);
    }

    /**
     * @author S761
     *
     */
    private final class Node{
```
Solution
I can give you some feedback on your regexes. There are cases in Java regex where you cannot avoid the "evil escaped escape", but in your case you can refactor most of them out.
First of all, you don't need any backslashes in these three:
```
string=string.replaceAll("[^a-z0-9\\{\\}\\*\\?]", "");
string=string.replaceAll("[\\{\\}]", "");
string=string.replaceAll("[\\*#]+", "");
```
You are inside a [] char class, so most of the normal meta characters lose their meaning. You could use:
```
string=string.replaceAll("[^a-z0-9{}*?]", "");
string=string.replaceAll("[{}]", "");
string=string.replaceAll("[*#]+", "");
```
Personally, I would try to avoid some of these backslashes:
```
string=string.replaceAll("(\\*\\d*)*\\*(?!\\d)(\\*\\d*)*", "*");
```
You could just use a char class, which I think is more readable:
```
string=string.replaceAll("([*]\\d*)*[*](?!\\d)([*]\\d*)*", "*");
```
The only other comment I have is about, well, your comments.
I notice that you have some comments in English, and some comments like this one:
```
//überflüssige '*' zu einem '*' zusammenführen
```
You should probably stick to one language. (Although I suspect you may have translated all the comments, but just forgot this one.)
The other thing I suggest is that you remove some of the commented out code, and also remove some of the excessive blank lines (like the ones towards the end of the first class).
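If you want to convince yourself that dropping the backslashes does not change behaviour, you could compare both spellings on a few inputs. The following is only a minimal sketch: the class name, the method names, and the sample strings are made up for illustration, while the regexes are the ones quoted above.

```java
// Hypothetical comparison harness: escaped vs. unescaped spellings of the
// same regexes. Only the patterns themselves come from the reviewed code.
final class CharClassEscapeDemo {

    // Original spelling: metacharacters escaped inside the char class.
    static String stripEscaped(String s) {
        return s.replaceAll("[^a-z0-9\\{\\}\\*\\?]", "");
    }

    // Suggested spelling: no backslashes needed inside [].
    static String stripUnescaped(String s) {
        return s.replaceAll("[^a-z0-9{}*?]", "");
    }

    // Original spelling of the "collapse redundant stars" regex.
    static String collapseEscaped(String s) {
        return s.replaceAll("(\\*\\d*)*\\*(?!\\d)(\\*\\d*)*", "*");
    }

    // Same regex with [*] instead of \\* for the literal star.
    static String collapseCharClass(String s) {
        return s.replaceAll("([*]\\d*)*[*](?!\\d)([*]\\d*)*", "*");
    }

    public static void main(String[] args) {
        String[] samples = {"Hello, W*rld? {42}!", "ab***cd", "a*2*b", "x*3y"};
        for (String s : samples) {
            boolean sameStrip = stripEscaped(s).equals(stripUnescaped(s));
            boolean sameCollapse = collapseEscaped(s).equals(collapseCharClass(s));
            System.out.println(s + " -> strip equal: " + sameStrip
                    + ", collapse equal: " + sameCollapse);
        }
    }
}
```

Both spellings compile to equivalent patterns, so which one to use is purely a readability question.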
Code Snippets
```
string=string.replaceAll("[^a-z0-9\\{\\}\\*\\?]", "");
string=string.replaceAll("[\\{\\}]", "");
string=string.replaceAll("[\\*#]+", "");
```
```
string=string.replaceAll("[^a-z0-9{}*?]", "");
string=string.replaceAll("[{}]", "");
string=string.replaceAll("[*#]+", "");
```
```
string=string.replaceAll("(\\*\\d*)*\\*(?!\\d)(\\*\\d*)*", "*");
```
```
string=string.replaceAll("([*]\\d*)*[*](?!\\d)([*]\\d*)*", "*");
```
```
//überflüssige '*' zu einem '*' zusammenführen
```
Context
StackExchange Code Review Q#119542, answer score: 3