HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Trie implementation for left to right wildcard search

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
leftwildcardsearchfortrieimplementationright

Problem

I was working on a task to implement auto suggest from an array of couple of thousands of strings.

Of course, the first and the easiest implementable was to go through each element and do a[i].startWith(query). Technically the time complexity of this algorithm then would be \$O(n+(q*m))\$. Here, n equals the number of elements in the array, q equals the number of total matches and m equals the number of characters in the query (I am not sure how accurate I am here).

Regardless, I have created a following Trie which at every node keeps track of every child that has a word ending there.

For example, if the query is "can" then with only three traversal the tree will return cancel, cancer, can and cancellation because the node 'n' will have connections to all the sub trees that has a word ending!

I would really like to hear some comments on this.

```
package utils;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
*
* @author Sapan
*/
public class Trie {

private Map children;
private String word = null;
private Set connections;
static int count = 0;

public Trie() {
this.children = new HashMap<>();
connections = new HashSet<>();
}
/*
* Builds a Trie with given query
*/

public void add(String word) {
if (word == null || "".equals(word.trim())) {
throw new IllegalArgumentException("Query can not be empty or null");
}
Trie t = this;
for (Character c : word.toCharArray()) { //for every character in query do the following....
//add character to the current instance and get the child instance right after adding the current char
if (!t.children.containsKey(c)) {
t.children.put(c, new Trie());
}
t = t.children.get(c);
}
t.word = word;
}

Solution

-
Eclipse says that the second null check in the search method is dead code:

if (current == null) {
    return matches;
}


Furthermore, it could be refactored to the following:

public Set search(final String word) {
    count = 0;
    Trie current = this;
    for (final Character c: word.toCharArray()) {
        ++count;
        current = current.children.get(c);
        if (current == null) {
            return Collections.emptySet();
        }
    }
    return current.connections;
}


-
You should specify the return types:

public Set search(final String word) { ... }
public Set build() {... }


instead of

public Set search(final String word) { ... }
public Set build() {... }


-
Consider using

  • StringUtils.isBlank instead of the (word == null || "".equals(word.trim())) and



  • StringUtils.isNotEmpty instead of (this.word != null && this.word.length() > 0).



I've found them more readable.

-
If I'm right you could iterate over the values directly in the build method:

for (final Trie childTrie: children.values()) {
    connections.addAll(childTrie.build());
}


-
I'd separate the builder logic from the Trie class. Currently clients can easily misuse this class:

t.build();
t.add("can2");


(Effective Java, 2nd Edition, Item 2: Consider a builder when faced with many constructor parameters)

-
connections does not seem a good name. It took some time to figure out that this set contains the descendant Trie objects which has a word. I might call it descendantsWithWord. Furthermore, it could contain only the words (strings), not the Trie objects, so it could be simply Set descendantWords.

-
The static count field does not smells well. It is used for two different purposes. I'd create a createdTrieObjects field:

private int createdTrieObjects = 1;


and increment it in the add method:

t.children.put(c, new Trie());
createdTrieObjects++;


On the other hand, in the search method you could return a SearchResult object which contains the search depth (it's the count currently) and the result strings:

public class SearchResult {

    private final Set words;
    private final int searchDepth;

    ...

}


It contains only a set of string instead of the Set collection. I think clients should not know about the internal Trie objects. If you don't need the search depth return with Set only:

public Set search(final String wordStart) {
    Trie current = this;
    for (final Character c: wordStart.toCharArray()) {
        current = current.children.get(c);
        if (current == null) {
            return Collections.emptySet();
        }
    }
    return Collections.unmodifiableSet(current.descendantWords);
}


Please note the Collections.unmodifiableSet call. It prohibits malicious clients to modify the Trie's internal structure. (Effective Java, 2nd Edition, Item 39: Make defensive copies when needed)

-
You could eliminate the build method and the word field if you fill the descendantWords in the add method:

public void add(final String word) {
    checkArgument(StringUtils.isNotBlank(word), "word can not be blank");
    Trie current = this;
    descendantWords.add(word);
    for (final Character c: word.toCharArray()) {
        if (!current.children.containsKey(c)) {
            current.children.put(c, new Trie());
            createdTrieObjects++;
        }
        current = current.children.get(c);
        current.descendantWords.add(word);
    }
}


Some small notes about the edit:

-
Instead of the checked IllegalAccessException I'd throw runtime IllegalStateException. It's rather a programming error if a client tries to add new element to a built trie. (Effective Java, 2nd Edition, Item 58: Use checked exceptions for recoverable conditions and runtime exceptions for programming errors)

-
About the s variable in the search method: Try to minimize the scope of local variables. It's not necessary to declare them at the beginning of the method, declare them where they are first used. (Effective Java, Second Edition, Item 45: Minimize the scope of local variables)

-
You could copy the descendantsWithWord set with the HashSet's constructor instead of the for loop in the search method:

return new HashSet(current.descendantWords);


-
The isBuilt flag should be at the beginning of the class with the other fields.

Here is the refactored version:

```
import java.util.Collections;

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.commons.lang3.StringUtils;

import static com.google.common.base.Preconditions.*;

public class Trie {

private final Map children = new HashMap();
private final Set descendantWords = new HashSet();

public Trie() {
}

public void add(final String word) {
checkArgument(StringUtils.isNotBlank(word), "word can not

Code Snippets

if (current == null) {
    return matches;
}
public Set<Trie> search(final String word) {
    count = 0;
    Trie current = this;
    for (final Character c: word.toCharArray()) {
        ++count;
        current = current.children.get(c);
        if (current == null) {
            return Collections.emptySet();
        }
    }
    return current.connections;
}
public Set<Trie> search(final String word) { ... }
public Set<Trie> build() {... }
public Set search(final String word) { ... }
public Set build() {... }
for (final Trie childTrie: children.values()) {
    connections.addAll(childTrie.build());
}

Context

StackExchange Code Review Q#15887, answer score: 8

Revisions (0)

No revisions yet.