HiveBrain v1.2.0
Get Started
← Back to all entries
snippetjavaMinor

Sort a strings collection by a search term

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
searchcollectiontermsortstrings

Problem

I'm trying to implement a search bar in which the user sends a search term and the search bar displays the results.

I have a naive solution which separates the original list into three sublists:
a list with all the exact matches, a list with all the phrases containing the search term and a list with all the phrases that doesn't match. Later on I sort the unmatched list using Edit Distance algorithm.

Is this the right way to do it or should I concern a different approach such as suffix trees, etc. ?

```
public static List sortListByName(final String name, final List itemsList,
final Function itemNameExtractor, int limit) {
List res = new LinkedList<>();
List containedList = new LinkedList<>();
List unmatchedList = new LinkedList<>();

// First, separates the list into sublists which have exact word match; the search term is a
// substring; the search term doesn't appear at all.

for (T item : itemsList) {
String currItemName = itemNameExtractor.apply(item);
if (currItemName.contains(name)) {
boolean isExactMatch = false;
for (String term : currItemName.split(" ")) {
if (term.equals(name)) {
res.add(item);
isExactMatch = true;
break;
}
}
if (!isExactMatch) {
containedList.add(item);
}
} else {
unmatchedList.add(item);
}
}
Comparator comparator = new Comparator() {
@Override
public int compare(final T t1, final T t2) {
Integer n1Length = itemNameExtractor.apply(t1).length();
Integer n2Length = itemNameExtractor.apply(t2).length();
return n1Length.compareTo(n2Length);
}
};

Collections.sort(res, comparator);
Collections.sort(containedList, comparator);
res.addAll(containedList);

if (res.size() >= limit) {
return res.subList(0, limit);
}

// Sort unmatched items by their edit distance rank and concatenate them to the

Solution

1) Regex

You could generate a Pattern instance (regex) to check for exact matching instead of doing String and split acrobatics. If I understand it correctly, you are looking for occurrences of 'name' that are neighbored by 'start of line', 'end of line' or 'whitespace'. For example, if you were to search for "doge" like that, you could use the following regex: (?:^|\\s)(doge)(?:$|\\s)

The fat examples would be matched:

doge

dogeX

X doge X

Xdoge

XdogeX

In Java, you'd do it like this:

Pattern p = Pattern.compile("(?:^|\\s)("+search+")(?:$|\\s)");
if(p.matcher(candidate).find()) { } else { }


Compiling Patterns is slow, so only do it once per method call. If your itemsList is small, it's possible that your raw, manual checking is faster, but if that is the case performance should be no issue anyways.

2) Tree

A specialized Infix Tree that takes care of spaces (And you are looking for infix, not suffix, right?) is naturally the vastly superior solution if you can pay the performance costs of creating and maintaining it. If you decide to use an Infix Tree and you are not satisfied, you could write a Java InfixHashMap which provides O(1) lookup speed. (InfixDictionary: Data structure for Infix string lookup)

3) Functional Java 8

And since you are obviously using Java 8, give up on anonymous classes and go all the way:

Comparator comparator = (t1,t2)->{
        Integer n1Length = itemNameExtractor.apply(t1).length();
        Integer n2Length = itemNameExtractor.apply(t2).length();
        return n1Length.compareTo(n2Length);
    };

Code Snippets

Pattern p = Pattern.compile("(?:^|\\s)("+search+")(?:$|\\s)");
if(p.matcher(candidate).find()) { } else { }
Comparator<T> comparator = (t1,t2)->{
        Integer n1Length = itemNameExtractor.apply(t1).length();
        Integer n2Length = itemNameExtractor.apply(t2).length();
        return n1Length.compareTo(n2Length);
    };

Context

StackExchange Code Review Q#128064, answer score: 2

Revisions (0)

No revisions yet.