patternjavaMinor
Autocomplete engine
Viewed 0 times
engineautocompletestackoverflow
Problem
The purpose of this program is to implement logic of autocomplete feature (Google Search, for instance). The code is divided into 3 parts:
Edit: Here is the data that you can feed into Autcomplete class, in order to test it : ftp://ftp.cs.princeton.edu/pub/cs226/autocomplete/
Term.java
```
import java.util.Comparator;
/**
* Term is an immutable data type that represents an autocomplete term:
* a query string and an associated integer weight
*/
public class Term implements Comparable {
private final String query;
private final long weight;
/* Rep Invariant
* true
* Abstraction Function
* represents a single query term
* Safety Exposure Argument
* All fields are private, final and immutable.
*/
/**
* Initializes a term with the specified query and the weight. Query should be non-null
* and weight must be non-negative.
* @param query - the query to be searched for
* @param weight - the corresponding weight of the query
* @throws NullPointerException - if query == null
* @throws IllegalArgumentException - if weight
* 0 if this.weight == that.weight
* 1 if this.weight > that.weight
*/
public static Comparator byReverseWeightOrder() {
return new ByReverseWeightOrder();
Term- represents single search term with the query and the weight (number of times this query was searched for).
BinarySearchDeluxe- utility class that provides methods for finding the first and the last keys using binary search algorithm.
Autocomplete- given the data and the query (denoted as prefix), this class provides method for searching for terms in the data that happen to start with the given query and returns them array in descending order (by popularity, if you will).
Edit: Here is the data that you can feed into Autcomplete class, in order to test it : ftp://ftp.cs.princeton.edu/pub/cs226/autocomplete/
- If you are using Linux machine, you can pull data with the following command:
wget -r -nH --cut-dirs=2 ftp://ftp.cs.princeton.edu/pub/cs226/autocomplete/file_nameTerm.java
```
import java.util.Comparator;
/**
* Term is an immutable data type that represents an autocomplete term:
* a query string and an associated integer weight
*/
public class Term implements Comparable {
private final String query;
private final long weight;
/* Rep Invariant
* true
* Abstraction Function
* represents a single query term
* Safety Exposure Argument
* All fields are private, final and immutable.
*/
/**
* Initializes a term with the specified query and the weight. Query should be non-null
* and weight must be non-negative.
* @param query - the query to be searched for
* @param weight - the corresponding weight of the query
* @throws NullPointerException - if query == null
* @throws IllegalArgumentException - if weight
* 0 if this.weight == that.weight
* 1 if this.weight > that.weight
*/
public static Comparator byReverseWeightOrder() {
return new ByReverseWeightOrder();
Solution
Bug
Different approaches can be used to fix it. Previous search prefix can be traced and checked, or even a Map of prefix -> nbResults can store already calculated values.
Code Duplication in
It's a good idea to normalize the output of
If I understand well the logic, the comparison of term query strings is made only for the substrings of the queries that do not exceed
This allows to reduce the
If you are using Java 8, there is no need to create a dedicated class for this purpose. Lambdas are helpful:
Args Validation
It's also a good idea to validate method arguments, but I think that
should all throw
Array Copies
Arrays are copied "manually" in
Do not use full-word names for generic types.
The two
The two public methods become as follows:
The quick checks for first or last elements are not extracted here, but still can be.
numberOfMatches(String) will return a correct result only on the first call. If you call it again with another prefix, it will return the same value, because it will not be recalculated.Different approaches can be used to fix it. Previous search prefix can be traced and checked, or even a Map of prefix -> nbResults can store already calculated values.
Code Duplication in
ByPrefixOrder ComparatorIt's a good idea to normalize the output of
compateTo method so as it returns in range [-1, 0, 1], but the implementation contains repetitions of same code.If I understand well the logic, the comparison of term query strings is made only for the substrings of the queries that do not exceed
r number of characters. To normalize the input strings, a shortcut method is useful:private String cutIfLengthGreaterThanR(String termString) {
final int endIndex = termString.length() > r ? r : termString.length();
return termString.substring(0, endIndex);
}This allows to reduce the
compareTo method:@Override
public int compare(Term t1, Term t2) {
String s1 = cutIfLengthGreaterThanR(t1.query);
String s2 = cutIfLengthGreaterThanR(t2.query);
final int cmp = s1.compareToIgnoreCase(s2);
if (cmp = 1) {
return 1;
}
else {
return 0;
}
// or even a less readable one-liner:
// return (cmp == 0) ? 0 : (cmp < 0) ? -1 : 1;
}If you are using Java 8, there is no need to create a dedicated class for this purpose. Lambdas are helpful:
public static Comparator byPrefixOrder(int r) {
if (r {
String s1 = cutIfLengthGreaterThanR(t1.query, r);
String s2 = cutIfLengthGreaterThanR(t2.query, r);
final int cmp = s1.compareToIgnoreCase(s2);
return (cmp == 0) ? 0 : (cmp < 0) ? -1 : 1;
};
}
private static String cutIfLengthGreaterThanR(String termString, int r) {
// same code as above
}Args Validation
It's also a good idea to validate method arguments, but I think that
NullPointerException is not really appropriate here. The checks likeif (query == null) ...
if (terms == null) ...should all throw
IllegalArgumentException: it underlines the requirement for non-nullability of the args.Array Copies
Arrays are copied "manually" in
Autocomplete constructor and allMatches.java.util.Arrays contains a series of copyOf methods that could make the code shorter and less prone to bugs.BinarySearchDeluxeDo not use full-word names for generic types.
Key can be easily misleading, even for an IDE (for example, my IntelliJ automatically imported java.security.Key, which is very confusing). Use single-letter names for generics: T, E, U...The two
*indexOf methods differ only in conditional branches inside the while loop, but all the other code is redundant. That can be improved. Using Java 8, we can extract the equality condition check and the lo value change function into the signature of the method:private static int indexOf(T[] a,
T key,
Comparator comparator,
Predicate eqTest,
IntFunction eqLowChangeFunction) {
// null checks
if (a.length == 0) return -1;
int lo = 0;
int hi = a.length;
while (lo = 1) hi = mid - 1;
else if (cmp <= -1) lo = mid + 1;
else if (eqTest.test(mid)) lo = eqLowChangeFunction.apply(mid);
else return mid;
}
return -1;
}The two public methods become as follows:
public static int firstIndexOf(T[] a, T key, Comparator comparator) {
return indexOf(a,
key,
comparator,
mid -> comparator.compare(a[mid-1], a[mid]) == 0,
mid -> mid - 1);
}
public static int lastIndexOf(T[] a, T key, Comparator comparator) {
return indexOf(a,
key,
comparator,
mid -> comparator.compare(a[mid+1], a[mid]) == 0,
mid -> mid + 1);
}The quick checks for first or last elements are not extracted here, but still can be.
Code Snippets
private String cutIfLengthGreaterThanR(String termString) {
final int endIndex = termString.length() > r ? r : termString.length();
return termString.substring(0, endIndex);
}@Override
public int compare(Term t1, Term t2) {
String s1 = cutIfLengthGreaterThanR(t1.query);
String s2 = cutIfLengthGreaterThanR(t2.query);
final int cmp = s1.compareToIgnoreCase(s2);
if (cmp <= -1) {
return -1;
}
else if (cmp >= 1) {
return 1;
}
else {
return 0;
}
// or even a less readable one-liner:
// return (cmp == 0) ? 0 : (cmp < 0) ? -1 : 1;
}public static Comparator<Term> byPrefixOrder(int r) {
if (r < 0) throw new IllegalArgumentException("r must be nonnegative, but was " + r);
return (t1, t2) -> {
String s1 = cutIfLengthGreaterThanR(t1.query, r);
String s2 = cutIfLengthGreaterThanR(t2.query, r);
final int cmp = s1.compareToIgnoreCase(s2);
return (cmp == 0) ? 0 : (cmp < 0) ? -1 : 1;
};
}
private static String cutIfLengthGreaterThanR(String termString, int r) {
// same code as above
}if (query == null) ...
if (terms == null) ...private static <T> int indexOf(T[] a,
T key,
Comparator<T> comparator,
Predicate<Integer> eqTest,
IntFunction<Integer> eqLowChangeFunction) {
// null checks
if (a.length == 0) return -1;
int lo = 0;
int hi = a.length;
while (lo <= hi) {
int mid = lo + (hi - lo) / 2;
int cmp = comparator.compare(a[mid], key);
if (cmp >= 1) hi = mid - 1;
else if (cmp <= -1) lo = mid + 1;
else if (eqTest.test(mid)) lo = eqLowChangeFunction.apply(mid);
else return mid;
}
return -1;
}Context
StackExchange Code Review Q#145466, answer score: 2
Revisions (0)
No revisions yet.