patternjavaMinor

Autocomplete engine

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

engineautocompletestackoverflow

Problem

The purpose of this program is to implement logic of autocomplete feature (Google Search, for instance). The code is divided into 3 parts:

Term - represents single search term with the query and the weight (number of times this query was searched for).

BinarySearchDeluxe - utility class that provides methods for finding the first and the last keys using binary search algorithm.

Autocomplete - given the data and the query (denoted as prefix), this class provides method for searching for terms in the data that happen to start with the given query and returns them array in descending order (by popularity, if you will).

Edit: Here is the data that you can feed into Autcomplete class, in order to test it : ftp://ftp.cs.princeton.edu/pub/cs226/autocomplete/

If you are using Linux machine, you can pull data with the following command:

wget -r -nH --cut-dirs=2 ftp://ftp.cs.princeton.edu/pub/cs226/autocomplete/file_name

Term.java

```
import java.util.Comparator;

/**
* Term is an immutable data type that represents an autocomplete term:
* a query string and an associated integer weight
*/
public class Term implements Comparable {
private final String query;
private final long weight;

/* Rep Invariant
* true
* Abstraction Function
* represents a single query term
* Safety Exposure Argument
* All fields are private, final and immutable.
*/

/**
* Initializes a term with the specified query and the weight. Query should be non-null
* and weight must be non-negative.
* @param query - the query to be searched for
* @param weight - the corresponding weight of the query
* @throws NullPointerException - if query == null
* @throws IllegalArgumentException - if weight
* 0 if this.weight == that.weight
* 1 if this.weight > that.weight
*/
public static Comparator byReverseWeightOrder() {
return new ByReverseWeightOrder();

Solution

Bug

numberOfMatches(String) will return a correct result only on the first call. If you call it again with another prefix, it will return the same value, because it will not be recalculated.

Different approaches can be used to fix it. Previous search prefix can be traced and checked, or even a Map of prefix -> nbResults can store already calculated values.

Code Duplication in ByPrefixOrder Comparator

It's a good idea to normalize the output of compateTo method so as it returns in range [-1, 0, 1], but the implementation contains repetitions of same code.

If I understand well the logic, the comparison of term query strings is made only for the substrings of the queries that do not exceed r number of characters. To normalize the input strings, a shortcut method is useful:

private String cutIfLengthGreaterThanR(String termString) {
  final int endIndex = termString.length() > r ? r : termString.length();
  return termString.substring(0, endIndex);
}

This allows to reduce the compareTo method:

@Override
public int compare(Term t1, Term t2) {
  String s1 = cutIfLengthGreaterThanR(t1.query);
  String s2 = cutIfLengthGreaterThanR(t2.query);
  final int cmp = s1.compareToIgnoreCase(s2);
  if (cmp = 1) {
    return 1;
  }
  else {
    return 0;
  }
  // or even a less readable one-liner:
  //  return (cmp == 0) ? 0 : (cmp < 0) ? -1 : 1;
}

If you are using Java 8, there is no need to create a dedicated class for this purpose. Lambdas are helpful:

public static Comparator byPrefixOrder(int r) {
  if (r  {
    String s1 = cutIfLengthGreaterThanR(t1.query, r);
    String s2 = cutIfLengthGreaterThanR(t2.query, r);
    final int cmp = s1.compareToIgnoreCase(s2);
    return (cmp == 0) ? 0 : (cmp < 0) ? -1 : 1;
  };
}

private static String cutIfLengthGreaterThanR(String termString, int r) {
  // same code as above
}

Args Validation

It's also a good idea to validate method arguments, but I think that NullPointerException is not really appropriate here. The checks like

if (query == null) ...
if (terms == null) ...

should all throw IllegalArgumentException: it underlines the requirement for non-nullability of the args.

Array Copies

Arrays are copied "manually" in Autocomplete constructor and allMatches.

java.util.Arrays contains a series of copyOf methods that could make the code shorter and less prone to bugs.

BinarySearchDeluxe

Do not use full-word names for generic types. Key can be easily misleading, even for an IDE (for example, my IntelliJ automatically imported java.security.Key, which is very confusing). Use single-letter names for generics: T, E, U...

The two *indexOf methods differ only in conditional branches inside the while loop, but all the other code is redundant. That can be improved. Using Java 8, we can extract the equality condition check and the lo value change function into the signature of the method:

private static  int indexOf(T[] a,
                               T key,
                               Comparator comparator,
                               Predicate eqTest,
                               IntFunction eqLowChangeFunction) {
  // null checks

  if (a.length == 0) return -1;

  int lo = 0;
  int hi = a.length;

  while (lo = 1)  hi = mid - 1;
    else if (cmp <= -1) lo = mid + 1;
    else if (eqTest.test(mid)) lo = eqLowChangeFunction.apply(mid);
    else return mid;
  }
  return -1;
}

The two public methods become as follows:

public static  int firstIndexOf(T[] a, T key, Comparator comparator) {
  return indexOf(a,
                 key,
                 comparator,
                 mid -> comparator.compare(a[mid-1], a[mid]) == 0,
                 mid -> mid - 1);
}

public static  int lastIndexOf(T[] a, T key, Comparator comparator) {
  return indexOf(a,
                 key,
                 comparator,
                 mid -> comparator.compare(a[mid+1], a[mid]) == 0,
                 mid -> mid + 1);
}

The quick checks for first or last elements are not extracted here, but still can be.

Code Snippets

private String cutIfLengthGreaterThanR(String termString) {
  final int endIndex = termString.length() > r ? r : termString.length();
  return termString.substring(0, endIndex);
}

@Override
public int compare(Term t1, Term t2) {
  String s1 = cutIfLengthGreaterThanR(t1.query);
  String s2 = cutIfLengthGreaterThanR(t2.query);
  final int cmp = s1.compareToIgnoreCase(s2);
  if (cmp <= -1) {
    return -1;
  }
  else if (cmp >= 1) {
    return 1;
  }
  else {
    return 0;
  }
  // or even a less readable one-liner:
  //  return (cmp == 0) ? 0 : (cmp < 0) ? -1 : 1;
}

public static Comparator<Term> byPrefixOrder(int r) {
  if (r < 0) throw new IllegalArgumentException("r must be nonnegative, but was " + r);
  return (t1, t2) -> {
    String s1 = cutIfLengthGreaterThanR(t1.query, r);
    String s2 = cutIfLengthGreaterThanR(t2.query, r);
    final int cmp = s1.compareToIgnoreCase(s2);
    return (cmp == 0) ? 0 : (cmp < 0) ? -1 : 1;
  };
}

private static String cutIfLengthGreaterThanR(String termString, int r) {
  // same code as above
}

if (query == null) ...
if (terms == null) ...

private static <T> int indexOf(T[] a,
                               T key,
                               Comparator<T> comparator,
                               Predicate<Integer> eqTest,
                               IntFunction<Integer> eqLowChangeFunction) {
  // null checks

  if (a.length == 0) return -1;

  int lo = 0;
  int hi = a.length;

  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    int cmp = comparator.compare(a[mid], key);

    if      (cmp >= 1)  hi = mid - 1;
    else if (cmp <= -1) lo = mid + 1;
    else if (eqTest.test(mid)) lo = eqLowChangeFunction.apply(mid);
    else return mid;
  }
  return -1;
}

Context

StackExchange Code Review Q#145466, answer score: 2

Revisions (0)

No revisions yet.