patternjavaMinor

Needle in a Haystack

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

haystackneedlestackoverflow

Problem

This is a rags-to-riches of some recent "search word in multiple files" questions, such as this deleted question and this.

The idea here is to do a case-insensitive word (complete, not partial) search that can also report the line and 'column' number of the search result. For example, if the word is found as the first word on the first row, its location can be represented as row = 0, column = 0, or r0c0.

I have a Position model class to represent a search result, and an interface WordCount that represents how searches can be done and the results reported. A FileWordCount handles the searching of one Path, and a FileWordSearcher (while implementing WordCount too) performs the same search operations on a Set. For completeness, a TestNG test is also provided.

Questions:

Is there a better way to write the test, other than listing the expected objects literally?

Any other refactoring suggested on the main classes/interfaces?

I was pondering if a Set (backed by a HashSet) is adequate, or should there be an ordering as well using LinkedHashSet. A List provides a predictable iteration/sorted order, but loses the uniqueness semantics.

Position

```
public final class Position implements Comparable {
public static final Comparator COMPARATOR =
Comparator.comparing(Position::getRow)
.thenComparing(Position::getColumn);

private final int row;
private final int column;
private final int hashCode;

public Position(int row, int column) {
this.row = row;
this.column = column;
this.hashCode = Objects.hash(row, column);
}

public int getRow() {
return row;
}

public int getColumn() {
return column;
}

@Override
public int hashCode() {
return hashCode;
}

@Override
public boolean equals(Object o) {
return o == this || (o instanceof Position && equals((Position) o));
}

private boolean e

Solution

Don't use the stream api when simpler things exist

default Map getCountWithName(String word) {
    return Stream.of(word)
                .collect(Collectors.toMap(k -> getName(), this::getCount));
}

default Map> getPositionsWithName(String word) {
    return Stream.of(word)
                .collect(Collectors.toMap(k -> getName(), this::getPositions));
}

While you can use the Stream API here, it is easier to understand when you use Collections.singeletonMap. This returns a map containing 1 key with value.

default Map getCountWithName(String word) {
    return unmodifiableMap(getName(), getCount(word));
}

default Map> getPositionsWithName(String word) {
    return unmodifiableMap(getName(), getPositions(word));
}

Unneeded variable

private final Set inputs;
public FileWordSearcher(Stream paths) {
    Objects.requireNonNull(paths);
    this.inputs = paths.collect(toUnmodifiableSet());
    this.results = inputs.stream()
                            .map(FileWordCount::new)
                            .collect(toUnmodifiableSet());
}

This variable is only used in the constructor of the function, by removing the variable and the unneeded stream() call you can increase the performance and the memory in 1 shot.

public FileWordSearcher(Stream paths) {
Objects.requireNonNull(paths);
this.results = paths.map(FileWordCount::new)
.collect(toUnmodifiableSet());
}

Requirement of Path objects

By the way you implemented file reading, you require the usage of Path objects to your code. This makes unit testing harder as there is no defined location for files, you can only use Class.getResource to get a proper reference to the file.

By changing your system to take a URL you can take advantage of test files directly next to the normal files, and without risking bugs if your testing platform tests after putting the files in a jar.

Throw AssertionError while testing

For any defined failure path of a unit test, you should use AssertionError or Assert.fail() rather than throwing a IllegalArgumentException.

Adding Javadoc

While most developers put the javadoc at the lowest priority, these can help with explaining this. For someone new to your api, it may not been clear that the key from the returned map from getCountWithName represents a the name of the WordCount.

Better testing (your question 1)

You only have 1 testcase, this isn't enough to test the whole application. A other point of testing is that you should limit the things to test in every test, so you can find out what method is at fault, by only looking at the test results.

The following points should be tested:

Empty file to search

Searching for a file with exact the same content

Examples of small, concrete, on point tests:

@Test
public void emptyInputSearchTest() {
    FileWordSearcher searcher = new FileWordSearcher(Stream.empty());
    assertTrue(searcher.getCountWithName("test").isEmpty());
    assertTrue(searcher.getPositionsWithName("test").isEmpty());
}

@Test
public void FileToSearchMatchInputWordTest() {
    Path p = ....; //Image this path has a file that contains only "test"
    FileWordSearcher searcher = new FileWordSearcher(Stream.of(p));
    int count = searcher.getCountWithName("test").get(p.getName());
    assertTrue(count == 1);
    Set pos = searcher.getPositionsWithName("test").get(p.getName());
    assertTrue(pos.size() == 1);
    assertTrue(pos.contains(new Position(0,0)));
}

Notice I used assertTrue in the above examples as I dont know how to work with Hamcrest yet....

Return type of getPositions and getCounts (question 2)

If I am searching text, I would expect it to return a set that has a order from first to last match, as this is usually how I present the results to my user. A NavigableSet is perfect for this as it has a iterator() for moving from the start top the end, and a descendingIterator() for the other way. a TreeSet is a commonly used Set for implementing a NavigableSet.

Code Snippets

default Map<String, Integer> getCountWithName(String word) {
    return Stream.of(word)
                .collect(Collectors.toMap(k -> getName(), this::getCount));
}

default Map<String, Set<Position>> getPositionsWithName(String word) {
    return Stream.of(word)
                .collect(Collectors.toMap(k -> getName(), this::getPositions));
}

default Map<String, Integer> getCountWithName(String word) {
    return unmodifiableMap(getName(), getCount(word));
}

default Map<String, Set<Position>> getPositionsWithName(String word) {
    return unmodifiableMap(getName(), getPositions(word));
}

private final Set<Path> inputs;
public FileWordSearcher(Stream<Path> paths) {
    Objects.requireNonNull(paths);
    this.inputs = paths.collect(toUnmodifiableSet());
    this.results = inputs.stream()
                            .map(FileWordCount::new)
                            .collect(toUnmodifiableSet());
}

@Test
public void emptyInputSearchTest() {
    FileWordSearcher searcher = new FileWordSearcher(Stream.empty());
    assertTrue(searcher.getCountWithName("test").isEmpty());
    assertTrue(searcher.getPositionsWithName("test").isEmpty());
}

@Test
public void FileToSearchMatchInputWordTest() {
    Path p = ....; //Image this path has a file that contains only "test"
    FileWordSearcher searcher = new FileWordSearcher(Stream.of(p));
    int count = searcher.getCountWithName("test").get(p.getName());
    assertTrue(count == 1);
    Set<Position> pos = searcher.getPositionsWithName("test").get(p.getName());
    assertTrue(pos.size() == 1);
    assertTrue(pos.contains(new Position(0,0)));
}

Context

StackExchange Code Review Q#118657, answer score: 2

Revisions (0)

No revisions yet.