patternjavaMinor
Needle in a Haystack
Viewed 0 times
haystackneedlestackoverflow
Problem
This is a rags-to-riches of some recent "search word in multiple files" questions, such as this deleted question and this.
The idea here is to do a case-insensitive word (complete, not partial) search that can also report the line and 'column' number of the search result. For example, if the word is found as the first word on the first row, its location can be represented as
I have a
Questions:
```
public final class Position implements Comparable {
public static final Comparator COMPARATOR =
Comparator.comparing(Position::getRow)
.thenComparing(Position::getColumn);
private final int row;
private final int column;
private final int hashCode;
public Position(int row, int column) {
this.row = row;
this.column = column;
this.hashCode = Objects.hash(row, column);
}
public int getRow() {
return row;
}
public int getColumn() {
return column;
}
@Override
public int hashCode() {
return hashCode;
}
@Override
public boolean equals(Object o) {
return o == this || (o instanceof Position && equals((Position) o));
}
private boolean e
The idea here is to do a case-insensitive word (complete, not partial) search that can also report the line and 'column' number of the search result. For example, if the word is found as the first word on the first row, its location can be represented as
row = 0, column = 0, or r0c0.I have a
Position model class to represent a search result, and an interface WordCount that represents how searches can be done and the results reported. A FileWordCount handles the searching of one Path, and a FileWordSearcher (while implementing WordCount too) performs the same search operations on a Set. For completeness, a TestNG test is also provided.Questions:
- Is there a better way to write the test, other than listing the expected objects literally?
- Any other refactoring suggested on the main classes/interfaces?
- I was pondering if a
Set(backed by aHashSet) is adequate, or should there be an ordering as well usingLinkedHashSet. AListprovides a predictable iteration/sorted order, but loses the uniqueness semantics.
Position```
public final class Position implements Comparable {
public static final Comparator COMPARATOR =
Comparator.comparing(Position::getRow)
.thenComparing(Position::getColumn);
private final int row;
private final int column;
private final int hashCode;
public Position(int row, int column) {
this.row = row;
this.column = column;
this.hashCode = Objects.hash(row, column);
}
public int getRow() {
return row;
}
public int getColumn() {
return column;
}
@Override
public int hashCode() {
return hashCode;
}
@Override
public boolean equals(Object o) {
return o == this || (o instanceof Position && equals((Position) o));
}
private boolean e
Solution
Don't use the stream api when simpler things exist
While you can use the Stream API here, it is easier to understand when you use
Unneeded variable
This variable is only used in the constructor of the function, by removing the variable and the unneeded
public FileWordSearcher(Stream paths) {
Objects.requireNonNull(paths);
this.results = paths.map(FileWordCount::new)
.collect(toUnmodifiableSet());
}
Requirement of
By the way you implemented file reading, you require the usage of
By changing your system to take a URL you can take advantage of test files directly next to the normal files, and without risking bugs if your testing platform tests after putting the files in a jar.
Throw
For any defined failure path of a unit test, you should use
Adding Javadoc
While most developers put the javadoc at the lowest priority, these can help with explaining this. For someone new to your api, it may not been clear that the key from the returned map from
Better testing (your question 1)
You only have 1 testcase, this isn't enough to test the whole application. A other point of testing is that you should limit the things to test in every test, so you can find out what method is at fault, by only looking at the test results.
The following points should be tested:
Examples of small, concrete, on point tests:
Notice I used
Return type of
If I am searching text, I would expect it to return a set that has a order from first to last match, as this is usually how I present the results to my user. A
default Map getCountWithName(String word) {
return Stream.of(word)
.collect(Collectors.toMap(k -> getName(), this::getCount));
}
default Map> getPositionsWithName(String word) {
return Stream.of(word)
.collect(Collectors.toMap(k -> getName(), this::getPositions));
}While you can use the Stream API here, it is easier to understand when you use
Collections.singeletonMap. This returns a map containing 1 key with value.default Map getCountWithName(String word) {
return unmodifiableMap(getName(), getCount(word));
}
default Map> getPositionsWithName(String word) {
return unmodifiableMap(getName(), getPositions(word));
}Unneeded variable
private final Set inputs;
public FileWordSearcher(Stream paths) {
Objects.requireNonNull(paths);
this.inputs = paths.collect(toUnmodifiableSet());
this.results = inputs.stream()
.map(FileWordCount::new)
.collect(toUnmodifiableSet());
}This variable is only used in the constructor of the function, by removing the variable and the unneeded
stream() call you can increase the performance and the memory in 1 shot.public FileWordSearcher(Stream paths) {
Objects.requireNonNull(paths);
this.results = paths.map(FileWordCount::new)
.collect(toUnmodifiableSet());
}
Requirement of
Path objectsBy the way you implemented file reading, you require the usage of
Path objects to your code. This makes unit testing harder as there is no defined location for files, you can only use Class.getResource to get a proper reference to the file.By changing your system to take a URL you can take advantage of test files directly next to the normal files, and without risking bugs if your testing platform tests after putting the files in a jar.
Throw
AssertionError while testingFor any defined failure path of a unit test, you should use
AssertionError or Assert.fail() rather than throwing a IllegalArgumentException.Adding Javadoc
While most developers put the javadoc at the lowest priority, these can help with explaining this. For someone new to your api, it may not been clear that the key from the returned map from
getCountWithName represents a the name of the WordCount.Better testing (your question 1)
You only have 1 testcase, this isn't enough to test the whole application. A other point of testing is that you should limit the things to test in every test, so you can find out what method is at fault, by only looking at the test results.
The following points should be tested:
- Empty file to search
- Searching for a file with exact the same content
Examples of small, concrete, on point tests:
@Test
public void emptyInputSearchTest() {
FileWordSearcher searcher = new FileWordSearcher(Stream.empty());
assertTrue(searcher.getCountWithName("test").isEmpty());
assertTrue(searcher.getPositionsWithName("test").isEmpty());
}
@Test
public void FileToSearchMatchInputWordTest() {
Path p = ....; //Image this path has a file that contains only "test"
FileWordSearcher searcher = new FileWordSearcher(Stream.of(p));
int count = searcher.getCountWithName("test").get(p.getName());
assertTrue(count == 1);
Set pos = searcher.getPositionsWithName("test").get(p.getName());
assertTrue(pos.size() == 1);
assertTrue(pos.contains(new Position(0,0)));
}Notice I used
assertTrue in the above examples as I dont know how to work with Hamcrest yet....Return type of
getPositions and getCounts (question 2)If I am searching text, I would expect it to return a set that has a order from first to last match, as this is usually how I present the results to my user. A
NavigableSet is perfect for this as it has a iterator() for moving from the start top the end, and a descendingIterator() for the other way. a TreeSet is a commonly used Set for implementing a NavigableSet.Code Snippets
default Map<String, Integer> getCountWithName(String word) {
return Stream.of(word)
.collect(Collectors.toMap(k -> getName(), this::getCount));
}
default Map<String, Set<Position>> getPositionsWithName(String word) {
return Stream.of(word)
.collect(Collectors.toMap(k -> getName(), this::getPositions));
}default Map<String, Integer> getCountWithName(String word) {
return unmodifiableMap(getName(), getCount(word));
}
default Map<String, Set<Position>> getPositionsWithName(String word) {
return unmodifiableMap(getName(), getPositions(word));
}private final Set<Path> inputs;
public FileWordSearcher(Stream<Path> paths) {
Objects.requireNonNull(paths);
this.inputs = paths.collect(toUnmodifiableSet());
this.results = inputs.stream()
.map(FileWordCount::new)
.collect(toUnmodifiableSet());
}@Test
public void emptyInputSearchTest() {
FileWordSearcher searcher = new FileWordSearcher(Stream.empty());
assertTrue(searcher.getCountWithName("test").isEmpty());
assertTrue(searcher.getPositionsWithName("test").isEmpty());
}
@Test
public void FileToSearchMatchInputWordTest() {
Path p = ....; //Image this path has a file that contains only "test"
FileWordSearcher searcher = new FileWordSearcher(Stream.of(p));
int count = searcher.getCountWithName("test").get(p.getName());
assertTrue(count == 1);
Set<Position> pos = searcher.getPositionsWithName("test").get(p.getName());
assertTrue(pos.size() == 1);
assertTrue(pos.contains(new Position(0,0)));
}Context
StackExchange Code Review Q#118657, answer score: 2
Revisions (0)
No revisions yet.