Merging word streams from files
Problem
This is a follow-on, rags-to-riches implementation of "The most efficient way to merge two lists in Java".
The original requirements are to:
Identify the distinct values from two input files, and output the
distinct values to an output file. There is no specification for the
order of the output, only that each line should be unique in the
results.
Special consideration should be made for efficiency.
I have implemented a more general specification:
- merge multiple input files (at least one) to an output file
- each line is treated as a line, not necessarily a "word". If the input files have just one word per line, then the output would be the same as the original specification.
- take the files from the command line (the first argument is the output file; the remaining arguments are the input files).
In my answer to the linked post I suggested that a Java 8 Streams implementation would be "nice". I have implemented that solution here. I am looking for suggestions on how to better utilize the new Java functionality, and any other suggestions you may have.
```
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

@SuppressWarnings("javadoc")
public class Linemerge {

    /** Wrap the IOException in order to make convenient Stream usage. */
    private static final void writeWord(BufferedWriter writer, String word) {
        try {
            writer.write(word);
            writer.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void merge(Path source, Set<String> seen, BufferedWriter writer) throws IOException {
        try (Stream<String> words = Files.lines(source)) {
            words.filter(seen::add).forEach(word -> writeWord(writer, word));
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                    "Need at least two file arguments: Destination Source {Source {Source {...}}}");
        }
        // NOTE: the rest of main() was truncated in the source. What follows is a
        // reconstruction based on the reviewed version in the answer and may differ
        // from the original post.
        Set<String> seen = new HashSet<>();
        try (BufferedWriter writer = Files.newBufferedWriter(Paths.get(args[0]))) {
            for (int i = 1; i < args.length; i++) {
                Path source = Paths.get(args[i]);
                if (Files.isRegularFile(source) && Files.isReadable(source)) {
                    System.out.println("Merging " + source);
                    merge(source, seen, writer);
                } else {
                    System.out.println("Unable to read (and ignoring) " + source);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}
```
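The de-duplication in `merge` hinges on `Set.add` returning `true` only for an element that was not already in the set, which lets `seen::add` act as a filter shared across all input files. A minimal sketch of that idiom in isolation might look like the following. The class `FirstSeenFilterDemo` and the method `distinctInOrder` are hypothetical names used only for illustration, and an in-memory list stands in for `Files.lines`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class; the names here are illustrative and not from the post.
class FirstSeenFilterDemo {

    /** Keeps the first occurrence of each line, preserving encounter order. */
    static List<String> distinctInOrder(List<String> lines) {
        Set<String> seen = new HashSet<>();
        // Set.add returns true only when the element was not already present,
        // so it doubles as a stateful "not seen before" predicate.
        // Assumption: the stream is sequential; HashSet is not thread-safe.
        return lines.stream()
                .filter(seen::add)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> merged = distinctInOrder(
                Arrays.asList("apple", "pear", "apple", "fig", "pear"));
        System.out.println(merged); // prints [apple, pear, fig]
    }
}
```

Because the predicate mutates `seen`, this behaves as intended only on a sequential stream. When all lines flow through a single stream, `Stream.distinct()` is the built-in alternative that avoids the external set.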
Solution
For the usage message:
```
"Need at least two file arguments: Destination Source {Source {Source {...}}}"
```
I think an easier way of documenting that, at least for *nix, is:
```
"Need at least two file arguments: DESTINATION [SOURCE]..."
```
You can also turn the Paths in your main() method into a Stream too:
```
public class Linemerge {
    // ... (imports and writeWord as in the question, plus java.util.function.Predicate)

    // suggestion note: had to wrap IOException -> UncheckedIOException too
    private static void merge(Path source, Set<String> seen, BufferedWriter writer) {
        try (Stream<String> words = Files.lines(source)) {
            words.filter(seen::add).forEach(word -> writeWord(writer, word));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Accept only regular, readable files as merge sources.
    private static final Predicate<Path> FILTER =
            f -> Files.isRegularFile(f) && Files.isReadable(f);

    // Report whether each path will be merged or skipped.
    private static void checkPath(Path path) {
        System.out.println((FILTER.test(path) ? "Merging"
                : "Unable to read (and ignoring)") + " " + path);
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                    "Need at least two file arguments: DESTINATION [SOURCE]...");
        }
        try (BufferedWriter writer = Files.newBufferedWriter(Paths.get(args[0]))) {
            Set<String> seen = new HashSet<>();
            // Skip the destination argument, then log, filter and merge each source.
            Stream.of(args).skip(1).map(Paths::get).peek(Linemerge::checkPath)
                    .filter(FILTER).forEach(f -> merge(f, seen, writer));
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}
```
Context
StackExchange Code Review Q#92913, answer score: 6