patternjavaMinor
Read line and combine duplicate entries based on one of the fields
Viewed 0 times
thecombinelinereadduplicatefieldsonebasedandentries
Problem
The following Java code reads lines from an input text file and will output the entry that has the highest number in field-2 to an output file:
INPUT:
William J. Clinton 6 Q1124
42nd President of the United States 6 Q1124
Bill Clinton,6,Q1124
Bill Clinton,14,Q9890
Bill Clinton,5,Q9880
William Jefferson Clinton,6,Q1124
OUTPUT:
William J. Clinton,Q1124
42nd President of the United
States,Q1124
Bill Clinton,Q9890 (this Q-ID selected because
field-2 was highest number "14")
William Jefferson Clinton,Q1124
Could you offer suggestions on making this more elegant?
```
/*
* This class uses a method to combine duplicate Wiki keys into one and
* selecting the Qid of the one with largest link count
*/
public static void main(String[] args) {
if (args.length < 2) {
usage();
System.exit(0);
}
//
try {
// Create output file
BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]),
StandardCharsets.UTF_8);
// read input file
BufferedReader br = Files.newBufferedReader(Paths.get(args[0]),
StandardCharsets.UTF_8);
String key = "";
int links = 0;
String qID = "";
String next, line = br.readLine();
for (boolean first = true, last = (line == null); !last; first = false, line = next) {
last = ((next = br.readLine()) == null);
String[] entry = line.split(",");
if (first) {
// only assign current values
key = entry[0];
links = new Integer(entry[1]);
qID = entry[2];
} else if (last) {
//check if key is a duplicate
if(key.equals(entry[0])){
//use the Qid from the highest links
int temp = new Integer(entry[1]);
if(links < temp){
//write entry to output higher links
INPUT:
William J. Clinton 6 Q1124
42nd President of the United States 6 Q1124
Bill Clinton,6,Q1124
Bill Clinton,14,Q9890
Bill Clinton,5,Q9880
William Jefferson Clinton,6,Q1124
OUTPUT:
William J. Clinton,Q1124
42nd President of the United
States,Q1124
Bill Clinton,Q9890 (this Q-ID selected because
field-2 was highest number "14")
William Jefferson Clinton,Q1124
Could you offer suggestions on making this more elegant?
```
/*
* This class uses a method to combine duplicate Wiki keys into one and
* selecting the Qid of the one with largest link count
*/
public static void main(String[] args) {
if (args.length < 2) {
usage();
System.exit(0);
}
//
try {
// Create output file
BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]),
StandardCharsets.UTF_8);
// read input file
BufferedReader br = Files.newBufferedReader(Paths.get(args[0]),
StandardCharsets.UTF_8);
String key = "";
int links = 0;
String qID = "";
String next, line = br.readLine();
for (boolean first = true, last = (line == null); !last; first = false, line = next) {
last = ((next = br.readLine()) == null);
String[] entry = line.split(",");
if (first) {
// only assign current values
key = entry[0];
links = new Integer(entry[1]);
qID = entry[2];
} else if (last) {
//check if key is a duplicate
if(key.equals(entry[0])){
//use the Qid from the highest links
int temp = new Integer(entry[1]);
if(links < temp){
//write entry to output higher links
Solution
I'm far from a Java expert, but a few things jumped out at me
You forgot to close the reader (
Do not close the streams inside of the try. What happens if an exception gets thrown before the close commands are reached? You need a
If you're only error handling is to just print out the message, I'd be tempted to go ahead and let the exception bubble up and kill the program. Unless you're just trying to go for a more simple error, in which case hiding the stack trace could be nice.
I would just let
I might try to restructure the
On a minor note, you should try to create the reader before the writer. That way you don't create an output file if you can't open the input one.
The repetition in the
I would use
Don't bother commenting really obvious things ("Catch exception if any" for example).
Sometimes you use spaces like
All in all, my Java is a bit rusty, so take this with a grain of salt, but, I would do something like this:
I don't really like the repetition of the outputting, but I can't figure out a way around that. Also, due to the nesting level, I would probably pull the actual processing into a new method that takes in the two already opened streams.
You forgot to close the reader (
br).Do not close the streams inside of the try. What happens if an exception gets thrown before the close commands are reached? You need a
finally block. Or, better yet, if you're using Java 7 or above, use a try-with-resources block.If you're only error handling is to just print out the message, I'd be tempted to go ahead and let the exception bubble up and kill the program. Unless you're just trying to go for a more simple error, in which case hiding the stack trace could be nice.
I would just let
key be null the first time around and check for that instead of using first. I might try to restructure the
for conditions. They're very confusing in the current form. Perhaps loop as long as you successfully read a line, and then handle the last logic outside of the for loop. That way you don't have to have so much going on inside of the loop.On a minor note, you should try to create the reader before the writer. That way you don't create an output file if you can't open the input one.
The repetition in the
last and else blocks should be refactored out into a common case. Based on that they do the exact same thing, I'm not actually sure why last get's special treatment.I would use
Integer.parseInt instead of using unboxing on new Integer(str).temp is a bad variable name. Name it something descriptive of what it actually is.Don't bother commenting really obvious things ("Catch exception if any" for example).
Sometimes you use spaces like
if (...) and sometimes you don't. Pick one and stick with it (by which I really mean always use spaces :p).All in all, my Java is a bit rusty, so take this with a grain of salt, but, I would do something like this:
public static void main(String[] args) {
if (args.length < 2) {
System.err.println("Usage: CombineKeys InputFile OutputFile");
System.exit(1);
}
BufferedWriter out = null;
BufferedReader br = null;
try {
out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8);
br = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
String key = null;
int links = 0;
String qID = null;
while ((line = br.readLine()) != null) {
String[] entry = line.split(",");
if (entry[0].equalsIgnoreCase(key)) {
int tmpLinks = Integer.parseInt(entry[1]);
if (links < tmpLinks){
links = tmpLinks;
qID = entry[2];
}
} else {
if (key != null) {
out.write(key + "," + qID + "\n");
}
key = entry[0];
links = Integer.parseInt(entry[1]);
qID = entry[2];
}
}
if (key != null) {
out.write(key + "," + qID + "\n");
}
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
} finally {
if (br != null) {
br.close();
}
if (out != null) {
out.close();
}
}
}I don't really like the repetition of the outputting, but I can't figure out a way around that. Also, due to the nesting level, I would probably pull the actual processing into a new method that takes in the two already opened streams.
Code Snippets
public static void main(String[] args) {
if (args.length < 2) {
System.err.println("Usage: CombineKeys InputFile OutputFile");
System.exit(1);
}
BufferedWriter out = null;
BufferedReader br = null;
try {
out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8);
br = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
String key = null;
int links = 0;
String qID = null;
while ((line = br.readLine()) != null) {
String[] entry = line.split(",");
if (entry[0].equalsIgnoreCase(key)) {
int tmpLinks = Integer.parseInt(entry[1]);
if (links < tmpLinks){
links = tmpLinks;
qID = entry[2];
}
} else {
if (key != null) {
out.write(key + "," + qID + "\n");
}
key = entry[0];
links = Integer.parseInt(entry[1]);
qID = entry[2];
}
}
if (key != null) {
out.write(key + "," + qID + "\n");
}
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
} finally {
if (br != null) {
br.close();
}
if (out != null) {
out.close();
}
}
}Context
StackExchange Code Review Q#43017, answer score: 3
Revisions (0)
No revisions yet.