patternjavaModerate
Program to index a book
Viewed 0 times
bookindexprogram
Problem
Indexing a book. Write a program that reads in a text file from
standard input and compiles an alphabetical index of which words
appear on which lines, as in the following input. Ignore case and
punctuation. For each word maintain a list of location on which it
appears. Try to use HashTable and/or HashMap class (of java.util).
I have used a HashMap to store the line numbers for each word where it appears. Can this program be made better?
Index.java
BookIndexer.java
```
package java_assignments.beg_assignment5;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.InputStreamReader;
public class BookIndexer {
public static void main(String[] args) {
try {
BufferedReader br;
if (args.length == 0) {
br = new BufferedReader(new InputStreamReader(System.in));
standard input and compiles an alphabetical index of which words
appear on which lines, as in the following input. Ignore case and
punctuation. For each word maintain a list of location on which it
appears. Try to use HashTable and/or HashMap class (of java.util).
I have used a HashMap to store the line numbers for each word where it appears. Can this program be made better?
Index.java
package java_assignments.beg_assignment5;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
public class Index {
public Index(Readable text) {
Scanner sc = new Scanner(text);
occurences = new HashMap>();
int lineNo = 1;
try {
while (sc.hasNextLine()) {
String line = sc.nextLine();
String[] words = line.split("\\W+");
for (String word : words) {
word = word.toLowerCase();
ArrayList list = occurences.get(word);
if (list == null) {
list = new ArrayList<>();
list.add(lineNo);
} else {
list.add(lineNo);
}
occurences.put(word, list);
}
lineNo++;
}
} finally {
sc.close();
}
}
public String toString() {
return occurences.toString();
}
private Map> occurences;
}BookIndexer.java
```
package java_assignments.beg_assignment5;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.InputStreamReader;
public class BookIndexer {
public static void main(String[] args) {
try {
BufferedReader br;
if (args.length == 0) {
br = new BufferedReader(new InputStreamReader(System.in));
Solution
Using a try-with-resources
You're correctly closing your
However, starting with Java 7, you can simply use the try-with-resources construct to make this easier:
Reading words with lines
You're using a
This can be a problem: what if you encounter a word that has a dash or a quote? You will wrongly split it. It would be better to split around a whitespace character, i.e.
Also, you're currently using a
Code structure
Your declaration of
is located at the bottom of the class. Generally, instance variables are found at the top instead so that you can see directly what the class has as instance variables.
You're currently using two classes: one for the main part and one to find the occurences. It introduces a problem: the constructor does too much work. In fact, the constructor of
Also, I don't think the
inside the main class that would return the map.
Also, don't name your variables
Handling exceptions
When you're reading a text from a file, you're not directly catching the
This create a coupling between the method and what it reads from. Instead, it would be best to delegate that to a method dedicated to returning the
Note two things:
Lowercasing Strings
Be very careful when lowercasing / uppercasing Strings in Java. This depends on the locale. By default, Java will use the locale of the current JVM, which is your system locale (by default). If you were to read a Turkish text on a server in France, you might have inconsistencies and hard to understand bugs! It is preferable to use a locale when doing those operations
Using Java 8 constructs
Your code updating the
Let alone the fact that you could drop the
If the current word is not in the map, a new
Beginning with Java 8, a
You're correctly closing your
Scanner at the end of the method in a finally block, so there can be no resource leaks.However, starting with Java 7, you can simply use the try-with-resources construct to make this easier:
try (Scanner sc = new Scanner(text)) {
// ...
}Reading words with lines
You're using a
Scanner to read each line and then you are splitting the line on non word characters, i.e. everything that is not [a-zA-Z_0-9]. This can be a problem: what if you encounter a word that has a dash or a quote? You will wrongly split it. It would be better to split around a whitespace character, i.e.
\s.Also, you're currently using a
lineNo variable to hold the current line number. You could use the built-in LineNumberReader that already maintains a line number. You can access it with getLineNumber().Code structure
Your declaration of
private Map> occurences;is located at the bottom of the class. Generally, instance variables are found at the top instead so that you can see directly what the class has as instance variables.
You're currently using two classes: one for the main part and one to find the occurences. It introduces a problem: the constructor does too much work. In fact, the constructor of
Index does all the work. It would be better to refactor this into a method properly named after what it does. We could introduce a method populateOccurences whose goal would be to create the occurences map.Also, I don't think the
Index class is really that necessary: the more a code is simple, the better it is to maintain it. In this case, this class really contains a single method, which to populate the occurences map. It would be easier to not have that class and simply have a methodprivate static Map> getOccurencesMap(Reader text) throws IOExceptioninside the main class that would return the map.
Also, don't name your variables
index_str: use camel-case, as indexStr.Handling exceptions
When you're reading a text from a file, you're not directly catching the
FileNotFoundException, instead you're letting the main method do it:try {
BufferedReader br;
if (args.length == 0) {
br = new BufferedReader(new InputStreamReader(System.in));
} else {
br = new BufferedReader(new FileReader(args[0]));
}
// ...
} catch (FileNotFoundException e) {
e.printStackTrace();
}This create a coupling between the method and what it reads from. Instead, it would be best to delegate that to a method dedicated to returning the
Reader to read:private static Reader getReader(String[] args) {
if (args.length == 0) {
return new BufferedReader(new InputStreamReader(System.in));
} else {
try {
return new BufferedReader(new FileReader(args[0]));
} catch (FileNotFoundException e) {
throw new IllegalArgumentException("The given file does not exist.", e);
}
}
}Note two things:
- The
catch (FileNotFoundException e)is done inside theelsepart: that is the only part of the code responsible for reading a file, so it must be the only part of the code for handling aFileNotFoundException.
- A custom
IllegalArgumentExceptionis re-thrown to indicate that the file wasn't found. This runtime exception wraps the initialFileNotFoundExceptionto have a proper stacktrace but it hides that from the surrounding code.
Lowercasing Strings
Be very careful when lowercasing / uppercasing Strings in Java. This depends on the locale. By default, Java will use the locale of the current JVM, which is your system locale (by default). If you were to read a Turkish text on a server in France, you might have inconsistencies and hard to understand bugs! It is preferable to use a locale when doing those operations
word = word.toLowerCase(Locale.ROOT);Using Java 8 constructs
Your code updating the
Map holding the line numbers for each word reads lineArrayList list = occurences.get(word);
if (list == null) {
list = new ArrayList<>();
list.add(lineNo);
} else {
list.add(lineNo);
}
occurences.put(word, list);Let alone the fact that you could drop the
else clause and have list.add(lineNo); after the if (which would remove this little duplication), you could use the method computeIfAbsent that will get the value for a specified key or if there is no value, set it with an initial value based on the given mapping function. In this case, you can simply haveoccurences.computeIfAbsent(word, k -> new ArrayList<>()).add(lineNo);If the current word is not in the map, a new
ArrayList will be created and returned, otherwise the current list for that word will be returned. Then, on this instance, we add the current line number.Beginning with Java 8, a
BufferedReader also has a useful lines() method that returns a Stream of the lines. Instead of looping with a for, wCode Snippets
try (Scanner sc = new Scanner(text)) {
// ...
}private Map<String, ArrayList<Integer>> occurences;private static Map<String, List<Integer>> getOccurencesMap(Reader text) throws IOExceptiontry {
BufferedReader br;
if (args.length == 0) {
br = new BufferedReader(new InputStreamReader(System.in));
} else {
br = new BufferedReader(new FileReader(args[0]));
}
// ...
} catch (FileNotFoundException e) {
e.printStackTrace();
}private static Reader getReader(String[] args) {
if (args.length == 0) {
return new BufferedReader(new InputStreamReader(System.in));
} else {
try {
return new BufferedReader(new FileReader(args[0]));
} catch (FileNotFoundException e) {
throw new IllegalArgumentException("The given file does not exist.", e);
}
}
}Context
StackExchange Code Review Q#122675, answer score: 11
Revisions (0)
No revisions yet.