HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaModerate

Program to index a book

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
bookindexprogram

Problem

Indexing a book. Write a program that reads in a text file from
standard input and compiles an alphabetical index of which words
appear on which lines, as in the following input. Ignore case and
punctuation. For each word maintain a list of location on which it
appears. Try to use HashTable and/or HashMap class (of java.util).

I have used a HashMap to store the line numbers for each word where it appears. Can this program be made better?

Index.java

package java_assignments.beg_assignment5;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class Index {
    public Index(Readable text) {
        Scanner sc = new Scanner(text);
        occurences = new HashMap>();

        int lineNo = 1;
        try {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                String[] words = line.split("\\W+");
                for (String word : words) {
                    word = word.toLowerCase();
                    ArrayList list = occurences.get(word);
                    if (list == null) {
                        list = new ArrayList<>();
                        list.add(lineNo);
                    } else {
                        list.add(lineNo);
                    }
                    occurences.put(word, list);
                }
                lineNo++;
            }
        } finally {
            sc.close();
        }
    }

    public String toString() {
        return occurences.toString();
    }

    private Map> occurences;
}


BookIndexer.java

```
package java_assignments.beg_assignment5;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.InputStreamReader;

public class BookIndexer {
public static void main(String[] args) {
try {
BufferedReader br;
if (args.length == 0) {
br = new BufferedReader(new InputStreamReader(System.in));

Solution

Using a try-with-resources

You're correctly closing your Scanner at the end of the method in a finally block, so there can be no resource leaks.

However, starting with Java 7, you can simply use the try-with-resources construct to make this easier:

try (Scanner sc = new Scanner(text)) {
    // ...
}


Reading words with lines

You're using a Scanner to read each line and then you are splitting the line on non word characters, i.e. everything that is not [a-zA-Z_0-9].

This can be a problem: what if you encounter a word that has a dash or a quote? You will wrongly split it. It would be better to split around a whitespace character, i.e. \s.

Also, you're currently using a lineNo variable to hold the current line number. You could use the built-in LineNumberReader that already maintains a line number. You can access it with getLineNumber().

Code structure

Your declaration of

private Map> occurences;


is located at the bottom of the class. Generally, instance variables are found at the top instead so that you can see directly what the class has as instance variables.

You're currently using two classes: one for the main part and one to find the occurences. It introduces a problem: the constructor does too much work. In fact, the constructor of Index does all the work. It would be better to refactor this into a method properly named after what it does. We could introduce a method populateOccurences whose goal would be to create the occurences map.

Also, I don't think the Index class is really that necessary: the more a code is simple, the better it is to maintain it. In this case, this class really contains a single method, which to populate the occurences map. It would be easier to not have that class and simply have a method

private static Map> getOccurencesMap(Reader text) throws IOException


inside the main class that would return the map.

Also, don't name your variables index_str: use camel-case, as indexStr.

Handling exceptions

When you're reading a text from a file, you're not directly catching the FileNotFoundException, instead you're letting the main method do it:

try {
    BufferedReader br;
    if (args.length == 0) {
        br = new BufferedReader(new InputStreamReader(System.in));
    } else {
        br = new BufferedReader(new FileReader(args[0]));
    }
    // ...
} catch (FileNotFoundException e) {
    e.printStackTrace();
}


This create a coupling between the method and what it reads from. Instead, it would be best to delegate that to a method dedicated to returning the Reader to read:

private static Reader getReader(String[] args) {
    if (args.length == 0) {
        return new BufferedReader(new InputStreamReader(System.in));
    } else {
        try {
            return new BufferedReader(new FileReader(args[0]));
        } catch (FileNotFoundException e) {
            throw new IllegalArgumentException("The given file does not exist.", e);
        }
    }
}


Note two things:

  • The catch (FileNotFoundException e) is done inside the else part: that is the only part of the code responsible for reading a file, so it must be the only part of the code for handling a FileNotFoundException.



  • A custom IllegalArgumentException is re-thrown to indicate that the file wasn't found. This runtime exception wraps the initial FileNotFoundException to have a proper stacktrace but it hides that from the surrounding code.



Lowercasing Strings

Be very careful when lowercasing / uppercasing Strings in Java. This depends on the locale. By default, Java will use the locale of the current JVM, which is your system locale (by default). If you were to read a Turkish text on a server in France, you might have inconsistencies and hard to understand bugs! It is preferable to use a locale when doing those operations

word = word.toLowerCase(Locale.ROOT);


Using Java 8 constructs

Your code updating the Map holding the line numbers for each word reads line

ArrayList list = occurences.get(word);
if (list == null) {
    list = new ArrayList<>();
    list.add(lineNo);
} else {
    list.add(lineNo);
}
occurences.put(word, list);


Let alone the fact that you could drop the else clause and have list.add(lineNo); after the if (which would remove this little duplication), you could use the method computeIfAbsent that will get the value for a specified key or if there is no value, set it with an initial value based on the given mapping function. In this case, you can simply have

occurences.computeIfAbsent(word, k -> new ArrayList<>()).add(lineNo);


If the current word is not in the map, a new ArrayList will be created and returned, otherwise the current list for that word will be returned. Then, on this instance, we add the current line number.

Beginning with Java 8, a BufferedReader also has a useful lines() method that returns a Stream of the lines. Instead of looping with a for, w

Code Snippets

try (Scanner sc = new Scanner(text)) {
    // ...
}
private Map<String, ArrayList<Integer>> occurences;
private static Map<String, List<Integer>> getOccurencesMap(Reader text) throws IOException
try {
    BufferedReader br;
    if (args.length == 0) {
        br = new BufferedReader(new InputStreamReader(System.in));
    } else {
        br = new BufferedReader(new FileReader(args[0]));
    }
    // ...
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
private static Reader getReader(String[] args) {
    if (args.length == 0) {
        return new BufferedReader(new InputStreamReader(System.in));
    } else {
        try {
            return new BufferedReader(new FileReader(args[0]));
        } catch (FileNotFoundException e) {
            throw new IllegalArgumentException("The given file does not exist.", e);
        }
    }
}

Context

StackExchange Code Review Q#122675, answer score: 11

Revisions (0)

No revisions yet.