HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Count chars, words and lines

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
wordsandcountcharslines

Problem

Any suggestions are welcome.

package je3.io;

import java.io.*;
import java.nio.charset.Charset;

/**
 * Created by IDEA on 30/01/15.
 */
public class CountCharsWordsLines {
    public static long[] count(String filename) throws IOException {
        long[] counts = new long[3];
        try(InputStream in = new FileInputStream(filename);
            Reader reader = new InputStreamReader(in, Charset.forName("UTF-8"));
            Reader buffer = new BufferedReader(reader)
        ) {
            int c;
            String newLineChar = System.lineSeparator();
            while((c = reader.read()) != -1) {
                char character = (char) c;
                if(character == '\n') {
                    counts[2]++;
                    counts[1]++;
                    counts[0]++;
                } else if(Character.isWhitespace(character)) {
                    counts[1]++;
                    counts[0]++;
                } else {
                    counts[0]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if(args.length != 1) {
            throw new IllegalArgumentException();
        }
        long[] counts = count(args[0]);
        System.out.printf("Chars: %d; Words: %d; Lines: %d;\n", counts[0], counts[1], counts[2]);
    }
}

Solution

Bug

The program doesn't count words correctly,
for example in "a text like this",
because it counts every whitespace character as a word,
without handling consecutive whitespace characters.

Reading character by character is inefficient

Reading a file character by character is not very efficient.
It would be better to use a buffered reader and process the file line by line.
Here's an example of doing that, which also solves the bug with word counting:

long[] counts = new long[3];
try(InputStream in = new FileInputStream(filename);
    BufferedReader reader = new BufferedReader(new InputStreamReader(in, Charset.forName("UTF-8")))
) {
    String line;
    while ((line = reader.readLine()) != null) {
        counts[2]++;
        counts[1] += line.split("\\s+").length;
        counts[0] += line.length() + 1;
    }
}
return counts;


This might still not be accurate,
and probably there are corner cases where it will not count words correctly,
but it's still more accurate than the original.

Encapsulation

The implementation doesn't encapsulate well the results.
It returns a long[],
which is not great because you have to remember the array indexes that correspond to characters, words, lines, which is error prone.
Even if you add constants like CHARS_INDEX = 0, WORDS_INDEX = 1 to eliminate magic numbers,
it will not solve the encapsulation problem:
users of the function will know too much about the internal implementation,
that it uses a long[] for storage.
It would be cleaner to create a dedicated CountResult class to encapsulate the results.

For example:

private static class CountResult {
    private final long chars;
    private final long words;
    private final long lines;

    private CountResult(long chars, long words, long lines) {
        this.chars = chars;
        this.words = words;
        this.lines = lines;
    }
}


And then use it like this:

long chars = 0;
long words = 0;
long lines = 0;
while ((line = reader.readLine()) != null) {
    lines++;
    words += line.split("\\s+").length;
    chars += line.length() + 1;
}
return new CountResult(chars, words, lines);


Callers will not need to remember indexes,
the counts will be accessible intuitively by names.

Unused variables

You have some unused variables:

Reader buffer = new BufferedReader(reader)

String newLineChar = System.lineSeparator();


It would be better to remove them.

Code Snippets

long[] counts = new long[3];
try(InputStream in = new FileInputStream(filename);
    BufferedReader reader = new BufferedReader(new InputStreamReader(in, Charset.forName("UTF-8")))
) {
    String line;
    while ((line = reader.readLine()) != null) {
        counts[2]++;
        counts[1] += line.split("\\s+").length;
        counts[0] += line.length() + 1;
    }
}
return counts;
private static class CountResult {
    private final long chars;
    private final long words;
    private final long lines;

    private CountResult(long chars, long words, long lines) {
        this.chars = chars;
        this.words = words;
        this.lines = lines;
    }
}
long chars = 0;
long words = 0;
long lines = 0;
while ((line = reader.readLine()) != null) {
    lines++;
    words += line.split("\\s+").length;
    chars += line.length() + 1;
}
return new CountResult(chars, words, lines);
Reader buffer = new BufferedReader(reader)

String newLineChar = System.lineSeparator();

Context

StackExchange Code Review Q#79178, answer score: 6

Revisions (0)

No revisions yet.