HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaModerate

Count frequency of words in a given file

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
filefrequencywordscountgiven

Problem

Write a function that takes two parameters:



  • a String representing a text document



  • an integer providing the number of items to return





Implement the function such that it returns a list of Strings ordered
by word frequency, the most frequently occurring word first.


Runtime not to exceed \$O(n)\$

I decided to use a map to count the frequency of the words and then do the count sort to achieve sorting in linear time. Is this the best approach or there is some other solution that I am not aware of?

```
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

interface IFrequencyCounter{
public String[] getMostFrequenctWords(String content, int numberOfwords );
}

public class FrequencyCounter implements IFrequencyCounter{

/**
* This is the Frequency Class
* which has only one property count.
* This is used to save autoboxing/unboxing
* overhead when using maps.
*/
private class Frequency{

private Frequency(){
this.count = 1;
}

private int count;

public int getFrequency(){
return this.count;
}

public void incrementFrequency(){
this.count++;
}

public void setCount(int count){
this.count = count;
}
}

/**
* Token class extends Frequency. This is
* used to maintain word frequency relationship.
* (Just like a pair.)
*/
private class Token extends Frequency{

private Token(String word, int count){
super();
this.word = word;
setCount(count);
}

private String word;
}

/**
* This method returns String array sorted by most frequent word. If null/empty string
* is provided as input then empty array is returned. If numberOfWords is greater then
* unique words then all the words are returned.
*
* @see com.evernote.util.IFrequencyCounter#getMostFrequenctWords(java.lang.String, int)
*/
@Override
public String[] getMostFrequenctWords(String content, int numberOfwords) {
// basic validations
if( null == content) content = "";
if(numberOfwords wordMap = new HashMap()

Solution

There are a number of things in here you have done well, and your algorithm is good. I like that you have created an object to contain the frequency. Using the HashMap is the right solution too.

Overall, the OOP, and algorithm is good, but.... the presentation is messy, and there's a couple of things you can do better.

The indentation and code style in general are very inconsistent. I imagine you are using an IDE to write your code (based on the boilerplate JavaDoc comments), so, there is no real excuse for the inconsistent indentation. Maybe you had a problem pasting it in to Code Review... (I see this is your first question). There's a trick in Code Review (and all Stack Exchange sites). When editing your question, you can select the code, and click the "code" button, or press Ctrl-k to indent things 4 spaces to get the formatting right.

Code Style

The indentation is a problem. Part of it is probably the pasting and manual fix up you may have done. But, even then, there are some lines indented one space more than I expect.

The bigger issue is the consistency and conformance-to-standards of where you declare your variables.

It is standard in Java for the class to have the class declaration, static fields, static methods, instance fields, the constructor, then public methods, finally private methods. There are some variations, but, the private variables should always be declared before the constructor....

While we are looking at this class, the variable should be final, and there is no need for the call to super() in the constructor.

The code:

private class Token extends Frequency{

private Token(String word, int count){
    super();
    this.word = word;
    setCount(count);
}

private String word;
}


should be:

private class Token extends Frequency{

    private String word;

    private Token(String word, int count){
        this.word = word;
        setCount(count);
    }

}


Frequency

The good thing about Frequency is that you are using it. You are correct when you say it is better than the autoboxing system (keeping an Integer in the HashMap). I have to question why you felt it was necessary to add the intermediate class Token though. There is no need for them both. Choose one, and merge the other in to it. For example, there is no reason why you can't add the String value to the Frequency class. As it stands, the logic is a little bit too abstracted.

Map efficiency

The code

for (String string : contentArray) {
    if(wordMap.containsKey(string)){
        wordMap.get(string).incrementFrequency();
    }else {
        Frequency token = new Frequency();
        wordMap.put(string,token);
    }
    } 
  }


is not doing efficient HashMap lookups. Consider rewriting it like (only one lookup to the map):

for (String string : contentArray) {
    Frequency token = wordMap.get(string);
    if(token == null) {
        wordMap.put(string,new Frequency());
    }else {
        token.incrementFrequency();
    }
}


Other....

  • The variable int maxSofar = 0; and the maxSoFar parameter is useless, just initialize maxSoFar inside the getSortedArray method.



  • Using a variable name string is a bad idea.

Code Snippets

private class Token extends Frequency{

private Token(String word, int count){
    super();
    this.word = word;
    setCount(count);
}

private String word;
}
private class Token extends Frequency{

    private String word;

    private Token(String word, int count){
        this.word = word;
        setCount(count);
    }

}
for (String string : contentArray) {
    if(wordMap.containsKey(string)){
        wordMap.get(string).incrementFrequency();
    }else {
        Frequency token = new Frequency();
        wordMap.put(string,token);
    }
    } 
  }
for (String string : contentArray) {
    Frequency token = wordMap.get(string);
    if(token == null) {
        wordMap.put(string,new Frequency());
    }else {
        token.incrementFrequency();
    }
}

Context

StackExchange Code Review Q#48198, answer score: 15

Revisions (0)

No revisions yet.