HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavascriptMinor

Count frequency of words in a string returning list of unique words sorted by frequency in descending order

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
uniqueorderdescendingfrequencywordsreturningsortedcountliststring

Problem

I have the following solution but I am trying to optimize it to run in O(n) time, any thoughts?

I first split the string removing any non-alphanumeric characters, then split the words using spaces and then sort the new string. The rest is just looping to get the duplicate words and its count which I sort one last time to have the most frequently used word on top of the list.

This function also takes care of empty spaces, and limits the numofwords parameter to the total number of words, to avoid any unexpected results.

function getFrequency(mystring, numofwords) {
    mystring = mystring.replace(/[\.,-\/#!$%\^&\*;:{}=\-_`~()]/g,"");
    var words = mystring.split(' '),
    sortedWords = words.sort(),
    uniqueWords = [],
    d = {},
    wordcount = 1,
    result = [];

    for (var i=0; i<sortedWords.length; i++) {

        var currentword = sortedWords[i];

        if(sortedWords[i+1] === currentword) {
            wordcount++;
        }

        if(!d[currentword]) {
            d[currentword] = true;
            uniqueWords.push({word:currentword, count:wordcount});
        }   
    }

uniqueWords = uniqueWords.slice(0,numofwords).sort(function (a,b) { return b.count-a.count;});  

for(i=0; i<uniqueWords.length; i++) {

    result.push(uniqueWords[i].word);
}

return result.toString();
}

Solution

From a once over:

  • It seems your code is giving the words with fewest occurrences, consider switching b.count and a.count



  • I am not vehemently against regexes, but if you use them, I would advise 2 lines of comments, one with the intent of the regex, and one with an example of from -> to



  • Naming, numofwords -> should be numOfWords or since you only use it once you might as well go for numberOfWords and personally I called it cutOff



  • Not sure why you named the string myString, I would just call it string or s



  • It seems that sorting the words in advance will require more code to process the words and it is not guaranteed to be faster, I would not do that



  • From a data structure perspective, I would go for an object, where each word is a property of the object, and the value is the count, this way you can skip the build



  • If you are going to sort and slice, then perhaps you should FIRST sort and SECOND slice ;)



This would be my approach:

function getFrequency2(string, cutOff) {
var cleanString = string.replace(/[\.,-\/#!$%\^&\*;:{}=\-_
~()]/g,""),
words = cleanString.split(' '),
frequencies = {},
word, frequency, i;

for( i=0; i

Context

StackExchange Code Review Q#63947, answer score: 5

Revisions (0)

No revisions yet.