patternjavascriptMinor
Count frequency of words in a string returning list of unique words sorted by frequency in descending order
Viewed 0 times
uniqueorderdescendingfrequencywordsreturningsortedcountliststring
Problem
I have the following solution but I am trying to optimize it to run in O(n) time, any thoughts?
I first split the string removing any non-alphanumeric characters, then split the words using spaces and then sort the new string. The rest is just looping to get the duplicate words and its count which I sort one last time to have the most frequently used word on top of the list.
This function also takes care of empty spaces, and limits the numofwords parameter to the total number of words, to avoid any unexpected results.
I first split the string removing any non-alphanumeric characters, then split the words using spaces and then sort the new string. The rest is just looping to get the duplicate words and its count which I sort one last time to have the most frequently used word on top of the list.
This function also takes care of empty spaces, and limits the numofwords parameter to the total number of words, to avoid any unexpected results.
function getFrequency(mystring, numofwords) {
mystring = mystring.replace(/[\.,-\/#!$%\^&\*;:{}=\-_`~()]/g,"");
var words = mystring.split(' '),
sortedWords = words.sort(),
uniqueWords = [],
d = {},
wordcount = 1,
result = [];
for (var i=0; i<sortedWords.length; i++) {
var currentword = sortedWords[i];
if(sortedWords[i+1] === currentword) {
wordcount++;
}
if(!d[currentword]) {
d[currentword] = true;
uniqueWords.push({word:currentword, count:wordcount});
}
}
uniqueWords = uniqueWords.slice(0,numofwords).sort(function (a,b) { return b.count-a.count;});
for(i=0; i<uniqueWords.length; i++) {
result.push(uniqueWords[i].word);
}
return result.toString();
}Solution
From a once over:
This would be my approach:
words = cleanString.split(' '),
frequencies = {},
word, frequency, i;
for( i=0; i
- It seems your code is giving the words with fewest occurrences, consider switching
b.countanda.count
- I am not vehemently against regexes, but if you use them, I would advise 2 lines of comments, one with the intent of the regex, and one with an example of from -> to
- Naming,
numofwords-> should benumOfWordsor since you only use it once you might as well go fornumberOfWordsand personally I called itcutOff
- Not sure why you named the string
myString, I would just call itstringors
- It seems that sorting the words in advance will require more code to process the words and it is not guaranteed to be faster, I would not do that
- From a data structure perspective, I would go for an object, where each word is a property of the object, and the value is the count, this way you can skip the build
- If you are going to
sortandslice, then perhaps you should FIRST sort and SECOND slice ;)
This would be my approach:
function getFrequency2(string, cutOff) {
var cleanString = string.replace(/[\.,-\/#!$%\^&\*;:{}=\-_~()]/g,""),words = cleanString.split(' '),
frequencies = {},
word, frequency, i;
for( i=0; i
Context
StackExchange Code Review Q#63947, answer score: 5
Revisions (0)
No revisions yet.