patternModerate
Word Frequency with Ordering in O(n) Complexity
Viewed 0 times
frequencywithorderingwordcomplexity
Problem
During an interview for a Java developer position, I was asked the following:
Write a function that takes two params:
Implement the function such that it returns a list of Strings ordered by word frequency, the most frequently occurring word first. Your solution should run in $O(n)$ time where $n$ is the number of characters in the document.
The following is what I answered (in pseudocode), it is not $O(n)$ but rather $O(n \log n)$ time because of the sort. I cannot figure out how to do it $O(n)$ time.
Does someone know or can someone give me some hints?
Write a function that takes two params:
- a String representing a text document and
- an integer providing the number of items to return.
Implement the function such that it returns a list of Strings ordered by word frequency, the most frequently occurring word first. Your solution should run in $O(n)$ time where $n$ is the number of characters in the document.
The following is what I answered (in pseudocode), it is not $O(n)$ but rather $O(n \log n)$ time because of the sort. I cannot figure out how to do it $O(n)$ time.
wordFrequencyMap = new HashMap();
words = inputString.split(' ');
for (String word : words) {
count = wordFrequencyMap.get(word);
count = (count == null) ? 1 : ++count;
wordFrequencyMap.put(word, count);
}
return wordFrequencyMap.sortByValue.keysDoes someone know or can someone give me some hints?
Solution
I suggest a variation of distribution counting:
You can probably replace the trie by other data structures in the first phase.
- Read the text and insert all the word encountered into a trie, maintaining in each node a count, how often the word represented by this node has occured. Additionally keep track of the highest word count say
maxWordCound. -- $O(n)$
- Initialize an array of size
maxWordCount. Entry type are lists of strings. -- $O(n)$, since the count can't be higher.
- Traverse the trie and for each node add the corresponding string to the array entry indicated by the count. -- $O(n)$, since the total length of strings is bounded by $n$.
- Traverse the array in descending order and output the desired number of strings. -- $O(n)$, since that is a bound on both the size of and the amount of data in the array.
You can probably replace the trie by other data structures in the first phase.
Context
StackExchange Computer Science Q#26427, answer score: 13
Revisions (0)
No revisions yet.