patterncppMinor
Max repeated word in a string
Viewed 0 times
wordmaxstringrepeated
Problem
I was trying to do a very common interview problem "Finding the max repeated word in a string " and could not find much resources in net for c/c++ implementation. So I coded it myself here. I have tried to do most of the coding from scratch for better understanding. Could you review my code and provide comments on my algorithm. Some people have suggested using hashtables for storing the count, but am not using hashtables here.
#include
#include
#include
#include
#include
using namespace std;
string word[10];
//splitting string into words
int parsestr(string str)
{
int index = 0;
int i = 0;
int maxlength = str.length();
int wordcnt = 0;
while(i max)
{
max = wordcntArr[i];
index = i;
}
}
return word[index];
}
void countwords(int count)
{
int wordcnt = 0;
int wordcntArr[10];
string maxrepeatedword;
for(int i = 0; i <= count; i++)
{
for(int j = 0; j <= count; j++)
{
if(word[i] == word[j])
{
wordcnt++;
//word[j] = "";
}
else
{}
}
cout << " word " << word[i] << " occurs " << wordcnt << " times " << endl;
wordcntArr[i] = wordcnt;
wordcnt = 0;
}
maxrepeatedword = maxrepeatedWord(wordcntArr,count);
cout << " Max Repeated Word is " << maxrepeatedword;
}
int main()
{
string str = "I am am am good good";
int wordcount = 0;
wordcount = parsestr(str);
countwords(wordcount);
}Solution
Here's a long list of feedback. Sorry for the blunt tone, but I'm trying to be brief:
-
that variable (the count). That's inconsistent. Either have it return both
the count and the array (or make things easier on yourself and use a
-
You don't handle punctuation characters. Is that a requirement? Should
"a(b)" be one word? Should "a. a" be two unique words?
-
Do you need to split on characters other than space? What about
-
Your
-
-
You're passing
to a global variable that it accesses directly. Why not just make the count
global too?
-
The
-
If you declare
reinitialize it each iteration.
-
constant or, better, a dynamically-sized container.
-
You're redundantly recounting each word every time it occurs. With
If you had a collection of unique words instead, your \$O(n^2)\$ complexity
would go down to \$O(mn)\$ where \$m\$ is the number of unique words.
A hash table would be a good way to create the set of unique words.
-
Your test string assumes repeated words will be contiguous (as opposed to,
say
At a high level, your algorithm is also not optimal. You've got \$O(n^2)\$ performance, ignoring the dynamic allocation as you incrementally build up those strings. With that, it gets worse.
I believe the canonical solution to this is (roughly):
That gives you \$O(n)\$: you only walk the entire string once.
-
parsestr mutates a global variable but returns a value that's relevant to that variable (the count). That's inconsistent. Either have it return both
the count and the array (or make things easier on yourself and use a
vector that knows its size), or make the count global too.-
You don't handle punctuation characters. Is that a requirement? Should
"a(b)" be one word? Should "a. a" be two unique words?
-
Do you need to split on characters other than space? What about
\t or \n? Other Unicode whitespace characters?-
Your
while(i
-
Building up the word by incrementally contatenating characters onto a
string is slow. It will periodically require dynamic allocation. A more
efficient solution would be to remember the start index of the word. When
you reach the end, create a string from the substring (start, end) in one
step.
Going further from there, there's no reason to even store those words
separately since that start, end pair is enough to reconstruct it as needed.
-
Your program will mysteriously crash if you pass a sentence with more than
ten words. :( Since you're using dynamic allocation everywhere else (string
does that under the hood) there's no reason to use a fixed array for words.
At the very least, you should have a constant for that size and check that
you don't overflow the array.
-
index isn't a helpful variable name. Call it wordIndex.
-
Likewise, wordcnt would be better as wordCount.
-
maxrepeatedWord is a mixture of camelCase and all lowercase. Be consistent
(and camelCase is generally better since it makes it easier for the reader
to split it into words).
-
wordcntArr would be better as wordCounts. The fact that it's an array is
self-evident.
-
i -
index -> indexOfMax.-
You're passing
count into countwords even though that count is relevant to a global variable that it accesses directly. Why not just make the count
global too?
-
The
else {} accomplishes nothing. Remove it.-
If you declare
wordcnt inside the for(int i... loop, you won't have toreinitialize it each iteration.
-
int wordcntArr[10] again duplicates the magical literal 10. Use a constant or, better, a dynamically-sized container.
-
You're redundantly recounting each word every time it occurs. With
"I am am am good good", you'll get a count array like 1 3 3 3 2 2. If you had a collection of unique words instead, your \$O(n^2)\$ complexity
would go down to \$O(mn)\$ where \$m\$ is the number of unique words.
A hash table would be a good way to create the set of unique words.
-
Your test string assumes repeated words will be contiguous (as opposed to,
say
"I am good am good am."). Is that intentional? Desirable?At a high level, your algorithm is also not optimal. You've got \$O(n^2)\$ performance, ignoring the dynamic allocation as you incrementally build up those strings. With that, it gets worse.
I believe the canonical solution to this is (roughly):
create a map of words -> counts
set wordStart to 0
iterate through the string
if the current character is a word delimiter
set word to the substring from wordStart to here
set wordStart to here
if map contains word
increment the count for that word
else
add the word to the map with a count of 1
end
end
end
set maxWord to null
set max to 0
iterate through the map
if count for this word > max
set maxWord to this word
set max to count
end
end
return maxWordThat gives you \$O(n)\$: you only walk the entire string once.
Code Snippets
create a map of words -> counts
set wordStart to 0
iterate through the string
if the current character is a word delimiter
set word to the substring from wordStart to here
set wordStart to here
if map contains word
increment the count for that word
else
add the word to the map with a count of 1
end
end
end
set maxWord to null
set max to 0
iterate through the map
if count for this word > max
set maxWord to this word
set max to count
end
end
return maxWordContext
StackExchange Code Review Q#1394, answer score: 7
Revisions (0)
No revisions yet.