Find most occurring word in a txt file

Submitted by: @import:stackexchange-codereview
Tags: file, occurring, word, find, txt, most

Problem

Assume that we have a .txt file that has one word per line.
Find out the word that occurs the most.

Here's what I was able to write (I used an array of strings instead of a file in this example):

string[] source = { "test1", "test2", "test3", "test4", "test1", "test1", "test3" };
Dictionary<string, int> dic = source.Distinct().ToDictionary(p => p, p => 0);
var keys = new List<string>(dic.Keys);
foreach (string key in keys)
{
    dic[key] = source.Count(f => f == key);
}
int max = dic.Values.Max();
foreach (KeyValuePair<string, int> kvp in dic)
{
    if (kvp.Value == max)
    {
        Console.WriteLine(kvp.Key + " " + max);
        break;
    }
}


Questions:

  • Can this be done in a better, more efficient way (speed/memory)?

  • What if the file size is 10 GB? How would you do it differently from my approach?

Solution

You are counting each key separately, which means you iterate through the entire list once for every distinct key. Instead, you can keep a running total for each key and iterate through the list only once:

string[] source = { "test1", "test2", "test3", "test4", "test1", "test1", "test3" };
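Dictionary<string, int> dic = new Dictionary<string, int>();

foreach (string s in source)
{
    if (dic.ContainsKey(s))
        dic[s]++;       // increment in place; dic[s] = dic[s]++ would store the old value back
    else
        dic.Add(s, 1);  // first occurrence of this word
}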


EDIT: I did not include getting the max value, since what you have works for that and it has already been rewritten by thantos.
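
For completeness, here is a minimal sketch of how the single-pass count could also track the maximum as it goes, so no second pass over the dictionary is needed. The class name WordCounter and the method MostFrequent are hypothetical names chosen for illustration, not part of the original answer. It accepts any IEnumerable<string>, so it should also work with a lazily read file, assuming the set of distinct words fits in memory.

```csharp
using System.Collections.Generic;

public static class WordCounter
{
    // Hypothetical helper: returns the most frequent word and its count.
    // On a tie, the word that reached the highest count first wins.
    // For an empty input the result is (null, 0).
    public static KeyValuePair<string, int> MostFrequent(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        string best = null;
        int bestCount = 0;

        foreach (string word in words)
        {
            int count;
            counts.TryGetValue(word, out count); // count is 0 when the word is new
            count++;
            counts[word] = count;

            // Update the running maximum while counting.
            if (count > bestCount)
            {
                best = word;
                bestCount = count;
            }
        }

        return new KeyValuePair<string, int>(best, bestCount);
    }
}
```

With the sample array from the question, WordCounter.MostFrequent(source) yields the key "test1" with the value 3. For the 10 GB case, one option is to pass File.ReadLines("words.txt") (the path is a placeholder) instead of an array: File.ReadLines enumerates the file line by line rather than loading it all at once, so memory use grows with the number of distinct words, not with the file size.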

Code Snippets

string[] source = { "test1", "test2", "test3", "test4", "test1", "test1", "test3" };
Dictionary<string, int> dic = new Dictionary<string, int>();

foreach (string s in source)
{
    if (dic.ContainsKey(s))
        dic[s]++;       // increment in place; dic[s] = dic[s]++ would store the old value back
    else
        dic.Add(s, 1);  // first occurrence of this word
}

Context

StackExchange Code Review Q#11254, answer score: 6
