HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Counting occurrences of each word in a sentence

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
countingeachoccurrenceswordsentence

Problem

I have been doing this test task for C# Developer role at UBS.London recently and have been rejected. No reason stated yet. Was wondering if you can take a look and provide feedback on what I can do to improve solution. The task description is below and solution itself is on GitHub. Also attached code for two main classes I have implemented. Unit tests are in solution. No UI provided as I thought it is not required as Unit tests are self explanatory.

Hope it will be helpful for someone else doing interviews for C# Dev roles.

Exercise description


Create a .net application that will solve the following problem. There
are no time constraints and you are free to use any resources at your
disposal.


The Problem As an author I want to know the number of times each word
appears in a sentence So that I can make sure I'm not repeating myself


Acceptance Criteria Given a sentence When the program is run Then I'm
returned a distinct list of words in the sentence and the number of
times they have occurred


Example Input: "This is a statement, and so is this." Output: this - 2
is – 2 a – 1 statement – 1 and – 1 so - 1

Solution is on GitHub

I am also using Netfx-Guard library from Nuget

Solutions contains two projects: main project with SentenceParser and StringSplitter classes and unit testing projects.

SentenceParser and StringSplitter classes are below.

```
using System.Collections.Generic;
using System.Linq;
using UBS.TextParsing.Interfaces;

namespace UBS.TextParsing
{
public class SentenceParser
{
private IStringSplitter _splitter;

public SentenceParser(IStringSplitter splitter)
{
_splitter = splitter;
}

public IDictionary Parse(string sentence)
{
Guard.NotNullOrEmpty(() => sentence, sentence);

var words = _splitter.SplitIntoWords(sentence);

var wordsCounts = from word in words
group word by word.ToLower()

Solution

Guard.NotNullOrEmpty(() => sentence, sentence);


This introduces a dependency on a third-party library, which is 100 lines. Why not just write

if (string.IsNullOrEmpty(sentence))
{
    throw new ...
}


The ISplitter interface is overkill. As is the StringSplitter class, probably -- as @NickUdell mentioned, a regular expression looks like the right tool for the job here.

You have the core of the solution right here:

var words = _splitter.SplitIntoWords(sentence);

       var wordsCounts = from word in words
           group word by word.ToLower()
           into g
           select new {Word = g.Key, Count = g.Count()};

       return wordsCounts.ToDictionary(wc => wc.Word, wc => wc.Count);


Which can be simplified to

var words = _splitter.SplitIntoWords(sentence.ToLower());
return words.GroupBy(word => word)
            .ToDictionary(group => group.Key, group => group.Count());


Besides being a little shorter, we are creating fewer objects. Let's say there are \$n\$ words returned by SplitIntoWords, \$d\$ of which are distinct (up to case). The first version creates \$2n\$ string objects (the \$n\$ returned by SplitIntoWords and these same \$n\$ words lower-cased) and \$d\$ instances of the anonymous type. The second version creates \$n + 1\$ string objects. This might not be a concern, but since it's an easy win, we should just take it.

SplitIntoWords could be written using yield. Again, not a big difference but it makes the code a bit more re-usable, since client code can use Take, First, etc without having to have generate every word in the string.

public IEnumerable SplitIntoWords(string sentence)
{
    var word = new StringBuilder();
    foreach (var chr in sentence)
    {
        if (Char.IsPunctuation(chr) || Char.IsSeparator(chr) || Char.IsWhiteSpace(chr))
        {
            if (word.Length > 0)
            {
                yield return word.ToString();
                word.Clear();
            }
        }
        else
        {
            word.Append(chr);
        }
    }

    if (word.Length > 0)
    {
        yield return word.ToString();
    }
}

Code Snippets

Guard.NotNullOrEmpty(() => sentence, sentence);
if (string.IsNullOrEmpty(sentence))
{
    throw new ...
}
var words = _splitter.SplitIntoWords(sentence);

       var wordsCounts = from word in words
           group word by word.ToLower()
           into g
           select new {Word = g.Key, Count = g.Count()};

       return wordsCounts.ToDictionary(wc => wc.Word, wc => wc.Count);
var words = _splitter.SplitIntoWords(sentence.ToLower());
return words.GroupBy(word => word)
            .ToDictionary(group => group.Key, group => group.Count());
public IEnumerable<string> SplitIntoWords(string sentence)
{
    var word = new StringBuilder();
    foreach (var chr in sentence)
    {
        if (Char.IsPunctuation(chr) || Char.IsSeparator(chr) || Char.IsWhiteSpace(chr))
        {
            if (word.Length > 0)
            {
                yield return word.ToString();
                word.Clear();
            }
        }
        else
        {
            word.Append(chr);
        }
    }

    if (word.Length > 0)
    {
        yield return word.ToString();
    }
}

Context

StackExchange Code Review Q#68859, answer score: 9

Revisions (0)

No revisions yet.