patterncsharpMinor
Counting occurrences of each word in a sentence
Viewed 0 times
countingeachoccurrenceswordsentence
Problem
I have been doing this test task for C# Developer role at UBS.London recently and have been rejected. No reason stated yet. Was wondering if you can take a look and provide feedback on what I can do to improve solution. The task description is below and solution itself is on GitHub. Also attached code for two main classes I have implemented. Unit tests are in solution. No UI provided as I thought it is not required as Unit tests are self explanatory.
Hope it will be helpful for someone else doing interviews for C# Dev roles.
Exercise description
Create a .net application that will solve the following problem. There
are no time constraints and you are free to use any resources at your
disposal.
The Problem As an author I want to know the number of times each word
appears in a sentence So that I can make sure I'm not repeating myself
Acceptance Criteria Given a sentence When the program is run Then I'm
returned a distinct list of words in the sentence and the number of
times they have occurred
Example Input: "This is a statement, and so is this." Output: this - 2
is – 2 a – 1 statement – 1 and – 1 so - 1
Solution is on GitHub
I am also using Netfx-Guard library from Nuget
Solutions contains two projects: main project with
```
using System.Collections.Generic;
using System.Linq;
using UBS.TextParsing.Interfaces;
namespace UBS.TextParsing
{
public class SentenceParser
{
private IStringSplitter _splitter;
public SentenceParser(IStringSplitter splitter)
{
_splitter = splitter;
}
public IDictionary Parse(string sentence)
{
Guard.NotNullOrEmpty(() => sentence, sentence);
var words = _splitter.SplitIntoWords(sentence);
var wordsCounts = from word in words
group word by word.ToLower()
Hope it will be helpful for someone else doing interviews for C# Dev roles.
Exercise description
Create a .net application that will solve the following problem. There
are no time constraints and you are free to use any resources at your
disposal.
The Problem As an author I want to know the number of times each word
appears in a sentence So that I can make sure I'm not repeating myself
Acceptance Criteria Given a sentence When the program is run Then I'm
returned a distinct list of words in the sentence and the number of
times they have occurred
Example Input: "This is a statement, and so is this." Output: this - 2
is – 2 a – 1 statement – 1 and – 1 so - 1
Solution is on GitHub
I am also using Netfx-Guard library from Nuget
Solutions contains two projects: main project with
SentenceParser and StringSplitter classes and unit testing projects.SentenceParser and StringSplitter classes are below.```
using System.Collections.Generic;
using System.Linq;
using UBS.TextParsing.Interfaces;
namespace UBS.TextParsing
{
public class SentenceParser
{
private IStringSplitter _splitter;
public SentenceParser(IStringSplitter splitter)
{
_splitter = splitter;
}
public IDictionary Parse(string sentence)
{
Guard.NotNullOrEmpty(() => sentence, sentence);
var words = _splitter.SplitIntoWords(sentence);
var wordsCounts = from word in words
group word by word.ToLower()
Solution
Guard.NotNullOrEmpty(() => sentence, sentence);This introduces a dependency on a third-party library, which is 100 lines. Why not just write
if (string.IsNullOrEmpty(sentence))
{
throw new ...
}The
ISplitter interface is overkill. As is the StringSplitter class, probably -- as @NickUdell mentioned, a regular expression looks like the right tool for the job here.You have the core of the solution right here:
var words = _splitter.SplitIntoWords(sentence);
var wordsCounts = from word in words
group word by word.ToLower()
into g
select new {Word = g.Key, Count = g.Count()};
return wordsCounts.ToDictionary(wc => wc.Word, wc => wc.Count);Which can be simplified to
var words = _splitter.SplitIntoWords(sentence.ToLower());
return words.GroupBy(word => word)
.ToDictionary(group => group.Key, group => group.Count());Besides being a little shorter, we are creating fewer objects. Let's say there are \$n\$ words returned by
SplitIntoWords, \$d\$ of which are distinct (up to case). The first version creates \$2n\$ string objects (the \$n\$ returned by SplitIntoWords and these same \$n\$ words lower-cased) and \$d\$ instances of the anonymous type. The second version creates \$n + 1\$ string objects. This might not be a concern, but since it's an easy win, we should just take it.SplitIntoWords could be written using yield. Again, not a big difference but it makes the code a bit more re-usable, since client code can use Take, First, etc without having to have generate every word in the string.public IEnumerable SplitIntoWords(string sentence)
{
var word = new StringBuilder();
foreach (var chr in sentence)
{
if (Char.IsPunctuation(chr) || Char.IsSeparator(chr) || Char.IsWhiteSpace(chr))
{
if (word.Length > 0)
{
yield return word.ToString();
word.Clear();
}
}
else
{
word.Append(chr);
}
}
if (word.Length > 0)
{
yield return word.ToString();
}
}Code Snippets
Guard.NotNullOrEmpty(() => sentence, sentence);if (string.IsNullOrEmpty(sentence))
{
throw new ...
}var words = _splitter.SplitIntoWords(sentence);
var wordsCounts = from word in words
group word by word.ToLower()
into g
select new {Word = g.Key, Count = g.Count()};
return wordsCounts.ToDictionary(wc => wc.Word, wc => wc.Count);var words = _splitter.SplitIntoWords(sentence.ToLower());
return words.GroupBy(word => word)
.ToDictionary(group => group.Key, group => group.Count());public IEnumerable<string> SplitIntoWords(string sentence)
{
var word = new StringBuilder();
foreach (var chr in sentence)
{
if (Char.IsPunctuation(chr) || Char.IsSeparator(chr) || Char.IsWhiteSpace(chr))
{
if (word.Length > 0)
{
yield return word.ToString();
word.Clear();
}
}
else
{
word.Append(chr);
}
}
if (word.Length > 0)
{
yield return word.ToString();
}
}Context
StackExchange Code Review Q#68859, answer score: 9
Revisions (0)
No revisions yet.