HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Split string to space-separated words based on dictionary

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
spacewordsseparatedsplitbaseddictionarystring

Problem

This program is based on as old retired Google interview question where you're supposed to create a program that will split a string into space-separated words, if the string is composed of valid words.

I've implemented two approaches, one that uses a normal array as the dictionary, making the implementation pretty simple, and one that uses a trie to store the dictionary, which will save memory but make the algorithm a bit more complicated. That being said, both implementations follow pretty much the same approach. Also, I'm not interested in a comparison of the two implementations i've written, I just wanted to try my hands on two different approaches.

Let's for fun's sake imagine that performance matters, so I'm looking for feedback on both performance, readability and whatever else you think should be different. Thanks!

```
using System;
using System.Collections.Generic;
using System.Linq;

class MainClass {
// The dictionary to check the words against. Normally this would be loaded from a file.
// If the dictionary is too big, loading it directly from the file into a trie would be better.
private static string[] dictArray = new string[] {
"all",
"base",
"ball",
"baseball",
"game",
"ballgame" // Not a real word, used for testing.
};

public static void Main () {
// Print the result of a couple of tests. If the test returns null, print "null" instead.
Console.WriteLine("Array dictionary:");
Console.WriteLine(SplitWordWithArray("baseballbase", dictArray) ?? "null"); // "baseball base"
Console.WriteLine(SplitWordWithArray("gamebaseball", dictArray) ?? "null"); // "game baseball"
Console.WriteLine(SplitWordWithArray("basketballgame", dictArray) ?? "null"); // null
Console.WriteLine(SplitWordWithArray("baseballgame", dictArray) ?? "null"); // This is a corner case. Will currently split into "baseball game", but since "base ballgame" could also be val

Solution

A couple things I would consider:

-
In the case of an algorithm like this, you usually want to return a list of all candidates instead of just one of them. (You return the last candidate found which should be all the compound words.)

I would consider (at the cost of significantly complicating the algorithm) returning a List from each, as this will allow the user to pick which they like the best. (Basic auto-correct guidelines.)

-
Next, in the case of your Trie work, it looks like you could replace the string key with a char key, which should have a performance advantage. (The string type is always a reference type, char is a value type, this can help coerce more performance out of the char type in a case like this.)

-
Without rigorous testing I can't guarantee the accuracy of this next bit, but a concept to test would be to make output a StringBuilder in each case, give it a length of word.Length, and then build strings like that instead of creating new strings constantly in your loops.

-
Lastly, you could consider a case when you are provided with an invalid word in one portion of it (like your basketballgame test) and if the first part is not valid, but the second part is valid then return your output as well.

Otherwise this is good code, and it definitely solves the problem as efficiently as I can think of.

Context

StackExchange Code Review Q#146250, answer score: 3

Revisions (0)

No revisions yet.