HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpModerate

Memory-efficient string collection

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
collectionefficientstringmemory

Problem

My goal is to create a memory-efficient (immutable) collection of strings. The imagined use-case is checking for valid words in a Scrabble-like game.

Wikimedia Commons has a pretty good summary of the data structure (author: Chkno):

(Image license: Creative Commons Attribution-Share Alike 3.0 Unported.)

For an input of { "tap", "taps", "top", "tops" }, StringCollection should construct the figure on the right. The only difference is that instead of an EOW link, states are marked as final.

The algorithm for constructing the data structure is from the paper Incremental construction of minimal acyclic finite-state automata.

```
namespace Mjolka.Collections
{
using System;
using System.Collections.Generic;
using System.Text;

///
/// Represents a memory-efficient collection of strings.
///
[Serializable]
public class StringCollection : IEnumerable
{
///
/// The initial state of the finite-state automaton.
///
private readonly State initialState;

///
/// The number of elements in the collection.
///
private readonly int count;

///
/// Initializes a new instance of the class that contains
/// the specified strings, where the strings are in lexicographic order.
///
///
/// The collection of strings that are in the new , in
/// lexicographic order.
///
///
/// is null.
///
///
/// An item in is null.
///
public StringCollection(IEnumerable strings)
{
if (strings == null)
{
throw new ArgumentNullException("strings");
}

var register = new Dictionary(new StateComparer());
using (var enumerator = strings.GetEnumerator())
{
if (!enumerator.MoveNext())
{
return;
}

Solution

StringCollection

Naming

StringCollection is, as I pointed out in the comments, a misleading name that suggests your type is implementing ICollection. A better name is pretty hard to find though, since EnumerableString just looks weird, as String already implements IEnumerable.

The only "better" name I can think of, is Strings... it's somewhat lame, granted.. but it's not misleading.

The Contains member method, because of how the C# compiler resolves method calls, will always be called over Linq's Contains(IEnumerable, TSource) extension method; this may or may not be desired, and/or expected. Since you already have XML comments, you could add a ` tag, and document that.

Duplication

You
return new Enumerator(this) a lot. Why not have a private IEnumerator GetEnumeratorInternal() method that all GetEnumerator() methods could call?

Constructor

I wouldn't bother with the early return if the
MoveNext returns false, since the while loop won't iterate in that case, and you'll no longer need to check whether Current is null outside the loop, and you can start incrementing the count within the loop instead of starting at 1.

Enumerator

this

This is purely aesthetic, but I find the constructor... bulky:

public Enumerator(StringCollection stringCollection)
{
    this.initialState = stringCollection.initialState;
    this.stringBuilder = new StringBuilder();
    this.states = new Stack();
    this.states.Push(this.initialState);
    this.edges = new Stack();
}


...compared to how I would have written it:

public Enumerator(StringCollection stringCollection)
{
    _initialState = stringCollection.initialState;

    _stringBuilder = new StringBuilder();
    _edges = new Stack();
    _states = new Stack();

    _states.Push(this.initialState);
}


By prefixing private fields with an underscore, you remove the need to qualify them in assignations like this.

Comments

These comments are redundant and useless. Remove them. Don't write XML comments just for the sake of writing XML comments; private members don't/shouldn't need XML comments:

/// 
/// The initial state.
/// 
private readonly State initialState;

/// 
/// The string builder.
/// 
private readonly StringBuilder stringBuilder;


Compare to:

private readonly State _initialState;
private readonly StringBuilder _stringBuilder;


XML comments that add no value, are just noise and clutter. Burn.

You're declaring a very short-lived variable in
MoveNext, that can afford to disappear:

var label = edge.Label;
this.stringBuilder.Append(label);


this.stringBuilder.Append(edge.Label);` is clear enough IMO.

Code Snippets

public Enumerator(StringCollection stringCollection)
{
    this.initialState = stringCollection.initialState;
    this.stringBuilder = new StringBuilder();
    this.states = new Stack<State>();
    this.states.Push(this.initialState);
    this.edges = new Stack<PathSegment>();
}
public Enumerator(StringCollection stringCollection)
{
    _initialState = stringCollection.initialState;

    _stringBuilder = new StringBuilder();
    _edges = new Stack<PathSegment>();
    _states = new Stack<State>();

    _states.Push(this.initialState);
}
/// <summary>
/// The initial state.
/// </summary>
private readonly State initialState;

/// <summary>
/// The string builder.
/// </summary>
private readonly StringBuilder stringBuilder;
private readonly State _initialState;
private readonly StringBuilder _stringBuilder;
var label = edge.Label;
this.stringBuilder.Append(label);

Context

StackExchange Code Review Q#56514, answer score: 13

Revisions (0)

No revisions yet.