HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

String de-duplication using a custom string pool

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
duplicationcustomusingpoolstring

Problem

First let me address why I'm not using the built-in string interning. My code is used in a utility that needs to free all of its memory once it's closed. Strings that are interned are not likely to be freed until the CLR is restarted. I can't create that situation since my utility is used on servers that primarily run .NET based services. (See this reference on CLR not releasing interned strings)

On to the question:

I have created a custom string pool in C#, implemented as an extension method in a static class that contains the actual "pool" in the form of a dictionary:

internal static Dictionary StringPool = new Dictionary();

internal static string Pool(this string value)
{
    if (!string.IsNullOrEmpty(value) && value.Length <= 200)
    {
        string poolRef;

        if (StringPool.TryGetValue(value, out poolRef))
            value = poolRef;  
    else
        StringPool.Add(value, value);
}

return value;
}


To use it, I simply append .Pool() to any string in my project that I want to be pooled. The assumption I am making is that this approach is storing two copies of strings. My assumption comes from the fact that I can recover the actual strings by iterating the Keys in the StringPool object and printing or viewing their contents.

The question I have is whether anyone knows of another indexed object type that I could use as a StringPool object such that the string data would only be stored once, as the Key, rather than twice, as both Key and Value. Additionally I need the fast Key lookup feature that a Dictionary has.

Here's the pseudo code that I believe conveys what I'm after in practice:

```
internal static CoolClass StringPool = new CoolClass();

internal static string Pool(this string s)
{
if (!string.IsNullOrEmpty(s) && s.Length <= 200)
{
string keyRef = null;

// obtain reference to key in CoolClass if it already exists
if (StringPool.TryGetKey(s, out keyRef))
s = keyRef;

Solution

So, you want a Collection which should hold your strings. The easiest and fastest would be to simply use an array. So let us do it.

Because you are using this for pooling strings, we we only need a string[] array. We should add a method bool TryGetValue(string key, out string value) and a string Add(string value).

If we use an array we need to initialize this array to a given capacity at the constructor level and add a possibility to let the array grow if needed.

I decided to use the name you want CoolClass but it should be better named to something more meaningful.

We need to check if a given "key" is in the array at two places, the TryGetValue() and the Add() method. So let us introduce another method int IndexOf(string key) which returns the index of the given keyand if it isn't in the array it returns -1, which should be extracted to a const.

This leads to

public class CoolClass
{
    private string[] array;
    private int capacity;
    private int lastElement = -1;
    private const int elementNotFound = -1;

    public CoolClass() : this(1024) { }
    public CoolClass(int capacity)
    {
        this.capacity = capacity;
        array = new string[capacity];
    }

    public bool TryGetValue(string key, out string value)
    {
        int i;
        if ((i = IndexOf(key)) != elementNotFound)
        {
            value = array[i];
            return true;
        }

        value = null;
        return false;
    }

    public void Add(String value)
    {
        if (value != null)
        {
            int i;
            if ((i = IndexOf(value)) != elementNotFound)
            {
                lastElement++;
                if (lastElement >= capacity)
                {
                    Grow();
                }
                array[lastElement] = value;
            }
        }
    }

    private int IndexOf(string key)
    {
        for (int i = 0; i < lastElement; i++)
        {
            if (array[i] == key)
            {
                return i;
            }
        }
        return elementNotFound;
    }

    private void Grow()
    {
        capacity = capacity * 2;
        Array.Resize(ref array, capacity);

    }
}


But maybe we should think about this. Why shouldn't we rename the TryGetValue() to GetOrAdd() and remove the Add() method ? Because we don't want to handle cases where the string isn't pooled in the client code.

public class CoolClass
{
    private string[] array;
    private int capacity;
    private int lastElement = -1;
    private const int elementNotFound = -1;

    public CoolClass() : this(1024) { }
    public CoolClass(int capacity)
    {
        this.capacity = capacity;
        array = new string[capacity];
    }

    public string GetOrAdd(string key)
    {
        if (key == null) { return null; }

        int i;
        if ((i = IndexOf(key)) != elementNotFound)
        {
            return array[i];
        }

        lastElement++;
        if (lastElement >= capacity)
        {
            Grow();
        }

        array[lastElement] = key;
        return array[lastElement];

    }

    private int IndexOf(string key)
    {
        for (int i = 0; i < lastElement; i++)
        {
            if (array[i] == key)
            {
                return i;
            }
        }
        return elementNotFound;
    }

    private void Grow()
    {
        capacity = capacity * 2;
        Array.Resize(ref array, capacity);

    }
}


You reallay should use braces {} for singleline if statements too, this won't harm but makes the code less error prone.

Also it isn't mentioned explicitly in the naming guidelines how to name internal static variables, you should use camelCase casing for naming them.

By using a guard clause like

if (string.IsNullOrEmpty(value) || value.Length > 200) { return value; }


you can return early and therefor you save horizontal spacing which improves the readability of the code.

Code Snippets

public class CoolClass
{
    private string[] array;
    private int capacity;
    private int lastElement = -1;
    private const int elementNotFound = -1;

    public CoolClass() : this(1024) { }
    public CoolClass(int capacity)
    {
        this.capacity = capacity;
        array = new string[capacity];
    }

    public bool TryGetValue(string key, out string value)
    {
        int i;
        if ((i = IndexOf(key)) != elementNotFound)
        {
            value = array[i];
            return true;
        }

        value = null;
        return false;
    }

    public void Add(String value)
    {
        if (value != null)
        {
            int i;
            if ((i = IndexOf(value)) != elementNotFound)
            {
                lastElement++;
                if (lastElement >= capacity)
                {
                    Grow();
                }
                array[lastElement] = value;
            }
        }
    }

    private int IndexOf(string key)
    {
        for (int i = 0; i < lastElement; i++)
        {
            if (array[i] == key)
            {
                return i;
            }
        }
        return elementNotFound;
    }

    private void Grow()
    {
        capacity = capacity * 2;
        Array.Resize(ref array, capacity);

    }
}
public class CoolClass
{
    private string[] array;
    private int capacity;
    private int lastElement = -1;
    private const int elementNotFound = -1;

    public CoolClass() : this(1024) { }
    public CoolClass(int capacity)
    {
        this.capacity = capacity;
        array = new string[capacity];
    }

    public string GetOrAdd(string key)
    {
        if (key == null) { return null; }

        int i;
        if ((i = IndexOf(key)) != elementNotFound)
        {
            return array[i];
        }

        lastElement++;
        if (lastElement >= capacity)
        {
            Grow();
        }

        array[lastElement] = key;
        return array[lastElement];

    }

    private int IndexOf(string key)
    {
        for (int i = 0; i < lastElement; i++)
        {
            if (array[i] == key)
            {
                return i;
            }
        }
        return elementNotFound;
    }

    private void Grow()
    {
        capacity = capacity * 2;
        Array.Resize(ref array, capacity);

    }
}
if (string.IsNullOrEmpty(value) || value.Length > 200) { return value; }

Context

StackExchange Code Review Q#83744, answer score: 2

Revisions (0)

No revisions yet.