HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpModerate

Multiple indexes over an in memory collection for faster search

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
searchcollectionindexesfasterformemorymultipleover

Problem

I have a big in-memory collection of following simplified class:

public class Product 
{
    public int Id { get; set; }
    public string UserName { get; set; }
    public int CategoryId { get; set; }
    public string Title { get; set; }
    public string Description { get; set; }
}


I need to search for products based on different properties like UserName or CategoryId.
One way of searching would be using linq to objects like:

var userProducts = products.Where(x => x.UserName == "SomeValue")


This takes some processing when collection is too big and in my case it would be called hundreds of time each second.

What I came up with was to introduce a new collection that supports indexing over different properties:

```
public class FastCollection : IEnumerable
{
private IList _items;
private IList>> _lookups;
private Dictionary> _indexes;

public FastCollection(IList data)
{
_items = data;
_lookups = new List>>();
_indexes = new Dictionary>();
}

public void AddIndex(Expression> property)
{
_lookups.Add(property);
_indexes.Add(property.ToString(), _items.ToLookup(property.Compile()));
}

public void Add(T item)
{
_items.Add(item);
RebuildIndexes();
}

public void Remove(T item)
{
_items.Remove(item);
RebuildIndexes();
}

public void RebuildIndexes()
{
if (_lookups.Count > 0)
{
_indexes = new Dictionary>();
foreach (var lookup in _lookups)
{
_indexes.Add(lookup.ToString(), _items.ToLookup(lookup.Compile()));
}
}
}

public IEnumerable FindValue(Expression> property, TProperty value)
{
var key = property.ToString();
if(_indexes.ContainsKey(key))
{
return _indexes[key][value];
}
else
{
var c = property.Compile();
return _items.Where(x

Solution

Using ToString() to compare expressions for equality might work in simple cases, but:

  • It requires you to always use the same parameter name, for example, it would consider x => x.Id and product => product.Id to be different expressions.



  • Expressions with different meaning can produce the same string, for example (int i) => (float)i and (int i) => (double)i both produce i => Convert(i). Because of this, it might make sense to ensure that the used expressions contain only property accesses and nothing else.



Instead you should compare Expressions properly.

It seems wasteful to me to rebuild all indexes after each change. If you change the collection often, consider changing only the relevant part of each index.

Fields that are set in the constructor and then never modified should be readonly.

IList data


If you're on .Net 4.5, you could use IReadOnlyList here.

if (_lookups.Count > 0)


This check is pretty much useless. It saves you from unnecessarily creating an empty dictionary, but doing that is very cheap, so I think shorter code should take the priority here.

You could replace the whole RebuildIndexes() method with a single ToDictionary():

_indexes = _lookups.ToDictionary(
    lookup => lookup.ToString(), lookup => _items.ToLookup(lookup.Compile()));


c(x).Equals(value)


This won't work correctly when c(x) returns null. You should probably use object.Equals(c(x), value) instead.

Code Snippets

IList<T> data
if (_lookups.Count > 0)
_indexes = _lookups.ToDictionary(
    lookup => lookup.ToString(), lookup => _items.ToLookup(lookup.Compile()));
c(x).Equals(value)

Context

StackExchange Code Review Q#40811, answer score: 18

Revisions (0)

No revisions yet.