HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Extension method to retrieve a sample from a collection

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
samplemethodcollectionextensionretrievefrom

Problem

I wrote a little extension method to retrieve a random sample of items from a collection. I've tried to write clean code. What do you think?

  • is it clean?



  • does it work as you would expect?



I'll be glad to hear your comments.

using System;
using System.Collections.Generic;
using System.Linq;

namespace Extensions
{
    public static class IEnumerableExtensions
    {
        private static Random _rnd = new Random();

        public static T Sample(this IEnumerable values)
        {
            return Sample(values, 1).First();
        }

        public static IEnumerable Sample(this IEnumerable values, int size)
        {
            if(!values.Any())
            {
                throw new InvalidOperationException("Collection empty.");
            }

            if (size ();
            }

            if (size >= values.Count())
            {
                return values;
            }

            return GetSamples(values, size);
        }
        private static IEnumerable GetSamples(IEnumerable values, int size)
        {
            var samples = new List();
            var source = new List(values);
            while (samples.Count != size)
            {
                var index = _rnd.Next(source.Count());
                var value = source[index];
                samples.Add(value);
                source.Remove(value);
            }
            return samples;
        }
    }
}

Solution

public static T Sample(this IEnumerable values)
        {
            return Sample(values, 1).First();
        }


Nice helper method.

if(!values.Any())
            {
                throw new InvalidOperationException("Collection empty.");
            }


I would have used an ArgumentOutOfRangeException, but that's a matter of personal preference. Should it throw an exception if size == 0 or return an empty list? That's a tricky edge case which should be clearly documented.

public static IEnumerable Sample(this IEnumerable values, int size)


I would prefer the Random to be an argument too, even if it's as Random rnd = null with a fallback inside the method. Being able to seed your random selection is important for testing the code and for reproducing results if it's a scientific project.

if (size ();
            }


I would definitely prefer to throw an ArgumentOutOfRangeException if size values.Count() then I think that should be ArgumentOutOfRangeException or InvalidOperationException. Again, whichever way you finally decide, it should be clearly documented.

Note that we've now called two methods which iterate partially or fully through values. If it was a non-deterministic iterator we have problems. An example of a non-deterministic iterator would be

public IEnumerable Demo()
{
    var rnd = new Random();
    for (int i = rnd.Next(10); i > 0; i--) yield return rnd.Next();
}


If you call this once and then loop through the resulting iterator multiple times then it will not consistently return the same values. You may think that in real usage you won't come across iterators like that, but sometimes the non-determinism is a lot more subtle.

var samples = new List();
            var source = new List(values);


Since we have to convert source to a list here, there's an argument for doing it earlier and ensuring that we only iterate through values once, solving the problem of non-deterministic iterators.

source.Remove(value);


This is expensive. It's asymptotically more efficient to replace this with

source[index] = source[source.Count - 1];
                source.RemoveAt(source.Count - 1);


Finally, on the names. In your context it might be obvious that Sample means "sample without replacement", but in some contexts it might be important to distinguish between sampling with and without replacement. Consider whether you might need to distinguish the cases in future usage of the library.

Postscript: t3chb0t's suggestion of a Randomise method which can then be combined with Take is compatible with a rewrite of your GetSamples method, and in fact this is what I have kicking around in my library of utility methods (although I call it Shuffle). As a modification of your code:

public static IEnumerable Randomise(IEnumerable values, Random rnd = null)
    {
        if (rnd == null) rnd = _rnd;

        var source = new List(values);
        while (source.Count > 0)
        {
            var index = rnd.Next(source.Count);
            yield return source[index];
            source[index] = source[source.Count - 1];
            source.RemoveAt(source.Count - 1);
        }
    }

Code Snippets

public static T Sample<T>(this IEnumerable<T> values)
        {
            return Sample(values, 1).First();
        }
if(!values.Any())
            {
                throw new InvalidOperationException("Collection empty.");
            }
public static IEnumerable<T> Sample<T>(this IEnumerable<T> values, int size)
if (size < 1)
            {
                return new List<T>();
            }
if (size >= values.Count())
            {
                return values;
            }

Context

StackExchange Code Review Q#162975, answer score: 2

Revisions (0)

No revisions yet.