patterncsharpMinor
Extension method to retrieve a sample from a collection
Viewed 0 times
samplemethodcollectionextensionretrievefrom
Problem
I wrote a little extension method to retrieve a random sample of items from a collection. I've tried to write clean code. What do you think?
I'll be glad to hear your comments.
- is it clean?
- does it work as you would expect?
I'll be glad to hear your comments.
using System;
using System.Collections.Generic;
using System.Linq;
namespace Extensions
{
public static class IEnumerableExtensions
{
private static Random _rnd = new Random();
public static T Sample(this IEnumerable values)
{
return Sample(values, 1).First();
}
public static IEnumerable Sample(this IEnumerable values, int size)
{
if(!values.Any())
{
throw new InvalidOperationException("Collection empty.");
}
if (size ();
}
if (size >= values.Count())
{
return values;
}
return GetSamples(values, size);
}
private static IEnumerable GetSamples(IEnumerable values, int size)
{
var samples = new List();
var source = new List(values);
while (samples.Count != size)
{
var index = _rnd.Next(source.Count());
var value = source[index];
samples.Add(value);
source.Remove(value);
}
return samples;
}
}
}Solution
public static T Sample(this IEnumerable values)
{
return Sample(values, 1).First();
}Nice helper method.
if(!values.Any())
{
throw new InvalidOperationException("Collection empty.");
}I would have used an
ArgumentOutOfRangeException, but that's a matter of personal preference. Should it throw an exception if size == 0 or return an empty list? That's a tricky edge case which should be clearly documented.public static IEnumerable Sample(this IEnumerable values, int size)I would prefer the
Random to be an argument too, even if it's as Random rnd = null with a fallback inside the method. Being able to seed your random selection is important for testing the code and for reproducing results if it's a scientific project.if (size ();
}I would definitely prefer to throw an
ArgumentOutOfRangeException if size values.Count() then I think that should be ArgumentOutOfRangeException or InvalidOperationException. Again, whichever way you finally decide, it should be clearly documented.Note that we've now called two methods which iterate partially or fully through
values. If it was a non-deterministic iterator we have problems. An example of a non-deterministic iterator would bepublic IEnumerable Demo()
{
var rnd = new Random();
for (int i = rnd.Next(10); i > 0; i--) yield return rnd.Next();
}If you call this once and then loop through the resulting iterator multiple times then it will not consistently return the same values. You may think that in real usage you won't come across iterators like that, but sometimes the non-determinism is a lot more subtle.
var samples = new List();
var source = new List(values);Since we have to convert
source to a list here, there's an argument for doing it earlier and ensuring that we only iterate through values once, solving the problem of non-deterministic iterators.source.Remove(value);This is expensive. It's asymptotically more efficient to replace this with
source[index] = source[source.Count - 1];
source.RemoveAt(source.Count - 1);Finally, on the names. In your context it might be obvious that
Sample means "sample without replacement", but in some contexts it might be important to distinguish between sampling with and without replacement. Consider whether you might need to distinguish the cases in future usage of the library.Postscript: t3chb0t's suggestion of a
Randomise method which can then be combined with Take is compatible with a rewrite of your GetSamples method, and in fact this is what I have kicking around in my library of utility methods (although I call it Shuffle). As a modification of your code:public static IEnumerable Randomise(IEnumerable values, Random rnd = null)
{
if (rnd == null) rnd = _rnd;
var source = new List(values);
while (source.Count > 0)
{
var index = rnd.Next(source.Count);
yield return source[index];
source[index] = source[source.Count - 1];
source.RemoveAt(source.Count - 1);
}
}Code Snippets
public static T Sample<T>(this IEnumerable<T> values)
{
return Sample(values, 1).First();
}if(!values.Any())
{
throw new InvalidOperationException("Collection empty.");
}public static IEnumerable<T> Sample<T>(this IEnumerable<T> values, int size)if (size < 1)
{
return new List<T>();
}if (size >= values.Count())
{
return values;
}Context
StackExchange Code Review Q#162975, answer score: 2
Revisions (0)
No revisions yet.