patternMinor
Splitting a sequence into equal segments
Viewed 0 times
equalintosplittingsequencesegments
Problem
I need to split up a sequence into equal segments of a given size (yes, the last one may be shorter), and I'm trying to find an efficient and idiomatic way to do it.
I have two versions of the function;
As I said, I think
As for naming, is
I have two versions of the function;
segment is the F# version of an awkward and probably quite inefficient old C# extension method that I wrote, while chop is an attempt to do it in a more intelligent manner, but is still rather imperative. I would have liked to have a solution that doesn't need to use reference cells, but could not come up with one.module Seq =
let segment segmentSize source =
seq {
let segmentCount = ref 0
while !segmentCount * segmentSize Seq.skip (!segmentCount * segmentSize) |> Seq.truncate segmentSize
segmentCount := !segmentCount + 1
yield segment
}
let chop segmentSize source =
seq {
let chopSource = ref source
while not Seq.truncate segmentSize
chopSource := !chopSource |> Seq.skip (Seq.length segment)
yield segment
}As I said, I think
segment, while it gets the job done, is far from good F#; how efficient and idiomatic is chop? What could have been done better? As for naming, is
chop or segment the better name for a function like this, or would something else be more fitting? Also, the argument name segmentSize seems a bit off for F#; would just size be better, or something else?Solution
Functional-first style tends to prefer immutability over mutability, so the fact that you're using
Recursive Solution
But how do you keep track of the state of your program without mutation? The traditional functional solution is to use recursion so that each step of the algorithm is isolated in its own call context. Here is a fairly direct translation of your second example into a recursive function:
Don't be intimidated by my
Note how I'm using
One possible problem with the above solution is that it is not (as far as I know) tail-recursive. I'm not 100% sure how
Another option is to ditch the sequence expression altogether (note that this is still not tail-recursive):
Seq.unfold Solution
Although it's nice in this particular case, I'm not a huge fan of recursion because it can sometimes be hard to read and easy to get wrong. I normally use sequence functions as an alternative, as they handle those types of details behind the scenes, and I can worry about the problem at hand. Fortunately, the
The function is called
The second argument,
Enough commentary, here is the implementation:
So as you can see, we're basically allowing
As for your naming questions, I think it's somewhat arbitrary. I probably would have called the function
Addendum
All these solutions are assuming that it is fine to re-enumerate the sequence, i.e. by truncating once and then skipping. That might be an OK assumption for small, in-memory data-sets, but an
But perhaps a sequence isn't the best choice of data types for input into this function. If the data is alread
ref can sometimes signify that the code could be more idiomatic.Recursive Solution
But how do you keep track of the state of your program without mutation? The traditional functional solution is to use recursion so that each step of the algorithm is isolated in its own call context. Here is a fairly direct translation of your second example into a recursive function:
let rec chop segmentSize source =
seq {
if Seq.isEmpty source then () else
let segment = source |> Seq.truncate segmentSize
let rest = source |> Seq.skip (Seq.length segment)
yield segment
yield! chop segmentSize rest
}Don't be intimidated by my
if ... then () else pattern - it's a trick I adopted to avoid needing to indent the rest of function when a precondition is used (I've seen it elsewhere, and it is nice since there's no early return from functions in F#).Note how I'm using
yield! to flatten the sequence from the recursive call into the outer sequence. Also notice that there is no chopSource in this version, because what was chopSource in your example is simply source in the recursive call.One possible problem with the above solution is that it is not (as far as I know) tail-recursive. I'm not 100% sure how
seq works behind the scenes, but I'm assuming that it is not in this case. This may not matter for your problem, but it's still something to consider. It could always be rewritten to use an accumulator argument so that it is tail-recursive and won't overflow the stack.Another option is to ditch the sequence expression altogether (note that this is still not tail-recursive):
let rec chop segmentSize source =
if Seq.isEmpty source then Seq.empty else
let segment = source |> Seq.truncate segmentSize
let rest = source |> Seq.skip (Seq.length segment)
Seq.append [segment] (chop segmentSize rest)Seq.unfold Solution
Although it's nice in this particular case, I'm not a huge fan of recursion because it can sometimes be hard to read and easy to get wrong. I normally use sequence functions as an alternative, as they handle those types of details behind the scenes, and I can worry about the problem at hand. Fortunately, the
Seq module provides a nice function, unfold, which does exactly what we are trying to do here - it produces a sequence of values from an evolving state!The function is called
unfold because it is the inverse of fold, which is the equivalent of Enumerable.Aggregate from C#. Here is the documentation:// Signature:
Seq.unfold : ('State -> 'T * 'State option) -> 'State -> seq
// Usage:
Seq.unfold generator stateThe second argument,
state, is the initial value that is going to be used to "seed" the operation (it may seem weird to have this as the second argument, but it is in that order to allow partial application / piping). That initial state value is going to be the first value passed in to our generator function. The generator function optionally returns a tuple containing the current value in the sequence along with the next state value that will be passed back in to generate the next value in the sequence. When the sequence is done, it returns None.Enough commentary, here is the implementation:
let chop segmentSize source =
source
|> Seq.unfold(fun chopSource ->
if Seq.isEmpty chopSource then None else
let segment = chopSource |> Seq.truncate segmentSize
let rest = chopSource |> Seq.skip (Seq.length segment)
Some(segment, rest)
)So as you can see, we're basically allowing
Seq.unfold to call our generator repetitively as an alternative to recursion. Each iteration returns the segment and the rest of the sequence, which becomes the next state / chopSoure. I'm not actually sure if that is clearer than the recursive version, but it at least won't ever overflow the stack.As for your naming questions, I think it's somewhat arbitrary. I probably would have called the function
segment[ed] (indeed, I think I have). The size argument could go either way. For what it's worth, here's a similar function defined in the Seq module, though it's dealing with a sliding window rather than fixed segments.Addendum
All these solutions are assuming that it is fine to re-enumerate the sequence, i.e. by truncating once and then skipping. That might be an OK assumption for small, in-memory data-sets, but an
IEnumerable doesn't have to allow multiple enumeration, and it can be expensive. As pointed out by Tomas Petricek on that other question, due to the forward-only nature of sequences, whenever you "back up" over a sequence like this it actually has to enumerate the entire thing from the beginning.But perhaps a sequence isn't the best choice of data types for input into this function. If the data is alread
Code Snippets
let rec chop segmentSize source =
seq {
if Seq.isEmpty source then () else
let segment = source |> Seq.truncate segmentSize
let rest = source |> Seq.skip (Seq.length segment)
yield segment
yield! chop segmentSize rest
}let rec chop segmentSize source =
if Seq.isEmpty source then Seq.empty else
let segment = source |> Seq.truncate segmentSize
let rest = source |> Seq.skip (Seq.length segment)
Seq.append [segment] (chop segmentSize rest)// Signature:
Seq.unfold : ('State -> 'T * 'State option) -> 'State -> seq<'T>
// Usage:
Seq.unfold generator statelet chop segmentSize source =
source
|> Seq.unfold(fun chopSource ->
if Seq.isEmpty chopSource then None else
let segment = chopSource |> Seq.truncate segmentSize
let rest = chopSource |> Seq.skip (Seq.length segment)
Some(segment, rest)
)let segmented (size:int) (source:'t[]) =
let maxPos = source.Length - 1
[| for pos in 0 .. size .. maxPos ->
source.[pos .. min (pos + size - 1) maxPos] |]Context
StackExchange Code Review Q#45668, answer score: 7
Revisions (0)
No revisions yet.