HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Splitting a sequence into equal segments

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
equalintosplittingsequencesegments

Problem

I need to split up a sequence into equal segments of a given size (yes, the last one may be shorter), and I'm trying to find an efficient and idiomatic way to do it.

I have two versions of the function; segment is the F# version of an awkward and probably quite inefficient old C# extension method that I wrote, while chop is an attempt to do it in a more intelligent manner, but is still rather imperative. I would have liked to have a solution that doesn't need to use reference cells, but could not come up with one.

module Seq =
    let segment segmentSize source =
        seq {
                let segmentCount = ref 0
                while !segmentCount * segmentSize  Seq.skip (!segmentCount * segmentSize) |> Seq.truncate segmentSize
                        segmentCount := !segmentCount + 1
                        yield segment
        }

    let chop segmentSize source =
        seq {
                let chopSource = ref source
                while not  Seq.truncate segmentSize
                        chopSource := !chopSource |> Seq.skip (Seq.length segment)
                        yield segment
        }


As I said, I think segment, while it gets the job done, is far from good F#; how efficient and idiomatic is chop? What could have been done better?

As for naming, is chop or segment the better name for a function like this, or would something else be more fitting? Also, the argument name segmentSize seems a bit off for F#; would just size be better, or something else?

Solution

Functional-first style tends to prefer immutability over mutability, so the fact that you're using ref can sometimes signify that the code could be more idiomatic.

Recursive Solution

But how do you keep track of the state of your program without mutation? The traditional functional solution is to use recursion so that each step of the algorithm is isolated in its own call context. Here is a fairly direct translation of your second example into a recursive function:

let rec chop segmentSize source = 
    seq { 
            if Seq.isEmpty source then () else
            let segment = source |> Seq.truncate segmentSize
            let rest = source |> Seq.skip (Seq.length segment)
            yield segment
            yield! chop segmentSize rest 
    }


Don't be intimidated by my if ... then () else pattern - it's a trick I adopted to avoid needing to indent the rest of function when a precondition is used (I've seen it elsewhere, and it is nice since there's no early return from functions in F#).

Note how I'm using yield! to flatten the sequence from the recursive call into the outer sequence. Also notice that there is no chopSource in this version, because what was chopSource in your example is simply source in the recursive call.

One possible problem with the above solution is that it is not (as far as I know) tail-recursive. I'm not 100% sure how seq works behind the scenes, but I'm assuming that it is not in this case. This may not matter for your problem, but it's still something to consider. It could always be rewritten to use an accumulator argument so that it is tail-recursive and won't overflow the stack.

Another option is to ditch the sequence expression altogether (note that this is still not tail-recursive):

let rec chop segmentSize source = 
    if Seq.isEmpty source then Seq.empty else
    let segment = source |> Seq.truncate segmentSize
    let rest = source |> Seq.skip (Seq.length segment)
    Seq.append [segment] (chop segmentSize rest)


Seq.unfold Solution

Although it's nice in this particular case, I'm not a huge fan of recursion because it can sometimes be hard to read and easy to get wrong. I normally use sequence functions as an alternative, as they handle those types of details behind the scenes, and I can worry about the problem at hand. Fortunately, the Seq module provides a nice function, unfold, which does exactly what we are trying to do here - it produces a sequence of values from an evolving state!

The function is called unfold because it is the inverse of fold, which is the equivalent of Enumerable.Aggregate from C#. Here is the documentation:

// Signature:
Seq.unfold : ('State -> 'T * 'State option) -> 'State -> seq

// Usage:
Seq.unfold generator state


The second argument, state, is the initial value that is going to be used to "seed" the operation (it may seem weird to have this as the second argument, but it is in that order to allow partial application / piping). That initial state value is going to be the first value passed in to our generator function. The generator function optionally returns a tuple containing the current value in the sequence along with the next state value that will be passed back in to generate the next value in the sequence. When the sequence is done, it returns None.

Enough commentary, here is the implementation:

let chop segmentSize source =
    source
    |> Seq.unfold(fun chopSource ->
        if Seq.isEmpty chopSource then None else
        let segment = chopSource |> Seq.truncate segmentSize
        let rest = chopSource |> Seq.skip (Seq.length segment)
        Some(segment, rest)
    )


So as you can see, we're basically allowing Seq.unfold to call our generator repetitively as an alternative to recursion. Each iteration returns the segment and the rest of the sequence, which becomes the next state / chopSoure. I'm not actually sure if that is clearer than the recursive version, but it at least won't ever overflow the stack.

As for your naming questions, I think it's somewhat arbitrary. I probably would have called the function segment[ed] (indeed, I think I have). The size argument could go either way. For what it's worth, here's a similar function defined in the Seq module, though it's dealing with a sliding window rather than fixed segments.

Addendum

All these solutions are assuming that it is fine to re-enumerate the sequence, i.e. by truncating once and then skipping. That might be an OK assumption for small, in-memory data-sets, but an IEnumerable doesn't have to allow multiple enumeration, and it can be expensive. As pointed out by Tomas Petricek on that other question, due to the forward-only nature of sequences, whenever you "back up" over a sequence like this it actually has to enumerate the entire thing from the beginning.

But perhaps a sequence isn't the best choice of data types for input into this function. If the data is alread

Code Snippets

let rec chop segmentSize source = 
    seq { 
            if Seq.isEmpty source then () else
            let segment = source |> Seq.truncate segmentSize
            let rest = source |> Seq.skip (Seq.length segment)
            yield segment
            yield! chop segmentSize rest 
    }
let rec chop segmentSize source = 
    if Seq.isEmpty source then Seq.empty else
    let segment = source |> Seq.truncate segmentSize
    let rest = source |> Seq.skip (Seq.length segment)
    Seq.append [segment] (chop segmentSize rest)
// Signature:
Seq.unfold : ('State -> 'T * 'State option) -> 'State -> seq<'T>

// Usage:
Seq.unfold generator state
let chop segmentSize source =
    source
    |> Seq.unfold(fun chopSource ->
        if Seq.isEmpty chopSource then None else
        let segment = chopSource |> Seq.truncate segmentSize
        let rest = chopSource |> Seq.skip (Seq.length segment)
        Some(segment, rest)
    )
let segmented (size:int) (source:'t[]) =
    let maxPos = source.Length - 1
    [| for pos in 0 .. size .. maxPos ->
           source.[pos .. min (pos + size - 1) maxPos] |]

Context

StackExchange Code Review Q#45668, answer score: 7

Revisions (0)

No revisions yet.