Pythonic split list into n random chunks of roughly equal size

Problem

As part of my implementation of cross-validation, I find myself needing to split a list into chunks of roughly equal size.

import random

def chunk(xs, n):
    ys = list(xs)
    random.shuffle(ys)
    ylen = len(ys)
    size = int(ylen / n)
    chunks = [ys[0+size*i : size*(i+1)] for i in xrange(n)]
    leftover = ylen - size*n
    edge = size*n
    for i in xrange(leftover):
        chunks[i%n].append(ys[edge+i])
    return chunks


This works as intended:

>>> chunk(range(10), 3)
[[4, 1, 2, 7], [5, 3, 6], [9, 8, 0]]


But it seems rather long and boring. Is there a library function that could perform this operation? Are there pythonic improvements that can be made to my code?

Solution

import random

def chunk(xs, n):
    ys = list(xs)


Copies of lists are usually taken using xs[:]
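
One caveat is worth checking before swapping list(xs) for xs[:]: slicing keeps the type of whatever was passed in, while list() always produces a list. The short sketch below illustrates this with made-up values; the variable names are only for illustration.

```python
xs = (3, 1, 2)            # a tuple, not a list

by_slice = xs[:]          # slicing keeps the original type: still a tuple
by_list = list(xs)        # list() always builds a new list

original = [3, 1, 2]
copy = original[:]        # for an actual list, slicing gives a shallow copy
copy.append(4)            # the original list is left untouched
```

Since the function goes on to call random.shuffle, which needs a mutable sequence, list(xs) is the safer choice when callers may pass tuples or other iterables.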

    random.shuffle(ys)
    ylen = len(ys)


I don't think storing the length in a variable actually helps your code much.

    size = int(ylen / n)


Use size = ylen // n instead; // is the integer (floor) division operator.
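
To illustrate the difference, here is a small sketch. It assumes Python 3, where / always returns a float; in Python 2, which the xrange calls suggest this code targets, / on two ints already floors, so the int() call is simply redundant there. The numbers are made up for the example.

```python
ylen, n = 10, 3

size_float = int(ylen / n)   # true division gives a float, then int() truncates
size_floor = ylen // n       # floor division never leaves the integers

# For very large integers the round trip through float loses precision.
big = 10**17 + 1
lossy = int(big / 1)         # big cannot be represented exactly as a float
exact = big // 1             # integer arithmetic stays exact
```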

    chunks = [ys[0+size*i : size*(i+1)] for i in xrange(n)]


Why the 0+?

    leftover = ylen - size*n


Actually, you can compute size and leftover in one step: size, leftover = divmod(ylen, n)
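
As a quick check with the numbers from the question (10 items, 3 chunks), divmod returns the quotient and the remainder together. The values here are just the example's; nothing else is assumed.

```python
ylen, n = 10, 3

# divmod(a, b) returns the pair (a // b, a % b)
size, leftover = divmod(ylen, n)

# The same two values, computed the long way as in the original code
size_long = ylen // n
leftover_long = ylen - size_long * n
```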

    edge = size*n
    for i in xrange(leftover):
        chunks[i%n].append(ys[edge+i])


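The number of leftover items is always less than n, so the i%n is unnecessary and each leftover item can go to a different chunk. Since leftover is a count rather than a list, zip over the leftover items themselves, ys[edge:]:

    for part, value in zip(chunks, ys[edge:]):
        part.append(value)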

    return chunks
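
Putting the suggestions above together, the function might look like the sketch below. It assumes Python 3, so range replaces xrange, and it names the loop variable part (a name chosen here, not in the original) so that it does not shadow the function name chunk.

```python
import random

def chunk(xs, n):
    """Shuffle xs and split it into n lists of roughly equal size."""
    ys = list(xs)                      # copy, so the caller's data is not shuffled
    random.shuffle(ys)
    size, leftover = divmod(len(ys), n)
    # n equally sized chunks taken from the front of the shuffled list
    chunks = [ys[size * i : size * (i + 1)] for i in range(n)]
    # fewer than n items remain; give one to each of the first chunks
    for part, value in zip(chunks, ys[len(ys) - leftover:]):
        part.append(value)
    return chunks
```

Calling chunk(range(10), 3) returns three lists, one with four items and two with three, in an order that varies from run to run because of the shuffle.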


Further improvement is possible with numpy, whose numpy.array_split function divides an array into n parts of nearly equal size. If this is part of number-crunching code, you should look into it.

Context

StackExchange Code Review Q#4872, answer score: 5
