HiveBrain v1.2.0
Get Started
← Back to all entries
snippetpythonMinor

Quasi-random sequences: how to improve style and tests?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
quasirandomtestsimprovestylehowandsequences

Problem

The requirements for this one were (original SO question):

  • Generate a random-ish sequence of items.



  • Sequence should have each item N times.



  • Sequence shouldn't have serial runs longer than a given number (longest below).



The solution was actually drafted by another user, this is my implementation (influenced by random.shuffle).

from random import random
from itertools import groupby # For testing the result
try: xrange
except: xrange = range

def generate_quasirandom(values, n, longest=3, debug=False):
  # Sanity check
  if len(values) = longest:
        serial = 0
        guard = 0
        # We got a serial run, break it
        while source[j] == latest:
          j = int(random() * (sourcelen - i)) + i
          guard += 1
          # We just hit an infinit loop: there is no way to avoid a serial run
          if guard > 10:
            print("Unable to avoid serial run, disabling asserts.")
            debug = False
            break
    else:
      serial = 0
    latest = source[j]
    # Move the picked value to source[i:]
    source[i], source[j] = source[j], source[i]

  # More sanity checks
  check_quasirandom(source, values, n, longest, debug)

  return source

def check_quasirandom(shuffled, values, n, longest, debug):
  counts = []
  # We skip the last entries because breaking runs in them get too hairy
  for val, count in groupby(shuffled):
    counts.append(len(list(count)))
  highest = max(counts)
  print('Longest run: %d\nMax run lenght:%d' % (highest, longest))

  # Invariants
  assert len(shuffled) == len(values) * n
  for val in values:
    assert shuffled.count(val) == n

  if debug:
    # Only checked if we were able to avoid a sequential run >= longest
    assert highest <= longest

for x in xrange(10, 1000):
  generate_quasirandom((0, 1, 2, 3), 1000, x//10, debug=True)


I'd like to receive any input you have on improving this code, from style to comments to tests and anything else you can think of.

Solution

A couple of possible code improvements that I noticed:

sourcelen = len(values) * n


This seems unnecessarily complicated to me. I mean, after a second of thinking the reader of this line will realize that len(values) * n is indeed the length of source, but that's still one step of thinking more than would be required if you just did sourcelen = len(source) (after populating source of course).

That being said, I don't see why you need to store the length of source in a variable at all. Doing for i in xrange(len(source)): isn't really less readable or less efficient than doing for i in xrange(sourcelen):, so I'd just get rid of the variable altogether.

source = []
for val in values:
  source += [val] * n


This can be written as a list comprehension like this:

source = [x for val in values for x in [val]*n]


Using list comprehensions is usually considered more idiomatic python than building up a list iteratively. It is also often more efficient.

As Fred Nurk points out, the list comprehension can also be written as

source = [val for val in values for _ in xrange(n)]


which avoids the creation of a temporary list and is maybe a bit more readable.

j = int(random() * (sourcelen - i)) + i


To get a random integer between x (inclusive) and y (exclusive), you can use random.randrange(x,y), so the above can be written as:

j = randrange(i, len(source) - i)


(You'll also need to import randrange instead of random from the random module). This makes it more immediately obviously that j is a random number between i and len(source) - i and introduces less room for mistakes.

Code Snippets

sourcelen = len(values) * n
source = []
for val in values:
  source += [val] * n
source = [x for val in values for x in [val]*n]
source = [val for val in values for _ in xrange(n)]
j = int(random() * (sourcelen - i)) + i

Context

StackExchange Code Review Q#219, answer score: 4

Revisions (0)

No revisions yet.