HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Extracting sleep quality scores from periodic readings

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
qualitysleepperiodicreadingsextractingscoresfrom

Problem

How can I optimize the next function to work faster? The function must prepare a list to put in a CSV file. The list must contain values of average percentage of sleep per minute counted from scores - a list of the form [[stage, time]], where stage could be 'S', 'R' or 'W' and time is in '%d %H:%M:%S' format; it's something like this:

scores = [['S', '01 12:11:00'], ['S', '01 12:11:20'], ['W', '01 12:11:40'], ['S', '01 12:12:00'], ...]


The function also must group counted values by hour and day, so the output should be like this:

quantities = [[1, 12, 11, 0.66], [1, 12, 12, 0.33], [1, 12, 13, 1], ...]


from itertools import product
import datetime

def sleep_quantity(scores):
    global quantities
    quantities = []
    for day, hour, minute in product(range(1,4), range(24), range(60)):
        sleep = 0
        total = 0
        quantity = []
        for line in scores:
            time = datetime.datetime.strptime(line[1], '%d %H:%M:%S')
            stage = line[0]
            if time.hour == hour and time.day == day and time.minute == minute:
                total += 1
                if stage == 'S' or stage == 'R':
                    sleep += 1
        if total != 0:
            sleep_per_minute = sleep / total
            quantity.append(hour)
            quantity.append(day)
            quantity.append(minute)
            quantity.append(sleep_per_minute)
            quantities.append(quantity) #for export to CSV
    return quantities

Solution

First of all the quantities global sounds unnecessary. The value is
cleared at the start and the value is returned from the function too,
no need to clobber a global.

The way quantity is updated can be written much more concisely by
immediately appending a literal list instead of updating a variable:

quantities.append([hour, day, minute, sleep / total])


Now just another quick suggestion is to use destructuring in the inner
for loop too:

for (stage, time_string) in scores:


Finally, the ordering of hour, day, minute in the result means that the
normal sort method on a list isn't sufficient to sort the list in a
sensible manner - I'd suggest to put it in decreasing order, i.e. day,
hour, minute.

Edit: Was updated before I ended up posting, but my assumptions still hold I think. (Unfortunately there's not much of an explanation of why this function does what it does.)

However it's immediately a concern that the way the iterations are set
up is less efficient than the other way round: the product list is
4320 elements long, while the scores list is likely much shorter -
that sounds like they should be nested the other way round.

Given that there's no explanation of what the code is supposed to do I'm
going to guess a bit on the semantics. Now we're iterating over three
days in one minute steps, checking whether any of the elements in
scores match and do something if so. For total, we accumulate all
the matches for that moment in time and if we got one, we append to the
quantities list.

That means, I think, that you're looking for all timestamps in the list,
which occur at the same moment, then add up the "type" of the timestamp
and append one result for each moment where there were any matches.

That can be written much better in terms of sorting (the timestamps) and
accumulating consecutive ranges instead:

def sleep_quantity2(scores):
    buckets = {}

    for stage, time_string in scores:
        time = datetime.datetime.strptime(time_string, '%d %H:%M:%S')

        stages = buckets.get(time, [0, 0])
        buckets[time] = stages

        stages[0] += 1
        if stage == "R" or stage == "S":
            stages[1] += 1

    quantities = []
    for time, stages in buckets.iteritems():
        quantities.append([time.day, time.hour, time.minute, stages[1] / stages[0]])

    quantities.sort()

    return quantities


The process is thus: accumulate entries into buckets, one bucket for
each moment and at the same time updating the numbers for that bucket,
then at the end, iterating over all buckets and producing the result
list and, since the order is arbitrary, sort them again so it matches
the output of the previous approach.

Now if you want, you can still use yield instead of accumulating the
list in place and use xrange instead of range in Python 2.7.

Oh and run a profiler.

For reference, I ended up comparing it with the following code:

def sleep_quantity1(scores):
    quantities = []
    for day, hour, minute in product(range(1, 4), range(24), range(60)):
        sleep = 0
        total = 0
        for (stage, time_string) in scores:
            time = datetime.datetime.strptime(time_string, '%d %H:%M:%S')
            if time.hour == hour and time.day == day and time.minute == minute:
                total += 1
                if stage == 'S' or stage == 'R':
                    sleep += 1
        if total != 0:
            quantities.append([day, hour, minute, sleep / total])
    return quantities

Code Snippets

quantities.append([hour, day, minute, sleep / total])
for (stage, time_string) in scores:
def sleep_quantity2(scores):
    buckets = {}

    for stage, time_string in scores:
        time = datetime.datetime.strptime(time_string, '%d %H:%M:%S')

        stages = buckets.get(time, [0, 0])
        buckets[time] = stages

        stages[0] += 1
        if stage == "R" or stage == "S":
            stages[1] += 1

    quantities = []
    for time, stages in buckets.iteritems():
        quantities.append([time.day, time.hour, time.minute, stages[1] / stages[0]])

    quantities.sort()

    return quantities
def sleep_quantity1(scores):
    quantities = []
    for day, hour, minute in product(range(1, 4), range(24), range(60)):
        sleep = 0
        total = 0
        for (stage, time_string) in scores:
            time = datetime.datetime.strptime(time_string, '%d %H:%M:%S')
            if time.hour == hour and time.day == day and time.minute == minute:
                total += 1
                if stage == 'S' or stage == 'R':
                    sleep += 1
        if total != 0:
            quantities.append([day, hour, minute, sleep / total])
    return quantities

Context

StackExchange Code Review Q#121977, answer score: 3

Revisions (0)

No revisions yet.