patternpythonMinor
Extracting sleep quality scores from periodic readings
Viewed 0 times
qualitysleepperiodicreadingsextractingscoresfrom
Problem
How can I optimize the next function to work faster? The function must prepare a list to put in a CSV file. The list must contain values of average percentage of sleep per minute counted from
The function also must group counted values by hour and day, so the output should be like this:
scores - a list of the form [[stage, time]], where stage could be 'S', 'R' or 'W' and time is in '%d %H:%M:%S' format; it's something like this:scores = [['S', '01 12:11:00'], ['S', '01 12:11:20'], ['W', '01 12:11:40'], ['S', '01 12:12:00'], ...]The function also must group counted values by hour and day, so the output should be like this:
quantities = [[1, 12, 11, 0.66], [1, 12, 12, 0.33], [1, 12, 13, 1], ...]from itertools import product
import datetime
def sleep_quantity(scores):
global quantities
quantities = []
for day, hour, minute in product(range(1,4), range(24), range(60)):
sleep = 0
total = 0
quantity = []
for line in scores:
time = datetime.datetime.strptime(line[1], '%d %H:%M:%S')
stage = line[0]
if time.hour == hour and time.day == day and time.minute == minute:
total += 1
if stage == 'S' or stage == 'R':
sleep += 1
if total != 0:
sleep_per_minute = sleep / total
quantity.append(hour)
quantity.append(day)
quantity.append(minute)
quantity.append(sleep_per_minute)
quantities.append(quantity) #for export to CSV
return quantitiesSolution
First of all the
cleared at the start and the value is returned from the function too,
no need to clobber a global.
The way
immediately appending a literal list instead of updating a variable:
Now just another quick suggestion is to use destructuring in the inner
Finally, the ordering of hour, day, minute in the result means that the
normal
sensible manner - I'd suggest to put it in decreasing order, i.e. day,
hour, minute.
Edit: Was updated before I ended up posting, but my assumptions still hold I think. (Unfortunately there's not much of an explanation of why this function does what it does.)
However it's immediately a concern that the way the iterations are set
up is less efficient than the other way round: the
that sounds like they should be nested the other way round.
Given that there's no explanation of what the code is supposed to do I'm
going to guess a bit on the semantics. Now we're iterating over three
days in one minute steps, checking whether any of the elements in
the matches for that moment in time and if we got one, we append to the
That means, I think, that you're looking for all timestamps in the list,
which occur at the same moment, then add up the "type" of the timestamp
and append one result for each moment where there were any matches.
That can be written much better in terms of sorting (the timestamps) and
accumulating consecutive ranges instead:
The process is thus: accumulate entries into buckets, one bucket for
each moment and at the same time updating the numbers for that bucket,
then at the end, iterating over all buckets and producing the result
list and, since the order is arbitrary, sort them again so it matches
the output of the previous approach.
Now if you want, you can still use
list in place and use
Oh and run a profiler.
For reference, I ended up comparing it with the following code:
quantities global sounds unnecessary. The value iscleared at the start and the value is returned from the function too,
no need to clobber a global.
The way
quantity is updated can be written much more concisely byimmediately appending a literal list instead of updating a variable:
quantities.append([hour, day, minute, sleep / total])Now just another quick suggestion is to use destructuring in the inner
for loop too:for (stage, time_string) in scores:Finally, the ordering of hour, day, minute in the result means that the
normal
sort method on a list isn't sufficient to sort the list in asensible manner - I'd suggest to put it in decreasing order, i.e. day,
hour, minute.
Edit: Was updated before I ended up posting, but my assumptions still hold I think. (Unfortunately there's not much of an explanation of why this function does what it does.)
However it's immediately a concern that the way the iterations are set
up is less efficient than the other way round: the
product list is4320 elements long, while the scores list is likely much shorter -that sounds like they should be nested the other way round.
Given that there's no explanation of what the code is supposed to do I'm
going to guess a bit on the semantics. Now we're iterating over three
days in one minute steps, checking whether any of the elements in
scores match and do something if so. For total, we accumulate allthe matches for that moment in time and if we got one, we append to the
quantities list.That means, I think, that you're looking for all timestamps in the list,
which occur at the same moment, then add up the "type" of the timestamp
and append one result for each moment where there were any matches.
That can be written much better in terms of sorting (the timestamps) and
accumulating consecutive ranges instead:
def sleep_quantity2(scores):
buckets = {}
for stage, time_string in scores:
time = datetime.datetime.strptime(time_string, '%d %H:%M:%S')
stages = buckets.get(time, [0, 0])
buckets[time] = stages
stages[0] += 1
if stage == "R" or stage == "S":
stages[1] += 1
quantities = []
for time, stages in buckets.iteritems():
quantities.append([time.day, time.hour, time.minute, stages[1] / stages[0]])
quantities.sort()
return quantitiesThe process is thus: accumulate entries into buckets, one bucket for
each moment and at the same time updating the numbers for that bucket,
then at the end, iterating over all buckets and producing the result
list and, since the order is arbitrary, sort them again so it matches
the output of the previous approach.
Now if you want, you can still use
yield instead of accumulating thelist in place and use
xrange instead of range in Python 2.7.Oh and run a profiler.
For reference, I ended up comparing it with the following code:
def sleep_quantity1(scores):
quantities = []
for day, hour, minute in product(range(1, 4), range(24), range(60)):
sleep = 0
total = 0
for (stage, time_string) in scores:
time = datetime.datetime.strptime(time_string, '%d %H:%M:%S')
if time.hour == hour and time.day == day and time.minute == minute:
total += 1
if stage == 'S' or stage == 'R':
sleep += 1
if total != 0:
quantities.append([day, hour, minute, sleep / total])
return quantitiesCode Snippets
quantities.append([hour, day, minute, sleep / total])for (stage, time_string) in scores:def sleep_quantity2(scores):
buckets = {}
for stage, time_string in scores:
time = datetime.datetime.strptime(time_string, '%d %H:%M:%S')
stages = buckets.get(time, [0, 0])
buckets[time] = stages
stages[0] += 1
if stage == "R" or stage == "S":
stages[1] += 1
quantities = []
for time, stages in buckets.iteritems():
quantities.append([time.day, time.hour, time.minute, stages[1] / stages[0]])
quantities.sort()
return quantitiesdef sleep_quantity1(scores):
quantities = []
for day, hour, minute in product(range(1, 4), range(24), range(60)):
sleep = 0
total = 0
for (stage, time_string) in scores:
time = datetime.datetime.strptime(time_string, '%d %H:%M:%S')
if time.hour == hour and time.day == day and time.minute == minute:
total += 1
if stage == 'S' or stage == 'R':
sleep += 1
if total != 0:
quantities.append([day, hour, minute, sleep / total])
return quantitiesContext
StackExchange Code Review Q#121977, answer score: 3
Revisions (0)
No revisions yet.