patternpythonMinor
Calculate middle values faster
Viewed 0 times
valuesfastercalculatemiddle
Problem
I want to improve the performance of this code:
```
#!/usr/bin/env python
# -- coding: utf-8 --
import numpy
import random
import math
random.seed(42)
class TrainModel:
def __init__(cls, scheduling_interval, trains):
cls.scheduling_interval = scheduling_interval
cls.trains = trains
class Train:
def __init__( cls, legs):
cls.legs = legs
class Leg: # Teilfahrt
def __init__(cls, travel_time, earliest_departure_time, latest_departure_time,
power_profile):
cls.travel_time = travel_time # Fahrzeit
cls.earliest_departure_time = earliest_departure_time # Früheste Abfraht
cls.latest_departure_time = latest_departure_time # Späteste Abfahrt
cls.power_profile = power_profile # Leistungsprofil
# Test model
scheduling_interval = 60*24 # one day in minutes
train_number = 10
trains = []
for i in xrange(train_number):
legs = []
leg_number = random.randint(5,10)
for j in xrange(leg_number):
travel_time = random.randint(5,40)
earliest_departure_time = random.randint(0,scheduling_interval-100)
latest_departure_time = earliest_departure_time+7
power_profile = numpy.random.rand(travel_time*60+1)
legs.append(Leg(travel_time, earliest_departure_time, latest_departure_time, power_profile))
train = Train(legs)
trains.append(train)
train_model = TrainModel(scheduling_interval, trains)
def calc_middle(t, profile):
x_min = int(min((901-1)*(t+1), len(profile)-1))
sum_m = sum(a for a in profile[(901-1)*t+1:x_min] if a > 0)
sum_m += max(0.5 profile[(901-1)t], 0)
sum_m += max(0.5 * profile[x_min], 0)
return sum_m
storage = {}
# Slow loops profiling shows that most of the time is spend in the sum function of calc_middle
number_of_middle_intervals = int(math.ceil((train_model.scheduling_interval*60+1)/901))
for t in xrange(len(train_model.trains)):
for r in xrange(len(train_model.trains[t].legs)):
current_t
```
#!/usr/bin/env python
# -- coding: utf-8 --
import numpy
import random
import math
random.seed(42)
class TrainModel:
def __init__(cls, scheduling_interval, trains):
cls.scheduling_interval = scheduling_interval
cls.trains = trains
class Train:
def __init__( cls, legs):
cls.legs = legs
class Leg: # Teilfahrt
def __init__(cls, travel_time, earliest_departure_time, latest_departure_time,
power_profile):
cls.travel_time = travel_time # Fahrzeit
cls.earliest_departure_time = earliest_departure_time # Früheste Abfraht
cls.latest_departure_time = latest_departure_time # Späteste Abfahrt
cls.power_profile = power_profile # Leistungsprofil
# Test model
scheduling_interval = 60*24 # one day in minutes
train_number = 10
trains = []
for i in xrange(train_number):
legs = []
leg_number = random.randint(5,10)
for j in xrange(leg_number):
travel_time = random.randint(5,40)
earliest_departure_time = random.randint(0,scheduling_interval-100)
latest_departure_time = earliest_departure_time+7
power_profile = numpy.random.rand(travel_time*60+1)
legs.append(Leg(travel_time, earliest_departure_time, latest_departure_time, power_profile))
train = Train(legs)
trains.append(train)
train_model = TrainModel(scheduling_interval, trains)
def calc_middle(t, profile):
x_min = int(min((901-1)*(t+1), len(profile)-1))
sum_m = sum(a for a in profile[(901-1)*t+1:x_min] if a > 0)
sum_m += max(0.5 profile[(901-1)t], 0)
sum_m += max(0.5 * profile[x_min], 0)
return sum_m
storage = {}
# Slow loops profiling shows that most of the time is spend in the sum function of calc_middle
number_of_middle_intervals = int(math.ceil((train_model.scheduling_interval*60+1)/901))
for t in xrange(len(train_model.trains)):
for r in xrange(len(train_model.trains[t].legs)):
current_t
Solution
Running a profiler over your code shows that most of the time is spent in this line in
So if we want to make this go faster, we need to speed up this line. Some suggestions:
-
Pre-filter the negative elements with
The same array is coming through this function many times – wouldn’t it be cheaper to filter out negative elements just once?
We always get
before entering the
This lets you reduce
There's a nice little saving.
-
Use
Since the slice is now a pure numpy array, rather than a Python generator expression, we can call
Reprofiling the new code highlights another line which is chewing up cycles: computing
You can make a few tweaks for speed here:
Knowing that this line is quite a sink, here’s a rewrite of the innermost for loop that tries to speed it up:
There are probably more optimisations to be made; this is just based on an hour or so poking around before bed. In informal testing, it takes the time to run with
I thought about it a bit more this morning. Here are a few more things I found. None of these are as significant, but they’re useful savings.
-
If you compute
rather than
I get a small saving. Saves about 0.2s when running with
-
Since
4.5s ~> 4.12s.
calc_middle():sum_m = sum(a for a in profile[(901-1)*t+1:x_min] if a > 0)So if we want to make this go faster, we need to speed up this line. Some suggestions:
-
Pre-filter the negative elements with
numpy.clip().The same array is coming through this function many times – wouldn’t it be cheaper to filter out negative elements just once?
We always get
full_profile being passed in here, so addfull_profile = full_profile.clip(0)before entering the
for i in xrange(number_of_middle_intervals) loop.This lets you reduce
calc_middle to:def calc_middle(t, profile):
x_min = int(min((901-1)*(t+1), len(profile)-1))
profile = profile.clip(0)
return sum(profile[(901-1)*t:x_min+1])There's a nice little saving.
-
Use
numpy.sum().Since the slice is now a pure numpy array, rather than a Python generator expression, we can call
numpy.sum() here for extra speed:def calc_middle(t, profile):
x_min = int(min((901-1)*(t+1), len(profile)-1))
profile = profile.clip(0)
return numpy.sum(profile[(901-1)*t:x_min+1])Reprofiling the new code highlights another line which is chewing up cycles: computing
x_min, e.g.:x_min = int(min((901-1)*(t+1), len(profile)-1))You can make a few tweaks for speed here:
- Reuse values where possible
- Don't unnecessarily cast to
int()
- Only compute this value if you actually need to
Knowing that this line is quite a sink, here’s a rewrite of the innermost for loop that tries to speed it up:
x_start = 900 * i
if x_start > dim_end:
tmp_sum = 0
else:
x_min = min(901 * i, len(full_profile)-1)
if x_min < dim_start:
tmp_sum = 0
else:
tmp_sum = numpy.sum(full_profile[x_start:x_min+1])There are probably more optimisations to be made; this is just based on an hour or so poking around before bed. In informal testing, it takes the time to run with
train_number=24 from ~2.4s to ~0.6s. Better, but probably not yet suitable for 24000 trains.I thought about it a bit more this morning. Here are a few more things I found. None of these are as significant, but they’re useful savings.
-
If you compute
x_start as:x_start = 0
for i in xrange(number_of_middle_intervals):
# do stuff
x_start += 900rather than
for i in xrange(number_of_middle_intervals):
x_start = 900 * iI get a small saving. Saves about 0.2s when running with
train_number=200 (from 4.69s to 4.5s).-
Since
x_start is increasing with every iteration of the loop, when you get x_start > dim_end, you know that this will always be true for the current value of the j-loop. So you can fill in the rest of power_value_per_middle_interval with zeroes:if x_start > dim_end:
remaining_length = len(power_value_per_middle_intervall) - number_of_middle_intervals
power_value_per_middle_intervall += [0] * remaining_length
break4.5s ~> 4.12s.
Code Snippets
sum_m = sum(a for a in profile[(901-1)*t+1:x_min] if a > 0)full_profile = full_profile.clip(0)def calc_middle(t, profile):
x_min = int(min((901-1)*(t+1), len(profile)-1))
profile = profile.clip(0)
return sum(profile[(901-1)*t:x_min+1])def calc_middle(t, profile):
x_min = int(min((901-1)*(t+1), len(profile)-1))
profile = profile.clip(0)
return numpy.sum(profile[(901-1)*t:x_min+1])x_min = int(min((901-1)*(t+1), len(profile)-1))Context
StackExchange Code Review Q#114853, answer score: 7
Revisions (0)
No revisions yet.