Speeding up Python date conversion function currently using list comprehension and datetime
Problem
I am reading a large data file where the time is given in number of days since some epoch. I am currently converting this to Python's datetime format using this function:
import datetime as dt
def days2dt(days_since_epoch):
    epoch = dt.datetime(1980, 1, 6)
    datelist = [epoch + dt.timedelta(days=x) for x in days_since_epoch]
    return datelist
# run with sample data (might be larger in real life, in worst case multiply
# the list by 40 instead of 6)
import numpy as np
sample = list(np.arange(0, 3/24., 1/24./3600./50.))*6
dates = days2dt(sample)

Running this function takes 5x longer than reading the entire file using pandas.read_csv() (perhaps because the list comprehension performs an addition for each element). The returned list is used immediately as the index of the pandas DataFrame. Interestingly, using a generator expression instead of the list comprehension above improves performance by ~35% (why?).

Aside from using a generator expression, can the performance of this function be improved in any way, e.g. by not performing this date conversion per-element or by using some NumPy feature I'm not aware of?
Solution
You should try NumPy's datetime and timedelta support:

def days2dt(days_since_epoch):
    microseconds = np.around(np.asarray(days_since_epoch) * (24*60*60*10**6))
    return np.datetime64('1980-01-06') + microseconds.astype('timedelta64[us]')

I suggest you read the units section of the docs to make sure this is safe (both resolution and min/max dates), but it should be fine.
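One way to gain confidence in the resolution and range remark is to spot-check the vectorized version against the original per-element one. The sketch below is only illustrative: the names `days2dt_list`, `days2dt_numpy`, and `check_days` are made up for this example, and the check values were picked so that they convert to whole microseconds exactly, so it does not say anything about rounding behaviour on arbitrary inputs.

```python
import datetime as dt
import numpy as np

EPOCH = dt.datetime(1980, 1, 6)

def days2dt_list(days_since_epoch):
    """Per-element conversion, as in the question."""
    return [EPOCH + dt.timedelta(days=x) for x in days_since_epoch]

def days2dt_numpy(days_since_epoch):
    """Vectorized conversion, as in the answer (microsecond resolution)."""
    microseconds = np.around(np.asarray(days_since_epoch) * (24 * 60 * 60 * 10**6))
    return np.datetime64('1980-01-06') + microseconds.astype('timedelta64[us]')

# Hypothetical check values, exactly representable as binary floats
check_days = [0.0, 0.5, 1.25, 365.0, 12000.75]

expected = days2dt_list(check_days)
# tolist() on a datetime64[us] array yields datetime.datetime objects
actual = days2dt_numpy(check_days).tolist()
assert actual == expected

# Rough span of a signed 64-bit microsecond counter, in years
span_years = np.iinfo(np.int64).max / (24 * 60 * 60 * 10**6 * 365.25)
print(f"approximate range: +/- {span_years:,.0f} years")
```

If your real data needs finer than microsecond resolution, the same check can be repeated with a different unit string, at the cost of a narrower date range.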
Note that 90% of the time taken for days2dt is converting the input list to a numpy.array; if you pass in a numpy.array it goes much faster. Nevertheless, this is significantly faster than the list comprehension already.

Code Snippets
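To see the effect of the input type for yourself, a small timing sketch such as the following may help. The sample size (one hour of 50 Hz data) and the names `sample_array` and `sample_list` are assumptions made for this illustration, and the numbers will vary by machine and NumPy version, so no expected timings are shown.

```python
import timeit
import numpy as np

def days2dt(days_since_epoch):
    microseconds = np.around(np.asarray(days_since_epoch) * (24 * 60 * 60 * 10**6))
    return np.datetime64('1980-01-06') + microseconds.astype('timedelta64[us]')

# Hypothetical sample: one hour of timestamps at 50 Hz, in days since the epoch
sample_array = np.arange(0, 1 / 24., 1 / 24. / 3600. / 50.)
sample_list = sample_array.tolist()  # same values as a plain Python list

# Both input types should produce identical results
from_list = days2dt(sample_list)
from_array = days2dt(sample_array)
assert np.array_equal(from_list, from_array)

# Time each call a few times; only the list-to-array conversion differs
t_list = timeit.timeit(lambda: days2dt(sample_list), number=5)
t_array = timeit.timeit(lambda: days2dt(sample_array), number=5)
print(f"list input:  {t_list:.4f} s")
print(f"array input: {t_array:.4f} s")
```

Because the result is already a datetime64 array, it can likely be handed to pandas as an index without converting back to Python datetime objects, though that is worth verifying against your pandas version.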
import numpy as np

def days2dt(days_since_epoch):
    microseconds = np.around(np.asarray(days_since_epoch) * (24*60*60*10**6))
    return np.datetime64('1980-01-06') + microseconds.astype('timedelta64[us]')

Context
StackExchange Code Review Q#77582, answer score: 4