
Speeding up Python date conversion function currently using list comprehension and datetime

Submitted by: @import:stackexchange-codereview

Problem

I am reading a large data file where the time is given in number of days since some epoch. I am currently converting this to Python's datetime format using this function:

import datetime as dt

import numpy as np


def days2dt(days_since_epoch):
    """Convert fractional days since 1980-01-06 to a list of datetimes."""
    epoch = dt.datetime(1980, 1, 6)
    datelist = [epoch + dt.timedelta(days=x) for x in days_since_epoch]
    return datelist


# Run with sample data (might be larger in real life; in the worst case,
# multiply the list by 40 instead of 6).
sample = list(np.arange(0, 3/24., 1/24./3600./50.))*6
dates = days2dt(sample)


Running this function takes 5x longer than reading the entire file with pandas.read_csv(), perhaps because the list comprehension performs an addition for each element. The returned list is used immediately as the index of a pandas DataFrame. Interestingly, using a generator expression instead of the list comprehension above improves performance by ~35% (why?).

Aside from using a generator expression, can the performance of this function be improved in any way, e.g. by not performing this date conversion per-element or by using some NumPy feature I'm not aware of?

Solution

You should try NumPy's datetime64 and timedelta64 support:

import numpy as np

def days2dt(days_since_epoch):
    # Scale fractional days to whole microseconds, then add them to the epoch.
    microseconds = np.around(np.asarray(days_since_epoch) * (24*60*60*10**6))
    return np.datetime64('1980-01-06') + microseconds.astype('timedelta64[us]')
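As a quick sanity check, the vectorized version can be compared against the original list comprehension on a slice of the sample data. This is only a sketch: the names days2dt_list, days2dt_numpy, and max_diff are introduced here for illustration, and the one-microsecond tolerance is an assumption that allows for the two implementations rounding floating-point values slightly differently.

```python
import datetime as dt

import numpy as np


def days2dt_list(days_since_epoch):
    # Original approach: one Python-level addition per element.
    epoch = dt.datetime(1980, 1, 6)
    return [epoch + dt.timedelta(days=x) for x in days_since_epoch]


def days2dt_numpy(days_since_epoch):
    # Vectorized approach: scale days to microseconds, add to the epoch.
    microseconds = np.around(np.asarray(days_since_epoch) * (24*60*60*10**6))
    return np.datetime64('1980-01-06') + microseconds.astype('timedelta64[us]')


# First 1000 points of the sample grid from the question (20 ms spacing).
sample = np.arange(0, 3/24., 1/24./3600./50.)[:1000]

expected = np.array(days2dt_list(sample), dtype='datetime64[us]')
result = days2dt_numpy(sample)

# Largest disagreement between the two conversions, as a timedelta64.
max_diff = np.abs(result - expected).max()
print(result.dtype, max_diff)
```

The result is a datetime64[us] array rather than a list of datetime.datetime objects, so downstream code that expects Python datetimes may need adjusting.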


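I suggest you read the datetime units section of the NumPy docs to make sure this is safe (both resolution and min/max dates), but it should be fine: with microsecond units, datetime64 covers roughly 2.9e5 years on either side of 1970.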
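The resolution and range question can be checked with a little arithmetic. The sketch below assumes that datetime64 stores a signed 64-bit count of units from 1970-01-01, and uses the sample step from the question; the constant and variable names are illustrative only.

```python
import numpy as np

US_PER_DAY = 24 * 60 * 60 * 10**6

# Resolution: the sample grid in the question has a step of 1/50 s,
# which should come out as roughly 20000 microseconds.
step_days = 1 / 24. / 3600. / 50.
step_us = step_days * US_PER_DAY

# Range: how many years a signed 64-bit microsecond count can span
# on either side of 1970 (using an average year of 365.2425 days).
int64_max = np.iinfo(np.int64).max
span_years_us = int64_max / US_PER_DAY / 365.2425

# For contrast, the same count at nanosecond resolution spans far less.
span_years_ns = span_years_us / 1000

print(step_us, span_years_us, span_years_ns)
```

Under these assumptions, microsecond units leave ample room for dates around 1980, while nanosecond units would only cover a few centuries.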

Note that 90% of the time taken by days2dt is spent converting the input list to a numpy.ndarray; if you pass in an array directly, it runs much faster. Even with the list conversion, this is significantly faster than the list comprehension.
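To see how much the list-to-array conversion costs on a given machine, the two input types can be timed side by side. This is a rough sketch using timeit; the variable names are illustrative, and the actual ratio will depend on hardware and NumPy version.

```python
import timeit

import numpy as np


def days2dt(days_since_epoch):
    microseconds = np.around(np.asarray(days_since_epoch) * (24*60*60*10**6))
    return np.datetime64('1980-01-06') + microseconds.astype('timedelta64[us]')


# Same grid as the question's sample (about 540,000 points), once as an
# array and once as a plain list of floats.
sample_array = np.arange(0, 3/24., 1/24./3600./50.)
sample_list = list(sample_array)

# Time three calls for each input type.
t_list = timeit.timeit(lambda: days2dt(sample_list), number=3)
t_array = timeit.timeit(lambda: days2dt(sample_array), number=3)

print("list input:  %.3f s" % t_list)
print("array input: %.3f s" % t_array)
```

If the day offsets already come out of pandas.read_csv() as a column, passing that column's underlying array avoids the list conversion entirely.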

Context

StackExchange Code Review Q#77582, answer score: 4