HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Harmonic analysis of time series applied to arrays

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
harmonicarraysanalysistimeappliedseries

Problem

I've developed some code in Python to apply a harmonic analysis of timeseries (for satellite imagery data). It's based on this, but then I would like to optimize the performance. So instead of a single timeseries as input I've an array of 10000 timeseries as input.

Even though I've tried to make use of some nice functions in NumPy that deals with big multidimensional arrays, I'm sure I just have touched the surface as it comes to the actual capabilities.

```
import numpy as np

# Computing diagonal for each row of a 2d array. See: http://stackoverflow.com/q/27214027/2459096
def makediag3d(M):
b = np.zeros((M.shape[0], M.shape[1]*M.shape[1]))
b[:, ::M.shape[1]+1] = M
return b.reshape(M.shape[0], M.shape[1], M.shape[1])

# Function to apply the Harmonic analysis of time series applied to arrays
def HANTS(ni,y,nf=3,HiLo='Hi',low=0.,high=255,fet=5,delta=0.1):
"""
ni = nr. of images (total number of actual samples of the time series)
nb = length of the base period, measured in virtual samples
(days, dekads, months, etc.)
nf = number of frequencies to be considered above the zero frequency
y = array of input sample values (e.g. NDVI values)
ts = array of size ni of time sample indicators
(indicates virtual sample number relative to the base period);
numbers in array ts maybe greater than nb
If no aux file is used (no time samples), we assume ts(i)= i,
where i=1, ..., ni
HiLo = 2-character string indicating rejection of high or low outliers
select from 'Hi', 'Lo' or 'None'
low = valid range minimum
high = valid range maximum (values outside the valid range are rejeced
right away)
fet = fit error tolerance (points deviating more than fet from curve
fit are rejected)
dod = degree of overdeterminedness (iteration stops if number of
points reaches the min

Solution

Well, I'm going to take a break and may not get back to it too soon, but I created https://github.com/pckujawa/harmonic_analysis_of_time_series and an accompanying "journal" to show how I profile and (will) optimize the code. For now, you can just look at each commit in order for my advice :). (BTW, I really recommend High Performance Python - O'Reilly Media - it's good all around, but it hits especially on numpy and scientific computing.)

I used to know numpy pretty well, but I've forgotten a lot. Any experts out there could probably suggest better functions to achieve what you want. But without being a domain expert, it's hard to grok this code.

Some highlights

Profiling shows that ndarray.take is the bottleneck. This is happening twice inside a loop and is where about 64% of the time is spent. Next are calls to einsum which take about 20% of the time. So if we can find different ways to express those, we'll see big wins.

*** PROFILER RESULTS ***
HANTS (hants/__init__.py:12)
function called 1 times

         1384 function calls in 11.479 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 48 to 40 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.312    1.312   11.479   11.479 __init__.py:12(HANTS)
       83    7.402    0.089    7.402    0.089 {method 'take' of 'numpy.ndarray' objects}
       64    2.293    0.036    2.293    0.036 {numpy.core.multiarray.einsum}
       16    0.175    0.011    0.257    0.016 __init__.py:6(makediag3d)
       16    0.092    0.006    0.096    0.006 linalg.py:296(solve)

Code Snippets

*** PROFILER RESULTS ***
HANTS (hants/__init__.py:12)
function called 1 times

         1384 function calls in 11.479 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 48 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.312    1.312   11.479   11.479 __init__.py:12(HANTS)
       83    7.402    0.089    7.402    0.089 {method 'take' of 'numpy.ndarray' objects}
       64    2.293    0.036    2.293    0.036 {numpy.core.multiarray.einsum}
       16    0.175    0.011    0.257    0.016 __init__.py:6(makediag3d)
       16    0.092    0.006    0.096    0.006 linalg.py:296(solve)

Context

StackExchange Code Review Q#71489, answer score: 4

Revisions (0)

No revisions yet.