HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Help improve my my python code for Exponential Smoothing Sumitted to Statsmodel

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
statsmodelsumittedexponentialhelpimprovesmoothingforpythoncode

Problem

I just submitted this pull request to statsmodels (a Python toolkit for statistical and econometric modeling). It adds support for exponential smoothing of time series.

I have only been programming for a few months. I am not very good with Numpy, and just learning classes and decorators, so any help would be appreciated.

The main part of the code is below. Can anyone help with shortening this so it is more efficient? Especially around the for loop. The complete file is here.

```
import numpy as np
import statsmodels.tools.eval_measures as em

#debug use
numpy.set_printoptions(suppress=True)

def exp_smoothing(y, alpha, gamma, delta=0, cycle=None, damp=1, initial=None,
trend='additive', forecast=None, season='additive', output='data'):
'''
Exponential Smoothing
This function handles 15 different Standard Exponential Smoothing models

Parameters
----------
y: array
Time series data

alpha: float
Smoothing factor for data between 0 and 1.

gamma: non zero integer or float
Smoothing factor for trend generally between 0 and 1

delta: non zero integer or float
Smoothing factor for season generally between 0 and 1

cycle: int
Length of cycles in a season. (ie: 12 for months, 4 for quarters)

damp: non zero integer or float {default = 1}
Autoregressive or damping parameter specifies a rate of decay
in the trend. Generally 0>d>1

initial: str, {'3avg'}(Optional)
Indicate initial point for bt and y
default: bt = y[0]-y[1], st = y[0]
3avg: Yields the average of the first 3 differences for bt.

trend: str, {'additive','multiplicative', 'brown'}
Indicate model type of trend default is 'additive'

additive: uses additive models such as Holt's linear & Damped-Trend
Linear Exponential Smoothing. Generalized as:
s_t = a y_t + (1-a) (s_t-1 + b_t-1)
b_t = g *(s_t - s_t-1)

Solution

Some quick comments just based on reading the documentation and the first few lines:

-
Typos, inconsistencies, and omissions:

  • The first line should be a summary of the behaviour of the function, not a title, so something like "Exponentially smooth a time series".



  • There's no need for capital letters in "Standard Exponential Smoothing" or in "Damped-Trend Linear Exponential Smoothing".



  • There's an extra space in "alpha:  float".



  • In "Smoothing factor for data between 0 and 1" there needs to be punctuation between "data" and "between". (A similar comment applies to some other parameters.)



  • Does "between 0 and 1" include the endpoints or not?



  • Sometimes you have a full stop at the end of sentences and sometimes you don't.



  • "non-zero" needs a hyphen.



  • "Length of cycles in a season" should be "Number of cycles in a season".



  • Write "i.e.," instead of "ie:", but in this case I think "for example," would be better.



  • What does "generally between 0 and 1" mean?



  • "0>d>1" is an impossible condition.



  • What is "bt"?



  • "Indicate model type of trend default is 'additive'" make no sense in English.



  • "used to deal with the special cases" — deal with them how?



  • "Number of periods ahead" — ahead of what?



  • "Not implemented" — suggests that this code is not ready for merge.



-
You write, "This function is able to perform the following algorithms" but you don't say how. It would be worth explaining what parameters to pass in order to get each of these behaviours.

-
I don't like the way you've used the word "season". In English a "season" is one fourth of a year, or, metaphorically, one part of a cycle. But in "Length of cycles in a season" you appear to be using the word to mean the whole cycle. This can only be confusing to the reader.

-
You've left in a lot of debugging print statements. But these should surely be removed before issuing the pull request. It looks as though it would be worth your while learning to use Python's debugging facilities, then you wouldn't need to use debugging print statements.

-
alpha needs to be "between 0 and 1", but all you actually check is alpha == 0.

-
Some of the other parameters have restricted values. For example, trend must be 'additive', 'multiplicative', or 'brown', but you don't actually check its value.

-
Generally it's best to check all your parameters first, rather than doing half the computation and then checking some of the parameters.

-
You asked whether "something like this [would] be better for a class?" A class represents a group of things with common behaviour. But you don't seem to have any thing here, so there's no need for a class.

Context

StackExchange Code Review Q#38563, answer score: 2

Revisions (0)

No revisions yet.