HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Fitting multiple piecewise functions to data and return functions and derivatives as Fortran code

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
fortranfittingpiecewisereturnderivativescodemultipleandfunctionsdata

Problem

Background

For a future workshop I'll have to fit arbitrary functions (independent variable is height z) to data from multiple sources (output of different numerical weather prediction models) in a yet unknown format (but basically gridded height/value pairs). The functions only have to interpolate the data and be differentiable. There should explicitly be no theoretical background for the type of function, but they should be smooth. The goal is to use the gridded (meaning discrete) output of the numerical weather prediction model in our pollutant dispersion model, which requires continuous functions.

Workflow

  • choose the input model



  • load input data



  • define list of variables (not necessarily always the same)



  • define height ranges (for the piecewise function)



  • define base functions like "a0 + a1*z" for each height range and variable



  • optionally define weights, because some parts are more important that others



  • fit the piecewise functions



  • save the fitted functions and their derivatives as Fortran 90 free form source code (to be included in our model)



I don't think 1.-6. can be automated, but the rest should be.

Code

```
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
from sympy import log, ln, Piecewise, lambdify, symbols, sympify, fcode, sqrt

def config(name):
"""Configuration of the piecewise function fitting, dependent on input name

Input:
name... name of experiment to fit data to, basically chooses settings
Output:
var_list... list of variables to fit
infunc_list_dict... dictionary with var_list as keys, each having a list as
value that contains strings with the sub-function to fit, from
the bottom up. Only the first (lowest) may have a constant value, all
others must be 0 at the height they "take over" (where their

Solution

Doesn't look too bad IMO, but there's definitely lots of duplication
going on that could be made shorter. Most of the comments are great,
though in some cases I'd recommend longer (not just single-letter)
variable names! I know it's common in scientific code though.

Now, first thing, I'd restructure the main block such that it's easier
to use interactively, e.g. have just main("tmp220") and let it figure
out the configuration itself - if you still want the ability to override
the variables, consider making them optional / use a dictionary or so.

Line 184 gives me an error because the grouping should be explicit - I
don't know what options would change that or if it's a NumPy version
issue, in any case I've added parenthesis so it reads:

local_index = (data_z > thresh_list[i]) & (data_z < thresh_list[i + 1])


Apropos variable names, it took me a moment to decipher teston etc.,
perhaps just at least an an underscore, or have a different name that
makes the word boundaries clearer.

I'd make config return a dictionary, then the order of values doesn't
matter anymore, it's more self-explanatory interactively and finally if
you have a bunch of nested dictionaries it's also a bit more
straightforward than repeating variable names all over again. E.g.:

def config(name):
    ...
    return {
        "tmp220": {
            "abl_height": 990,
            ...
        },
        ...
    }[name]


Perhaps just move the configurations out into a constant though. If the
test, save and print flags should be the same, have an
update({"test_on": ..., ...}) call in there to add the settings.

Also a lot of the values in the configuration are the same - if they're
not being modified, maybe reuse them?

For fit_func and main, I'd split things up more into separate
functions for separate functionality, e.g:

y_list.append(subs(sympify(func_str) + transition_value,
                       (t, thresh_list[i]),
                       (s, transition_value),
                       (zi, abl_height)))


with subs something like:

def subs(expression, *substitutions):
    for substitution in substitutions:
        expression = expression.subs(*substitution)
    return expression


The print_on flag, well actually, checking for that all over the
place, I'd
redirect standard output instead.

Buffering to a string and then writing to a file, why? It's probably
not very harmful due to the length of the output, but still, the pattern
isn't that useful (unless I missed some reason here).

Constants like 78, 30 and 48 are probably, you know, constant, but
what if you decide to change the output formatting? I'd look for some
helper library for that, or, failing that, create function for
formatting that has these values as (default) parameters.

create_deriv could be slightly more compact:

def create_deriv(funcname, func):
    """..."""
    z = symbols('z')
    deriv = func.diff(z)
    if funcname != 'v2':
        return deriv
    return deriv, sqrt(func).diff(z)


Instead of appending the values using a range, simply extend the
list (or concatenate the two in case the values allow for that):

t = [z]
t.extend(a)
# or perhaps
func = lambdify(tuple([z] + list(a)), y_list[i], modules=np)


The checks for nparams > 0 could be more compact (also pcov is
unused; and in Python 2 consider using xrange where possible; in the
following example I'd also take a look at zip/itertools.izip to make
it look nicer), e.g.:

popt = ()
# fit the function to the data, checking for constant function aloft:
if nparams > 0:
    popt = curve_fit(func, local_z, local_v, sigma=sigma,
                  maxfev=niterations)[0]

    # substitute fitted parameters in sympy expression:
    for j in xrange(nparams):
        y_list[i] = y_list[i].subs(a[j], popt[j])

# calculate the new transition_value:
transition_value = func(thresh_list[i + 1], *popt)


In general I'd probably make local variables out of things like
y_list[i] if it's always the same thing you're manipulating, but
that's just me.

For the Piecewise creation you can simply call the function with the
list of arguments spliced in like you're already doing for config,
e.g.:

if len(y_list) == 1:
    y = y_list[0]
else:
    y = Piecewise(*[(y_j, z <= thresh_j_1)
                    for y_j, thresh_j_1
                    in zip(y_list, thresh_list[1:])]
                  + [(y_list[-1], True)])


If this is too unclear, split it up, or use a range again.

Hope that gives you some ideas around it.

Code Snippets

local_index = (data_z > thresh_list[i]) & (data_z < thresh_list[i + 1])
def config(name):
    ...
    return {
        "tmp220": {
            "abl_height": 990,
            ...
        },
        ...
    }[name]
y_list.append(subs(sympify(func_str) + transition_value,
                       (t, thresh_list[i]),
                       (s, transition_value),
                       (zi, abl_height)))
def subs(expression, *substitutions):
    for substitution in substitutions:
        expression = expression.subs(*substitution)
    return expression
def create_deriv(funcname, func):
    """..."""
    z = symbols('z')
    deriv = func.diff(z)
    if funcname != 'v2':
        return deriv
    return deriv, sqrt(func).diff(z)

Context

StackExchange Code Review Q#159432, answer score: 2

Revisions (0)

No revisions yet.