patternpythonMinor
Fitting multiple piecewise functions to data and return functions and derivatives as Fortran code
Viewed 0 times
fortranfittingpiecewisereturnderivativescodemultipleandfunctionsdata
Problem
Background
For a future workshop I'll have to fit arbitrary functions (independent variable is height
Workflow
I don't think 1.-6. can be automated, but the rest should be.
Code
```
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
from sympy import log, ln, Piecewise, lambdify, symbols, sympify, fcode, sqrt
def config(name):
"""Configuration of the piecewise function fitting, dependent on input name
Input:
name... name of experiment to fit data to, basically chooses settings
Output:
var_list... list of variables to fit
infunc_list_dict... dictionary with var_list as keys, each having a list as
value that contains strings with the sub-function to fit, from
the bottom up. Only the first (lowest) may have a constant value, all
others must be 0 at the height they "take over" (where their
For a future workshop I'll have to fit arbitrary functions (independent variable is height
z) to data from multiple sources (output of different numerical weather prediction models) in a yet unknown format (but basically gridded height/value pairs). The functions only have to interpolate the data and be differentiable. There should explicitly be no theoretical background for the type of function, but they should be smooth. The goal is to use the gridded (meaning discrete) output of the numerical weather prediction model in our pollutant dispersion model, which requires continuous functions.Workflow
- choose the input model
- load input data
- define list of variables (not necessarily always the same)
- define height ranges (for the piecewise function)
- define base functions like "a0 + a1*z" for each height range and variable
- optionally define weights, because some parts are more important that others
- fit the piecewise functions
- save the fitted functions and their derivatives as Fortran 90 free form source code (to be included in our model)
I don't think 1.-6. can be automated, but the rest should be.
Code
```
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
from sympy import log, ln, Piecewise, lambdify, symbols, sympify, fcode, sqrt
def config(name):
"""Configuration of the piecewise function fitting, dependent on input name
Input:
name... name of experiment to fit data to, basically chooses settings
Output:
var_list... list of variables to fit
infunc_list_dict... dictionary with var_list as keys, each having a list as
value that contains strings with the sub-function to fit, from
the bottom up. Only the first (lowest) may have a constant value, all
others must be 0 at the height they "take over" (where their
Solution
Doesn't look too bad IMO, but there's definitely lots of duplication
going on that could be made shorter. Most of the comments are great,
though in some cases I'd recommend longer (not just single-letter)
variable names! I know it's common in scientific code though.
Now, first thing, I'd restructure the main block such that it's easier
to use interactively, e.g. have just
out the configuration itself - if you still want the ability to override
the variables, consider making them optional / use a dictionary or so.
Line 184 gives me an error because the grouping should be explicit - I
don't know what options would change that or if it's a NumPy version
issue, in any case I've added parenthesis so it reads:
Apropos variable names, it took me a moment to decipher
perhaps just at least an an underscore, or have a different name that
makes the word boundaries clearer.
I'd make
matter anymore, it's more self-explanatory interactively and finally if
you have a bunch of nested dictionaries it's also a bit more
straightforward than repeating variable names all over again. E.g.:
Perhaps just move the configurations out into a constant though. If the
test, save and print flags should be the same, have an
Also a lot of the values in the configuration are the same - if they're
not being modified, maybe reuse them?
For
functions for separate functionality, e.g:
with
The
place, I'd
redirect standard output instead.
Buffering to a string and then writing to a file, why? It's probably
not very harmful due to the length of the output, but still, the pattern
isn't that useful (unless I missed some reason here).
Constants like
what if you decide to change the output formatting? I'd look for some
helper library for that, or, failing that, create function for
formatting that has these values as (default) parameters.
Instead of appending the values using a
list (or concatenate the two in case the values allow for that):
The checks for
unused; and in Python 2 consider using
following example I'd also take a look at
it look nicer), e.g.:
In general I'd probably make local variables out of things like
that's just me.
For the
list of arguments spliced in like you're already doing for
e.g.:
If this is too unclear, split it up, or use a
Hope that gives you some ideas around it.
going on that could be made shorter. Most of the comments are great,
though in some cases I'd recommend longer (not just single-letter)
variable names! I know it's common in scientific code though.
Now, first thing, I'd restructure the main block such that it's easier
to use interactively, e.g. have just
main("tmp220") and let it figureout the configuration itself - if you still want the ability to override
the variables, consider making them optional / use a dictionary or so.
Line 184 gives me an error because the grouping should be explicit - I
don't know what options would change that or if it's a NumPy version
issue, in any case I've added parenthesis so it reads:
local_index = (data_z > thresh_list[i]) & (data_z < thresh_list[i + 1])Apropos variable names, it took me a moment to decipher
teston etc.,perhaps just at least an an underscore, or have a different name that
makes the word boundaries clearer.
I'd make
config return a dictionary, then the order of values doesn'tmatter anymore, it's more self-explanatory interactively and finally if
you have a bunch of nested dictionaries it's also a bit more
straightforward than repeating variable names all over again. E.g.:
def config(name):
...
return {
"tmp220": {
"abl_height": 990,
...
},
...
}[name]Perhaps just move the configurations out into a constant though. If the
test, save and print flags should be the same, have an
update({"test_on": ..., ...}) call in there to add the settings.Also a lot of the values in the configuration are the same - if they're
not being modified, maybe reuse them?
For
fit_func and main, I'd split things up more into separatefunctions for separate functionality, e.g:
y_list.append(subs(sympify(func_str) + transition_value,
(t, thresh_list[i]),
(s, transition_value),
(zi, abl_height)))with
subs something like:def subs(expression, *substitutions):
for substitution in substitutions:
expression = expression.subs(*substitution)
return expressionThe
print_on flag, well actually, checking for that all over theplace, I'd
redirect standard output instead.
Buffering to a string and then writing to a file, why? It's probably
not very harmful due to the length of the output, but still, the pattern
isn't that useful (unless I missed some reason here).
Constants like
78, 30 and 48 are probably, you know, constant, butwhat if you decide to change the output formatting? I'd look for some
helper library for that, or, failing that, create function for
formatting that has these values as (default) parameters.
create_deriv could be slightly more compact:def create_deriv(funcname, func):
"""..."""
z = symbols('z')
deriv = func.diff(z)
if funcname != 'v2':
return deriv
return deriv, sqrt(func).diff(z)Instead of appending the values using a
range, simply extend thelist (or concatenate the two in case the values allow for that):
t = [z]
t.extend(a)
# or perhaps
func = lambdify(tuple([z] + list(a)), y_list[i], modules=np)The checks for
nparams > 0 could be more compact (also pcov isunused; and in Python 2 consider using
xrange where possible; in thefollowing example I'd also take a look at
zip/itertools.izip to makeit look nicer), e.g.:
popt = ()
# fit the function to the data, checking for constant function aloft:
if nparams > 0:
popt = curve_fit(func, local_z, local_v, sigma=sigma,
maxfev=niterations)[0]
# substitute fitted parameters in sympy expression:
for j in xrange(nparams):
y_list[i] = y_list[i].subs(a[j], popt[j])
# calculate the new transition_value:
transition_value = func(thresh_list[i + 1], *popt)In general I'd probably make local variables out of things like
y_list[i] if it's always the same thing you're manipulating, butthat's just me.
For the
Piecewise creation you can simply call the function with thelist of arguments spliced in like you're already doing for
config,e.g.:
if len(y_list) == 1:
y = y_list[0]
else:
y = Piecewise(*[(y_j, z <= thresh_j_1)
for y_j, thresh_j_1
in zip(y_list, thresh_list[1:])]
+ [(y_list[-1], True)])If this is too unclear, split it up, or use a
range again.Hope that gives you some ideas around it.
Code Snippets
local_index = (data_z > thresh_list[i]) & (data_z < thresh_list[i + 1])def config(name):
...
return {
"tmp220": {
"abl_height": 990,
...
},
...
}[name]y_list.append(subs(sympify(func_str) + transition_value,
(t, thresh_list[i]),
(s, transition_value),
(zi, abl_height)))def subs(expression, *substitutions):
for substitution in substitutions:
expression = expression.subs(*substitution)
return expressiondef create_deriv(funcname, func):
"""..."""
z = symbols('z')
deriv = func.diff(z)
if funcname != 'v2':
return deriv
return deriv, sqrt(func).diff(z)Context
StackExchange Code Review Q#159432, answer score: 2
Revisions (0)
No revisions yet.