Terry Feagin's 10th order explicit Runge-Kutta method
Problem
The following Julia code implements Terry Feagin's 10th order explicit Runge-Kutta method (a more accurate cousin of RK4). Though the structure of the code is quite simple (i.e. no cyclomatic complexity), it involves many high-precision coefficients and lengthy arithmetic expressions which bring its length to over 150 lines. The coefficients display no obvious pattern and cannot be computed at runtime.
With that said, are there any ways to simplify this function?
```
function rk108_step(f, x, y, h)
const a0100 = BigFloat(1//10)
const a0200 = BigFloat("-0.915176561375291440520015019275342154318951387664369720564660")
const a0201 = BigFloat("1.45453440217827322805250021715664459117622483736537873607016")
const a0300 = BigFloat("0.202259190301118170324681949205488413821477543637878380814562")
const a0302 = BigFloat("0.606777570903354510974045847616465241464432630913635142443687")
const a0400 = BigFloat("0.184024714708643575149100693471120664216774047979591417844635")
const a0402 = BigFloat("0.197966831227192369068141770510388793370637287463360401555746")
const a0403 = BigFloat("-0.0729547847313632629185146671595558023015011608914382961421311")
const a0500 = BigFloat("0.0879007340206681337319777094132125475918886824944548534041378")
const a0503 = BigFloat("0.410459702520260645318174895920453426088035325902848695210406")
const a0504 = BigFloat("0.482713753678866489204726942976896106809132737721421333413261")
const a0600 = BigFloat("0.0859700504902460302188480225945808401411132615636600222593880")
const a0603 = BigFloat("0.330885963040722183948884057658753173648240154838402033448632")
const a0604 = BigFloat("0.489662957309450192844507011135898201178015478433790097210790")
const a0605 = BigFloat("-0.0731856375070850736789057580558988816340355615025188195854775")
const a0700 = BigFloat("0.120930449125333720660378854927668953958938996999703678812621")
const a0704 = BigFloat("0.260124675758295622809007617838335174368108756484693361887
# … (the remaining coefficients and stage computations are truncated in this excerpt)
```
Solution
I have optimized Feagin solvers in DifferentialEquations.jl. You can find the code starting here. I know your question was about simplifying the code, but after really optimizing these methods, I believe that is the wrong direction (except for teaching purposes; for teaching, just put A in a sparse matrix, b and c as vectors, and go with it).
The code needs to be almost fully unrolled, for a few reasons. First, if you test static arrays/matrices of this size for the coefficients (via StaticArrays.jl), they are not good at staying stack-allocated: they are larger than what you want to keep on the stack. This is exacerbated by the fact that A is sparse (even within its lower triangular part), so using a full matrix would carry a large memory overhead. So you want to keep at least the a coefficients as stand-alone variables. b and c are easily put into a StaticArray, which simplifies things a little bit (I tested it, and it performs the same as stand-alone variables; I haven't done this on master yet because StaticArrays is v0.5-only and I want to keep v0.4 compatibility for now).
But now towards optimization. As @MattB noted, the first thing is to deal with the constants: they should be moved outside of this inner function. You should also take some precautions to make sure that all of the types match (to make the code more general; maybe you will want to use ArbFloats for better performance?). To add this robustness, generate all of the constants inside another function, and parse them to a given type T (inside that function, so that in your main function they are always of type T, which avoids type instabilities) with

```
T(parse(BigFloat, "0.105352113571753019691496032887878162227673083080523884041670"))
```

Just moving the constants out of the step function should give you an orders-of-magnitude speedup.
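As a sketch of that constant-generation pattern (using modern Julia syntax, a hypothetical `rk108_coefficients` name, and only the first few coefficients from the question; the rest would follow the same pattern):

```julia
# Build the tableau entries once, each parsed at full precision through
# BigFloat and then converted to the requested number type T, so the
# stepper itself never re-parses strings and stays type-stable.
function rk108_coefficients(::Type{T}) where {T}
    a0100 = T(parse(BigFloat, "0.1"))
    a0200 = T(parse(BigFloat, "-0.915176561375291440520015019275342154318951387664369720564660"))
    a0201 = T(parse(BigFloat, "1.45453440217827322805250021715664459117622483736537873607016"))
    # ... the remaining coefficients are generated the same way ...
    return (a0100 = a0100, a0200 = a0200, a0201 = a0201)
end

# The step function then takes the pre-built coefficients as an argument
# instead of reconstructing BigFloats on every call:
coeffs64  = rk108_coefficients(Float64)   # double-precision tableau
coeffsbig = rk108_coefficients(BigFloat)  # full-precision tableau
```

Swapping in another real type (for instance an ArbFloat) is then a one-line change at the call site.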
If your y is one-dimensional (i.e. just a number), then the last optimization I could find is to group the variables in long summations by fours: for example, (a+b+c+d)+(e+f+g+h)+... instead of a+b+c+d+e+f+g+h+.... The reason is a compiler inefficiency; putting these parentheses in can speed the code up by a few times.

A more fine-tuned optimization is to reduce the temporaries. Any time you have a vectorized operation like A+B, the result is stored in a temporary variable. So in the case where y and all of the k's are vectors, a lot of temporaries are allocated on each step. First, you can get rid of some of them by making f an in-place function which takes (du,u,t) and, instead of returning a value, writes it into du; in the spot of du you pass the k's. Once that's in-place, you will want to devectorize all of the large summations. I created a Gist where I tested which is the fastest way to do this: the winner is a single long loop with the terms grouped in parentheses by fours. You'd want to do that on every line of a's in the k = f lines, and in the solution-update part. You can re-use the same du on every line because it's only used in one place, so you only need a few cache variables for the whole approximation.

The result of that optimization is an utter mess; refer to the DifferentialEquations.jl source code again. However, timings show that even for small y vectors you can get about 1/3 faster. You can then change those cache arrays to StaticArrays to probably get major speed gains (but that requires v0.5).

So the resulting code is no simpler, but it's much faster! Note that DifferentialEquations.jl also has these hand-unrolled, devectorized implementations for the order 12 and 14 methods, if you'd like to take a look. It gets quite tedious, but sometimes you have to do it if you want the most optimal code.
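For the scalar case, the grouping-by-fours change looks like this (a toy eight-term stage sum with made-up coefficient values, purely to show the shape of the rewrite; the two groupings agree up to floating-point rounding):

```julia
# Left-to-right sum: one long dependency chain for the compiler.
stage_naive(a, k, h, y) =
    y + h*(a[1]*k[1] + a[2]*k[2] + a[3]*k[3] + a[4]*k[4] +
           a[5]*k[5] + a[6]*k[6] + a[7]*k[7] + a[8]*k[8])

# Grouped by fours: the parenthesized sub-sums can be evaluated
# independently, which shortens the chain and can run a few times faster.
stage_grouped(a, k, h, y) =
    y + h*((a[1]*k[1] + a[2]*k[2] + a[3]*k[3] + a[4]*k[4]) +
           (a[5]*k[5] + a[6]*k[6] + a[7]*k[7] + a[8]*k[8]))

a = collect(1.0:8.0); k = fill(0.5, 8)
stage_naive(a, k, 0.01, 1.0)    # same value from both groupings here,
stage_grouped(a, k, 0.01, 1.0)  # since every partial sum is exact
```

The regrouping does reassociate the floating-point sum, so results can differ from the naive version by roundoff; for an RK tableau that difference is far below the method's truncation error.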
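And the in-place, devectorized vector update can be sketched as follows (hypothetical names `f!` and `stage!`, and a trivial ODE standing in for the real right-hand side; the real code repeats this for every stage):

```julia
# In-place right-hand side: writes y' into du rather than allocating a
# fresh vector on every call.  The k vectors are what get passed as du.
function f!(du, u, t)
    @inbounds for i in eachindex(u)
        du[i] = -u[i]              # toy ODE y' = -y, for illustration only
    end
    return nothing
end

# One devectorized stage update: the whole summation happens inside a
# single loop (terms grouped in parentheses; pairs here, since this toy
# stage only combines four k's), so no temporary vectors are created.
function stage!(tmp, y, h, a, k1, k2, k3, k4)
    @inbounds for i in eachindex(y)
        tmp[i] = y[i] + h*((a[1]*k1[i] + a[2]*k2[i]) +
                           (a[3]*k3[i] + a[4]*k4[i]))
    end
    return nothing
end

y  = [1.0, 2.0]
k1 = similar(y); tmp = similar(y)          # allocated once, re-used every step
f!(k1, y, 0.0)                             # k1 now holds y' = -y
stage!(tmp, y, 0.1, (0.25, 0.25, 0.25, 0.25), k1, k1, k1, k1)
```

Because `tmp` and the `k` vectors are allocated once and re-used, the per-step allocation count drops to zero regardless of how many stages the method has.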
Context
StackExchange Code Review Q#83635, answer score: 5