principlecppMinor
Benchmarking inline vs normal functions in C++
Viewed 0 times
inlinefunctionsbenchmarkingnormal
Problem
I am currently learning C++ and have learnt about
How could this be improved?
inline functions and how they can give a performance benefit.How could this be improved?
#include
#include
#include
long fibonacci(unsigned n) {
if (n start, end;
// Start time.
start = std::chrono::system_clock::now();
//std::cout functionalTimeElapsed = end - start;
// Start time.
start = std::chrono::system_clock::now();
//std::cout inlineTimeElapsed = end - start;
if (functionalTimeElapsed > inlineTimeElapsed) {
std::cout << "n = " << n << "\tFunctional method was faster by " << ((functionalTimeElapsed - inlineTimeElapsed) / inlineTimeElapsed) * 100 << "%\n";
} else if (functionalTimeElapsed < inlineTimeElapsed) {
std::cout << "n = " << n << "\tInline method was faster by " << ((inlineTimeElapsed - functionalTimeElapsed) / functionalTimeElapsed) * 100 << "%\n";
} else {
std::cout << "n = " << n << "\tFunctional method and inline method took the same time.\n";
}
}
int main() {
for (int i = 5; i <=40 ; i++) {
timedFibonacci(i);
}
cin.get();
}Solution
Inlining
First of all, your
To stand a chance of getting some good out of the
In general, recursive functions are a problem for inlining in any case, so many compilers completely disable inline code generation for recursive functions.
Add in the fact that most reasonably current compilers basically ignore the
Timing
I'd do the timing somewhat differently. The single responsibility principle applies here, just like everything else. That means the timer should deal only with timing. I usually use something on this general order:
Arguably this already does more than it should (printing out results in addition to the actual timing) but at least it's quite a bit closer to a single responsibility.
With that, we invoke each Fibonacci generator separately, something on this order:
The result will then look something like this:
Just a minor note: this is a somewhat more general timing function that most people usually need. In particular, it doesn't require the function being timed to take a specific number of arguments--the number and types of arguments you pass after the
Efficiency
Ignoring, for the moment, the issue of inlining (which is probably pretty much a red herring anyway), I'd consider other ways of computing Fibonacci numbers. In the absence of memoization, a recursive function is horribly inefficient. An iterative function is many times faster. That can be written with a
This is somewhat longer, but the difference in speed is...fairly substantial. Computing fib(43) with each, I get timings like this:
I had to switch to showing the time in nanoseconds to get a non-zero time for the iterative solution. Even so, for fib(43) and highest resolution-timing I can get, it still shows a time of
Final Code
For what it's worth, here's the code I ran to get the timing comparison:
First of all, your
inline function calls the non-inline function, so if the compiler is really paying attention to the inline specification, it's only affecting a single "layer" of invocation.inline long fib(unsigned n) {
if (n < 2) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}To stand a chance of getting some good out of the
inline specification, you probably want to have this call itself instead of the other function:inline long fib(unsigned n) {
if (n < 2) return n;
return fib(n - 1) + fib(n - 2);
}In general, recursive functions are a problem for inlining in any case, so many compilers completely disable inline code generation for recursive functions.
Add in the fact that most reasonably current compilers basically ignore the
inline specifier anyway, and you get a high likelihood that you won't see any difference between these functions at all.Timing
I'd do the timing somewhat differently. The single responsibility principle applies here, just like everything else. That means the timer should deal only with timing. I usually use something on this general order:
template
auto timer(F f, std::string const &label, Args && ...args) {
using namespace std::chrono;
auto start = high_resolution_clock::now();
auto holder = f(std::forward(args)...);
auto stop = high_resolution_clock::now();
std::cout (stop - start).count() << "\n";
return holder;
}Arguably this already does more than it should (printing out results in addition to the actual timing) but at least it's quite a bit closer to a single responsibility.
With that, we invoke each Fibonacci generator separately, something on this order:
auto foo = timer(fib, "inline", max);The result will then look something like this:
inline time: 1062Just a minor note: this is a somewhat more general timing function that most people usually need. In particular, it doesn't require the function being timed to take a specific number of arguments--the number and types of arguments you pass after the
label have to match those needed by the function you're timing, but as far as the timer itself cares, it's essentially wide open.Efficiency
Ignoring, for the moment, the issue of inlining (which is probably pretty much a red herring anyway), I'd consider other ways of computing Fibonacci numbers. In the absence of memoization, a recursive function is horribly inefficient. An iterative function is many times faster. That can be written with a
for loop, something on this general order:unsigned long long fib(unsigned limit) {
unsigned long long a = 1;
unsigned long long b = 1;
unsigned long long c;
if (limit < 2)
return 1;
for (auto i=1ULL; i<limit; i++) {
c = a + b;
a = b;
b = c;
}
return c;
}This is somewhat longer, but the difference in speed is...fairly substantial. Computing fib(43) with each, I get timings like this:
out of line time: 2807407170
inline time: 2794785991
iterative time: 321
Sum (ignore): 1568397607I had to switch to showing the time in nanoseconds to get a non-zero time for the iterative solution. Even so, for fib(43) and highest resolution-timing I can get, it still shows a time of
0 about one run out of every 3 or 4. The recursive versions are approximately 7 orders of magnitude slower (and although inline code generation did seem to help a tiny bit, it's still minuscule compared to the improvement from a better algorithm. Oh, and for what it's worth, it looks like the inline code generation really did make a difference. It is pretty small, but at least in my testing, the inline version does finish just a tiny bit faster every time.Final Code
For what it's worth, here's the code I ran to get the timing comparison:
#include
#include
#include
template
auto timer(F f, std::string const &label, Args && ...args) {
using namespace std::chrono;
auto start = high_resolution_clock::now();
auto holder = f(std::forward(args)...);
auto stop = high_resolution_clock::now();
std::cout (stop - start).count() << "\n";
return holder;
}
long fibonacci(unsigned n) {
if (n < 2) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}
inline long fib(unsigned n) {
if (n < 2) return n;
return fib(n - 1) + fib(n - 2);
}
unsigned long long iter_fib(unsigned limit) {
unsigned long long a = 1;
unsigned long long b = 1;
unsigned long long c;
if (limit < 2)
return 1;
for (auto i = 1ULL; i < limit; i++) {
c = a + b;
a = b;
b = c;
}
return c;
}
int main() {
static const int max = 43;
auto a = timer(fibonacci, "out of line", max);
auto b = timer(fib, " inline", max);
auto c = timer(iter_fib, " iterative", max);
std::cout << "Sum (ignore): " << a + b + c;
}Code Snippets
inline long fib(unsigned n) {
if (n < 2) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}inline long fib(unsigned n) {
if (n < 2) return n;
return fib(n - 1) + fib(n - 2);
}template <typename F, typename ...Args>
auto timer(F f, std::string const &label, Args && ...args) {
using namespace std::chrono;
auto start = high_resolution_clock::now();
auto holder = f(std::forward<Args>(args)...);
auto stop = high_resolution_clock::now();
std::cout << label << " time: " << duration_cast<microseconds>(stop - start).count() << "\n";
return holder;
}auto foo = timer(fib, "inline", max);inline time: 1062Context
StackExchange Code Review Q#143586, answer score: 7
Revisions (0)
No revisions yet.