snippetcppMinor
How to speed up interval lookup for a piecewise defined function?
Viewed 0 times
piecewiseintervalfunctionforhowlookupspeeddefined
Problem
At the innermost loop of my software is a lookup of a function value from a piecewise defined function with interpolation between the sample points.
Illustration:
There are several different interpolation methods in between the sample points.
My current implementation consists of two arrays, one for the interval endpoints and one for the function values. Searching is done via std::upper_bound if the array is large and find_if if the arrays are small. Based on the access patterns the speed improved tremendously by adding a cache value for the previous positions interval and checking if it is still valid for the new position
The lookup currently still takes about 5% of total execution time, which seems too much.
I have indicated costs of the most expensive lines which is the comparison with the cached position [1%] and the linear search [3%].
Is there a faster way to lookup the interval which contains X?
Edit
Edit
It seems the std::find_if implemenation is actually faster than binary search after all.
I have made a test here: http://ideone.com/Fo5ZeW
```
#include
#include
#include
using namespace std;
double binary_search8(double arr, const double& x)
{
if (x test_vals;
size_t num_elem = 10000000;
for (size_t i=0; i(), test_vals[i]))-1);
}
end = clock();
elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
cout << "linear secs: " <<elapsed_secs << ", testval = " << linavg << std::endl;
// Binary search
double binavg = 0
Illustration:
* *
* *
* *
|--|------|------|--------X---------|-------------|There are several different interpolation methods in between the sample points.
My current implementation consists of two arrays, one for the interval endpoints and one for the function values. Searching is done via std::upper_bound if the array is large and find_if if the arrays are small. Based on the access patterns the speed improved tremendously by adding a cache value for the previous positions interval and checking if it is still valid for the new position
[1%] if (x >= *cache && x *end) {
outside = true;
} else {
outside = false;
if (n_ (), x)) - 1;
} else {
position = std::upper_bound(begin, end, x) - 1;
}
cache = position ;
}The lookup currently still takes about 5% of total execution time, which seems too much.
I have indicated costs of the most expensive lines which is the comparison with the cached position [1%] and the linear search [3%].
Is there a faster way to lookup the interval which contains X?
Edit
- I have included the switch after profiling since Linear search can faster for small arrays (and there are a lot of small arrays).
Edit
It seems the std::find_if implemenation is actually faster than binary search after all.
I have made a test here: http://ideone.com/Fo5ZeW
```
#include
#include
#include
using namespace std;
double binary_search8(double arr, const double& x)
{
if (x test_vals;
size_t num_elem = 10000000;
for (size_t i=0; i(), test_vals[i]))-1);
}
end = clock();
elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
cout << "linear secs: " <<elapsed_secs << ", testval = " << linavg << std::endl;
// Binary search
double binavg = 0
Solution
It'll take some testing to be entirely certainly, but a few possibilities to consider:
I'd try replacing this with:
[Technically,
This basically replaces two comparisons (that must be executed serially, thanks to
I'd at least try just assigning the result of the comparison:
Again, you can try the same "trick" as above, and get something like:
As before, this replaces two relatively unpredictable comparisons with one relatively predictable one.
If you're spending significantly more time in the linear search than the binary search, I'd try reducing the cut-off between the two, perhaps to 32 instead of 64.
As with any micro-optimization, none of these is really guaranteed to be effective. It's even possible your compiler may be incorporating one or more of them into the code it generates, so they'll do no good at all, or even that its optimizer won't recognize the resulting patterns in the code as well, so they could actually hurt performance. The only real way to find out is some testing and profiling.
if (x >= *cache && x < *(cache + 1)) {I'd try replacing this with:
if (static_cast(x-*cache) (*(cache+1) - *cache)) {[Technically,
unsigned is a place-holder here. It should really be something like std::make_unsigned::type. That is, if the input is (say) long long, you want unsigned long long, not just unsigned int.]This basically replaces two comparisons (that must be executed serially, thanks to
&& imposing a sequence point) with one plus a little extra math. The math can typically be done in parallel, and the branch becomes more predictable as well.if (x *end) {
outside = true;
} else {
outside = false;I'd at least try just assigning the result of the comparison:
outside = x *end;Again, you can try the same "trick" as above, and get something like:
outside = static_cast(x-*begin) (*end-*begin);As before, this replaces two relatively unpredictable comparisons with one relatively predictable one.
if (n_ (), x)) - 1;
} else {
position = std::upper_bound(begin, end, x) - 1;
}If you're spending significantly more time in the linear search than the binary search, I'd try reducing the cut-off between the two, perhaps to 32 instead of 64.
As with any micro-optimization, none of these is really guaranteed to be effective. It's even possible your compiler may be incorporating one or more of them into the code it generates, so they'll do no good at all, or even that its optimizer won't recognize the resulting patterns in the code as well, so they could actually hurt performance. The only real way to find out is some testing and profiling.
Code Snippets
if (x >= *cache && x < *(cache + 1)) {if (static_cast<unsigned>(x-*cache) < static_cast<unsigned>(*(cache+1) - *cache)) {if (x < *begin || x > *end) {
outside = true;
} else {
outside = false;outside = x < *begin || x > *end;outside = static_cast<unsigned>(x-*begin) <= static_cast<unsigned>(*end-*begin);Context
StackExchange Code Review Q#43920, answer score: 2
Revisions (0)
No revisions yet.