Tags: pattern, javascript · Difficulty: Moderate
Efficient Binary Search
Problem
My implementation:
Array.prototype.binarySearchFast = function (search) {
    var size = this.length,
        high = size - 1,
        low = 0,
        target; // declared locally (the original leaked a global)
    while (high >= low) {
        if (this[low] === search) return low;
        else if (this[high] === search) return high;
        // Guess the index by interpolating between the values at the ends of
        // the current window, offset by `low` so the guess stays inside it.
        target = low + ((((search - this[low]) / (this[high] - this[low])) * (high - low)) >>> 0);
        if (this[target] === search) return target;
        else if (search > this[target]) low = target + 1, high--;
        else high = target - 1, low++;
    }
    return -1;
};

Normal Implementation:
Array.prototype.binarySearch = function (find) {
    var low = 0, high = this.length - 1,
        i, comparison;
    while (low <= high) {
        i = Math.floor((low + high) / 2);
        comparison = this[i];
        if (comparison < find) { low = i + 1; continue; }
        if (comparison > find) { high = i - 1; continue; }
        return i;
    }
    return null;
};

The difference being my implementation makes a guess at the index of the value based on the values at the start and end positions instead of just going straight to the middle value each time.
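That index guess can be isolated into a standalone helper to see the interpolation at work (a sketch; `interpolationGuess` is a hypothetical name, and the guess is offset by `low` so it is relative to the current window):

```javascript
// Hypothetical helper: estimate where `search` sits between the values at
// positions `low` and `high`, assuming the values grow roughly linearly.
function interpolationGuess(arr, search, low, high) {
    var fraction = (search - arr[low]) / (arr[high] - arr[low]);
    return low + ((fraction * (high - low)) >>> 0);
}

// On evenly spaced data the first guess lands straight on the answer:
var evenly = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
console.log(interpolationGuess(evenly, 70, 0, evenly.length - 1)); // 6, and evenly[6] === 70
```

On input like this the loop can finish in a single probe, which is consistent with the jsPerf gains reported below.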
I wondered if anyone could think of any cases where this would be slower than the original implementation.
UPDATE:
Sorry for the bad examples. I have now made them a little easier to understand and have set up some tests on jsPerf. See here:
http://jsperf.com/binary-search-2
I'm seeing about 75% improvement by using my method.
Solution
My advice is to not mess about with something that's been well and truly tested :-)
No, not really: if you find an algorithm that's better then, by all means, use it. However, in this case, for general data, this isn't going to be an improvement.
The power of binary search, and other O(log N) type algorithms, lies in the fact that you dispose of half the remaining search space with each iteration. In other words, if the initial search space (array size) was 1000, the first iteration removes 500 of them.

Any change to the "midpoint" (the divider between what you're keeping in the search space and what you're disposing of) that you select during an iteration has the potential to improve or degrade the performance. For example, putting the midpoint at 25% has the potential to reduce the search space even faster (if you're right) or slower (if you're wrong).
Now, if you know of some property of the data, you can use that to your advantage to improve an algorithm. In fact, it's the "extra" knowledge about your list (the fact that it's sorted) that allows you to optimise what would normally be a sequential search into a binary one.
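To put numbers on that: a quick sketch (with hypothetical probe-counting helpers) contrasting what a sequential scan costs against what the sorted-order assumption buys a binary search:

```javascript
// Counting probes illustrates what sortedness buys you: a sequential scan
// inspects up to N elements, a binary search at most about log2(N).
function linearProbes(arr, find) {
    var probes = 0;
    for (var i = 0; i < arr.length; i++) {
        probes++;
        if (arr[i] === find) break;
    }
    return probes;
}

function binaryProbes(arr, find) {
    var low = 0, high = arr.length - 1, probes = 0;
    while (low <= high) {
        var mid = (low + high) >>> 1;
        probes++;
        if (arr[mid] === find) break;
        if (arr[mid] < find) low = mid + 1; else high = mid - 1;
    }
    return probes;
}

var sorted = [];
for (var n = 0; n < 1000; n++) sorted.push(n);

console.log(linearProbes(sorted, 999)); // 1000 probes: every element inspected
console.log(binaryProbes(sorted, 999)); // 10 probes for 1000 elements
```

Searching for the worst-case element, the scan touches all 1,000 slots while the binary search never needs more than 10.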
So it all comes down to how good your extra information is. In this case, the values of the two end nodes alone are not really indicative of where the midpoint should be. You only have to look at the list:

[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 500, 1000]

to see this in action.

If you were looking for 500 in that list, you might decide that, based on the first and last elements being 1 and 1000, it would be in the middle somewhere, which is clearly not the case.

Similarly, if you were looking for 14, you might first check elements around the 1.4% mark (14/1000), which would probably be the first element, despite the fact it's right up at the other end.
Of course, that's not to say that other extra information would not help. If you knew that the data was fairly evenly distributed over the range, then your improvement may be worth it.
You also need to be aware that it's usually only going to be important with large data inputs, so it may not necessarily be worth it even if it turns out to be much better. Even bubble sort is blindingly fast for 100 elements :-)
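The answer's skewed list makes this easy to check: below is a sketch, with a hypothetical `guess` helper implementing the question's interpolation formula, showing where the first probe lands:

```javascript
// The interpolation guess from the question, applied to the answer's skewed
// list (the `low` offset is included; here low = 0, so it changes nothing).
function guess(arr, search) {
    var low = 0, high = arr.length - 1;
    return low + ((((search - arr[low]) / (arr[high] - arr[low])) * (high - low)) >>> 0);
}

var skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 500, 1000];
console.log(guess(skewed, 500)); // 7: "the middle somewhere", but the actual index is 14
console.log(guess(skewed, 14));  // 0: the actual index is 13, right at the other end
```

Both first probes land far from the target, which is exactly the misfire described above.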
Code Snippets
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 500, 1000]

Context
StackExchange Code Review Q#5363, answer score: 16
Revisions (0)
No revisions yet.