patterncMinor
Binary search optimization: Kernighan & Ritchie 3-1
Viewed 0 times
searchkernighanritchieoptimizationbinary
Problem
The problem I am trying to solve can be described as take
binarysearch0 and rewrite it as binarysearch1 such that binarysearch1 uses a single test inside the loop instead of two. The code I used to do this is seen below. Please let me know if you think I accomplished the task, and what I can do better.#include
#include
void printdata(int *v, int n)
{
int i = 0;
for (i = 0; i v[mid])
low = mid + 1;
else
return mid;
}
return -1;
}
int binarysearch1(int x, int *v, int n)
{
int low, high, mid;
low = 0;
high = n - 1;
while (low -1) {
printf("%d was found in the data: %14s", search, " ");
printdata(v[i], 10);
} else {
printf("%d was NOT found in the data: %10s", search, " ");
printdata(v[i], 10);
}
putchar('\n');
}
return 0;
}Solution
You're not really changing the number of comparisons inside the loop here: all you've done is move it from the loop body to the loop termination condition. Note that in many situations, comparisons on the array elements (which could be strings or more complex objects) are a lot costlier than comparisons on the indices.
You can find several solutions to this exercise on the clc-wiki. I think one of them (Paul Griffith's) misses the point (it's pretty much the same as yours); I'm going to show explain the reasoning behind Andrew Tesker and Colin Barker's solutions.
In the original function
In other cases, the only primitive you have is a less-than or less-or-equal-to, and to test for equality requires a second comparison. Thus the method presented in K&R requires two calls to the comparison function per iteration through the loop. It is more efficient to require a single comparison. Now we absolutely need to distinguish between the less-than and the greater-than case, so this means we're not going to be able to deal with the equality case inside the loop. So we'll keep going either up or down in the equality case. Since we haven't vetted the value
What happens to the loop termination? Now, we can get stuck with
Note that I set
At the exit of the loop, if there is a matching element in the array, then
The number of element comparisons performed by the original version is anywhere from 1 to 2*ceil(log₂(n)+1) where n is the number of elements in the array. The number of element comparisons in the version with a single comparison within the loop is always ceil(log₂(n))+2. That's a smaller maximum, but not necessarily a gain, because the original version will terminate earlier if it finds the element. The original version is preferable when there is a good chance that the element will be in the array. The modified version is preferable when the element is rarely present.
You can find several solutions to this exercise on the clc-wiki. I think one of them (Paul Griffith's) misses the point (it's pretty much the same as yours); I'm going to show explain the reasoning behind Andrew Tesker and Colin Barker's solutions.
In the original function
binsearch, inside the loop, there are three possible situations: xv[mid]. Sometimes, a comparison function has three outcomes: less than, equal or greater than; in such cases, the approach in the original function is best. For example, if we were comparing strings with strcmp:int cmp = strcmp(x, v[mid]);
if (cmp 0) low = mid + 1;
else return mid;In other cases, the only primitive you have is a less-than or less-or-equal-to, and to test for equality requires a second comparison. Thus the method presented in K&R requires two calls to the comparison function per iteration through the loop. It is more efficient to require a single comparison. Now we absolutely need to distinguish between the less-than and the greater-than case, so this means we're not going to be able to deal with the equality case inside the loop. So we'll keep going either up or down in the equality case. Since we haven't vetted the value
v[mid] in one of the cases, we need to keep it inside the search space.if (x < v[mid]) high = mid - 1;
else low = mid;What happens to the loop termination? Now, we can get stuck with
low == high and x >= v[low]. However, this is easily resolved by ending the loop when low == high.while (low < high) {
mid = (low + high + 1) / 2;
if (x < v[mid]) high = mid - 1;
else low = mid;
}Note that I set
mid to low + high + 1. In the case where low + 1 == high, (low + high) / 2 is low, and if x is still too small we'd set mid to low and loop forever (thanks to Elyasin for noticing that my original answer had this bug). Instead in this case we arrange for mid to be set to high. If low <= high we exit the loop anyway, and in other cases we'll continue looping and this change only causes the middle to be rounded up instead of down.At the exit of the loop, if there is a matching element in the array, then
v[low] is equal to it. So we perform a final equality test:return x == v[low] ? low : -1;The number of element comparisons performed by the original version is anywhere from 1 to 2*ceil(log₂(n)+1) where n is the number of elements in the array. The number of element comparisons in the version with a single comparison within the loop is always ceil(log₂(n))+2. That's a smaller maximum, but not necessarily a gain, because the original version will terminate earlier if it finds the element. The original version is preferable when there is a good chance that the element will be in the array. The modified version is preferable when the element is rarely present.
Code Snippets
int cmp = strcmp(x, v[mid]);
if (cmp < 0) high = mid - 1;
else if (cmp > 0) low = mid + 1;
else return mid;if (x < v[mid]) high = mid - 1;
else low = mid;while (low < high) {
mid = (low + high + 1) / 2;
if (x < v[mid]) high = mid - 1;
else low = mid;
}return x == v[low] ? low : -1;Context
StackExchange Code Review Q#6152, answer score: 8
Revisions (0)
No revisions yet.