principleMajor
Why do we use the number of compares to measure the time complexity when compare is quite cheap?
Viewed 0 times
cheapwhynumberthemeasurecomparestimequitecomparewhen
Problem
I think one reason a compare is regarded as quite costly is due to the historical research as remarked by Knuth, that it came from tennis match trying to find the second or third best tennis player correctly, assuming the tennis players are not a "rock paper scissors" situation (but has an "absolute 'combat' power").
If we have an array of size
With sorting or selection algorithms, for example, what if the number of comparisons can be O(n log n) or O(n), but then, other operations had to be O(n²) or O(n log n), then wouldn't the higher O() still override the number of comparisons? (Maybe it didn't happen yet, or else we would have a study case about this situation). So ultimately, shouldn't the number of atomic steps, instead of comparisons, measured by the order of growth as compared to
If we have an array of size
n being 1,000,000, we don't usually mind comparing 2,000,000 times to find the second largest number. With the tennis tournament, having a Player A match with Player B can be costly as it can take a whole afternoon.With sorting or selection algorithms, for example, what if the number of comparisons can be O(n log n) or O(n), but then, other operations had to be O(n²) or O(n log n), then wouldn't the higher O() still override the number of comparisons? (Maybe it didn't happen yet, or else we would have a study case about this situation). So ultimately, shouldn't the number of atomic steps, instead of comparisons, measured by the order of growth as compared to
n (O()) that determines the time complexity?Solution
Sure. But in practice that is rare: the sorting algorithms we usually use or analyze in practice do at most a constant number of other operations per comparison, so this isn't an issue for the sorting algorithms we actually care about. This means that measuring the number of comparisons or the number of steps taken gives you the same asymptotic running time.
Also, there are some situations where we use sorting algorithms where each comparison is much slower than the other operations. A comparison may be slower than the other operations, because conditional branches can be very slow on modern processors (each comparison may have a significant chance to cause pipeline flushes). And if you are sorting complex objects with a custom comparison function, then each comparison might take many instructions. So, in at least some cases, the number of comparisons might well dominate the over time it takes to sort the input.
Finally, there are certainly some cases we do take into account the time to perform other operations. It's most common to only count the number of comparisons, but by no means universal. In situations where the time taken for other operations is important, people do analyze the total running time.
Ultimately, asymptotic running time analysis is only a theoretical model. It is a simplified model that omits many considerations. Any such simplification will necessarily ignore many factors. As long as those factors are unimportant, this can be useful, as it helps you simplify the problem enough to analyze it and gain insight. But, as always, if one of those factors turns out to have a significant effect and you didn't include it in your model, then the model will yield misleading results -- this is true of all models, and is not limited to sorting or algorithmic analysis. Part of the art of modelling is identifying which factors dominate and which factors are second-order, so that you can choose a model that's only as complex as needed -- it is as simple as possible, so you can understand it better, but no simpler than that, so it still yields predictions that are reasonably representative of the real world.
Also, there are some situations where we use sorting algorithms where each comparison is much slower than the other operations. A comparison may be slower than the other operations, because conditional branches can be very slow on modern processors (each comparison may have a significant chance to cause pipeline flushes). And if you are sorting complex objects with a custom comparison function, then each comparison might take many instructions. So, in at least some cases, the number of comparisons might well dominate the over time it takes to sort the input.
Finally, there are certainly some cases we do take into account the time to perform other operations. It's most common to only count the number of comparisons, but by no means universal. In situations where the time taken for other operations is important, people do analyze the total running time.
Ultimately, asymptotic running time analysis is only a theoretical model. It is a simplified model that omits many considerations. Any such simplification will necessarily ignore many factors. As long as those factors are unimportant, this can be useful, as it helps you simplify the problem enough to analyze it and gain insight. But, as always, if one of those factors turns out to have a significant effect and you didn't include it in your model, then the model will yield misleading results -- this is true of all models, and is not limited to sorting or algorithmic analysis. Part of the art of modelling is identifying which factors dominate and which factors are second-order, so that you can choose a model that's only as complex as needed -- it is as simple as possible, so you can understand it better, but no simpler than that, so it still yields predictions that are reasonably representative of the real world.
Context
StackExchange Computer Science Q#122010, answer score: 22
Revisions (0)
No revisions yet.