Where/how did a $\log(n)$ factor disappear from well-known algorithms?
Problem
Consider the binary search problem on a sorted array of $n$ 16-bit integers. Everybody agrees that binary search needs $O(\log(n))$ time, because it makes at most $O(\log(n))$ steps. But at each step it needs to calculate a midpoint. The first midpoint is $\lfloor n/2 \rfloor$. The next midpoint could be $\lfloor n/2 \rfloor + \lfloor n/4 \rfloor$. If the searched number lies in the second half of the array, the midpoint at every step is at least $\lfloor n/2 \rfloor$. Each calculation of a new midpoint therefore involves numbers with $O(\log(n))$ bits, hence each midpoint calculation requires $O(\log(n))$ time. I obtain a more precise complexity of $O(\log(n)^2)$. Where is the error?
The same problem can arise in countless algorithms that use arrays or matrices. Take the best-known dynamic programming algorithm for the knapsack problem. Everybody agrees it takes $O(nW)$ time (e.g., the Wikipedia article on the knapsack problem), where $n$ is the number of items and $W$ is the capacity. But at each step it needs to compute a difference of weights/capacities, which should account for an additional factor of $O(\log(W))$. Where is the error?
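Again for concreteness, a minimal sketch of that DP (Python; an assumed standard 0/1 formulation, not quoted from the Wikipedia article). The index `j - weights[i]` is the difference of $O(\log(W))$-bit numbers in question:

```python
def knapsack(values, weights, W):
    # dp[j] = best total value achievable with capacity j
    dp = [0] * (W + 1)
    for i in range(len(values)):
        # Iterate capacities downward so each item is used at most once.
        for j in range(W, weights[i] - 1, -1):
            # j - weights[i] is a difference of O(log W)-bit numbers.
            dp[j] = max(dp[j], dp[j - weights[i]] + values[i])
    return dp[W]

print(knapsack([60, 100, 120], [10, 20, 30], 50))  # -> 220
```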
Where/how does this $O(\log(n))$ complexity factor disappear? If it were really needed, it would have shown up in many algorithms.
P.S. A very similar problem arises in the questions asked here and here; thanks to Ariel, Raphael and ryan for pointing this out.
Solution
You're not wrong; you're just using a different cost model.
Typically there are two:
- Uniform cost model - assigns a constant cost to every machine operation, regardless of the size of the numbers.
- Logarithmic cost model - assigns a cost proportional to the number of bits involved in each machine operation.
Under uniform cost, binary search takes a constant number of operations to find the next index to probe, so we get a recurrence relation like so:
$$\begin{align}
T(1) & = O(1)\\[0.5em]
T(n) & = T\left( \left\lfloor \frac{n}{2} \right\rfloor \right) + O(1)\\[0.5em]
& = \Theta(\log n)
\end{align}$$
Under logarithmic cost, we assume it takes $f(n)$ operations to find the next midpoint, and we get a recurrence like so:
$$\begin{align}
T(1) & = O(1)\\[0.5em]
T(n) & = T\left( \left\lfloor \frac{n}{2} \right\rfloor \right) + f(n)\\[0.5em]
\end{align}$$
For binary search, $f(n)$ will typically be no more than the cost of computing $\lfloor \frac{l + r}{2} \rfloor$: the addition takes $O(\log_2 n)$ operations, and the division by $2$ can be done with a bit shift taking another $O(\log_2 n)$ operations (see here). This brings us to:
$$\begin{align}
T(n) & = T\left( \left\lfloor \frac{n}{2} \right\rfloor \right) + \Theta(\log n)\\[0.5em]
& = \sum_{i=0}^{\lfloor \log_2 n \rfloor} \Theta(\log n - i)\\[0.5em]
& = \Theta(\log^2 n)
\end{align}$$
(The middle line comes from unrolling the recurrence: the argument halves at each level, so the per-level cost drops by one bit per level, and summing this arithmetic series gives $\Theta(\log^2 n)$.)
These logarithmic costs are just for addition and shifting, though. If you were to perform other operations on $n$-bit numbers, $f(n)$ could change; for example, $n$-bit multiplication and exponentiation cost $\Omega(n)$ or more, and you would have to adjust your complexity accordingly under the logarithmic cost model.
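To make the two cost models tangible, here is a small experiment (Python; the counters and the charging scheme are my own illustration, not part of the answer). It charges one unit per iteration for the uniform model and the bit length of the midpoint operands for the logarithmic model; the first counter grows like $\log n$ and the second like $\log^2 n$:

```python
def binary_search_costs(arr, target):
    lo, hi = 0, len(arr) - 1
    steps = 0     # uniform cost: one unit per iteration
    bit_ops = 0   # logarithmic cost: bits touched computing the midpoint
    while lo <= hi:
        steps += 1
        # Charge the bit length of the operands of the addition/shift.
        bit_ops += max((lo + hi).bit_length(), 1)
        mid = (lo + hi) >> 1  # floor((lo + hi) / 2) via a bit shift
        if arr[mid] == target:
            break
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps, bit_ops

for k in (10, 15, 20):
    n = 1 << k
    steps, bit_ops = binary_search_costs(list(range(n)), n - 1)
    print(f"n=2^{k}: steps={steps}, bit_ops={bit_ops}, log^2 n = {k * k}")
```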
Context
StackExchange Computer Science Q#79213, answer score: 9