From Guido's essays, how does this function avoid quadratic behavior in a string concatenation algorithm?
Problem
I am reading one of Guido van Rossum's essays on optimization in Python.
We are interested in converting a Python list of integers to their character equivalents. Here's the straightforward algorithm.
def f3(list):
    string = ""
    for character in map(chr, list):
        string = string + character
    return string

Guido claims that the following algorithm avoids the quadratic behavior in the string concatenation.
def f5(list):
    string = ""
    for i in range(0, 256, 16):  # 0, 16, 32, 48, 64, ...
        s = ""
        for character in map(chr, list[i:i+16]):
            s = s + character
        string = string + s
    return string

How does this actually help, and what is the significance of the value 16 in f5? f5 does not actually turn out to be faster, but I want the intuition that went into trying this strategy of optimization. Note: we can assume that we are working with a list of length exactly 256.
Solution
Let's assume that adding two strings of lengths $a,b$ takes time $a+b$. Consider the following strategy to convert a list of $n$ characters into a single string:
Read the list in chunks of $k$, convert them to strings, and sum the chunks.
Creating each chunk takes time $\Theta(1+2+\cdots+k) = \Theta(k^2)$, and we do this $n/k$ times, for a total of $\Theta(nk)$. Adding the chunks takes time $\Theta(k+2k+\cdots+n) = \Theta(k(1+2+\cdots+n/k)) = \Theta(k(n/k)^2) = \Theta(n^2/k)$. In total, we get a complexity of $\Theta(n(k+n/k))$. The optimal choice of $k$ is $\sqrt{n}$, and this gives a complexity of $\Theta(n^{3/2})$. In contrast, the trivial algorithm has $k = 1$ and so complexity $\Theta(n^2)$.
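To make the chunking strategy concrete, here is a minimal sketch (illustrative code, not from Guido's essay; the name concat_chunked and the parameter codes are my own):

import math

def concat_chunked(codes, k=None):
    # Build each chunk of k characters by repeated concatenation, then
    # concatenate the chunks. With k ~ sqrt(n), the chunk-building work is
    # Theta(nk) and the chunk-summing work is Theta(n^2/k), for Theta(n^(3/2))
    # overall under the "concatenating strings of lengths a, b costs a+b" model.
    n = len(codes)
    if k is None:
        k = max(1, math.isqrt(n))
    result = ""
    for i in range(0, n, k):
        chunk = ""
        for character in map(chr, codes[i:i + k]):
            chunk = chunk + character   # cost grows with len(chunk)
        result = result + chunk         # cost grows with len(result)
    return result

With k = 16 and a 256-element list this performs the same concatenations as f5, while k = 1 degenerates to the quadratic behavior of f3.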
More generally, you can consider $d$ levels of recursion. Let $f_d(n)$ denote the optimal complexity of such a scheme, measuring just the merging steps. Clearly $f_1(n) \approx n^2/2$. Given an algorithm with $d$ levels, we can construct a $(d+1)$-level algorithm which cuts the input into chunks of length $k$, runs the $d$-level algorithm on each of them, and concatenates the results. This takes time roughly $(n/k)f_d(k) + n^2/(2k)$. For example,
$$
\begin{align*}
2f_2(n) &\approx \min_k \left( nk + n^2/k \right) \approx 2n^{3/2} & (k=\sqrt{n}), \\
2f_3(n) &\approx \min_k \left( 2nk^{1/2} + n^2/k \right) \approx 3n^{4/3} & (k=n^{2/3}), \\
2f_4(n) &\approx \min_k \left( 3nk^{1/3} + n^2/k \right) \approx 4n^{5/4} & (k=n^{3/4}).
\end{align*}
$$
We get that $2f_d(n) \approx dn^{(d+1)/d}$. Calculus shows that the best choice of $d$ is $d = \log n$, in which case we obtain $2f_{\log n}(n) = \Theta(n\log n)$. However, this scheme is not necessarily achievable.
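Purely as an illustration, the recursion can be evaluated numerically; in the following sketch, merge_cost is a hypothetical helper that applies $f_d(n) \approx (n/k)f_{d-1}(k) + n^2/(2k)$ with the chunk size $k = n^{(d-1)/d}$ suggested by the analysis:

def merge_cost(n, d):
    # Approximate merging cost f_d(n) of a d-level scheme under the a+b
    # cost model, using the optimal top-level chunk size k = n^((d-1)/d).
    if d == 1 or n <= 1:
        return n * n / 2                      # left-to-right concatenation
    k = n ** ((d - 1) / d)
    return (n / k) * merge_cost(k, d - 1) + n * n / (2 * k)

n = 4096
for d in (1, 2, 3, 4):
    # compare against the closed form f_d(n) ~ d * n^((d+1)/d) / 2
    print(d, round(merge_cost(n, d)), round(d * n ** ((d + 1) / d) / 2))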
If we choose a branching factor of 2 then we get an algorithm whose running time is $\Theta(n\log n)$, which resembles merge sort. In fact, the sorting lower bound gives a lower bound of $\Omega(n\log n)$ on any such merging procedure.
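A branching factor of 2 corresponds to the following divide-and-conquer sketch (again my own illustration, not code from the essay):

def concat_binary(codes):
    # Split the list in half, build each half recursively, and concatenate
    # the two halves. Under the a+b cost model this performs Theta(log n)
    # levels of Theta(n) work each, i.e. Theta(n log n) in total, exactly
    # like the merge step of merge sort.
    if len(codes) <= 1:
        return "".join(map(chr, codes))
    mid = len(codes) // 2
    return concat_binary(codes[:mid]) + concat_binary(codes[mid:])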
We can also prove the lower bound directly. For a sequence $\sigma_1,\ldots,\sigma_m$ of chunk lengths summing to $n$, define its potential by
$$
\Phi(\vec\sigma) = nH(X_{\vec\sigma}), \text{ where } \Pr[X_{\vec\sigma} = i] = \frac{\sigma_i}{n}.
$$
Here $H$ is the entropy function.
The potential function starts at $\Phi(1,\ldots,1) = n\log n$ and ends up at $\Phi(n) = 0$. Suppose that $\vec\tau$ is obtained from $\vec\sigma$ by merging $\sigma_i$ and $\sigma_j$ into a new chunk $\tau_k$. We can sample $X_{\vec\sigma}$ by first sampling $X_{\vec\tau}$, and if the result is $k$, choosing $i$ or $j$ with probabilities $\frac{\sigma_i}{\sigma_i+\sigma_j},\frac{\sigma_j}{\sigma_i+\sigma_j}$. Calculation shows that
$$
\begin{align*}
\Phi(\vec\sigma) = nH(X_{\vec\sigma}) &= nH(X_{\vec\tau}) + n\Pr[X_{\vec\tau}=k] h\left(\frac{\sigma_i}{\sigma_i+\sigma_j}\right) \\ &= \Phi(\vec\tau) + (\sigma_i+\sigma_j) h\left(\frac{\sigma_i}{\sigma_i+\sigma_j}\right) \\ &= \Phi(\vec\tau) + O(\sigma_i+\sigma_j).
\end{align*}
$$
Here $h$ is the binary entropy function.
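As a concrete instance of this identity, merging two equal chunks of size $s$ decreases the potential by $2s\,h(1/2) = 2s$, which is exactly the cost of that concatenation under the $a+b$ model.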
This shows that the potential difference is a lower bound on the running time. Since the overall potential difference is $n\log n$, we obtain a lower bound of $\Omega(n\log n)$. Furthermore, the calculation shows that it's best to always merge chunks of identical size.
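The following sketch (my own; the pairwise merge schedule and the helper entropy_potential are illustrative choices, with logarithms taken base 2) tracks the potential alongside the accumulated merge cost:

import math

def entropy_potential(sizes):
    # Phi(sigma) = n * H(X_sigma), where Pr[X_sigma = i] = sigma_i / n.
    n = sum(sizes)
    return sum(s * math.log2(n / s) for s in sizes)

def simulate(n=256):
    sizes = [1] * n                              # n single-character chunks
    cost = 0
    print("initial potential:", entropy_potential(sizes))   # n * log2(n)
    while len(sizes) > 1:
        merged = []
        for i in range(0, len(sizes) - 1, 2):    # merge adjacent pairs
            cost += sizes[i] + sizes[i + 1]      # cost of one concatenation
            merged.append(sizes[i] + sizes[i + 1])
        if len(sizes) % 2:                       # odd chunk carries over
            merged.append(sizes[-1])
        sizes = merged
    print("final potential:", entropy_potential(sizes))     # 0
    print("total merge cost:", cost)

simulate()

With equal-size merges the accumulated cost matches the potential drop of $n\log n$ exactly, since $h(1/2) = 1$.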
Context
StackExchange Computer Science Q#52360, answer score: 10