From Guido's essays, how does this function avoid quadratic behavior in a string concatenation algorithm?
Problem
I am reading one of Guido van Rossum's essays on optimization in Python.
We are interested in converting a Python list of integers to their character equivalents. Here's the straightforward algorithm.
def f3(list):
    string = ""
    for character in map(chr, list):
        string = string + character
    return string

Guido claims that the following algorithm avoids the quadratic behavior in the string concatenation.
def f5(list):
    string = ""
    for i in range(0, 256, 16):  # 0, 16, 32, 48, 64, ...
        s = ""
        for character in map(chr, list[i:i+16]):
            s = s + character
        string = string + s
    return string

How does this actually help, and what is the significance of the value 16 in f5? f5 does not actually turn out to be faster, but I want the intuition that went into trying this strategy of optimization. Note: we can assume that we are working with a list of length exactly 256.
Solution
Let's assume that adding two strings of lengths $a,b$ takes time $a+b$. Consider the following strategy to convert a list of $n$ characters into a single string:
Read the list in chunks of $k$, convert them to strings, and sum the chunks.
Creating each chunk takes time $\Theta(1+2+\cdots+k) = \Theta(k^2)$, and we do this $n/k$ times, for a total of $\Theta(nk)$. Adding the chunks takes time $\Theta(k+2k+\cdots+n) = \Theta(k(1+2+\cdots+n/k)) = \Theta(k(n/k)^2) = \Theta(n^2/k)$. In total, we get a complexity of $\Theta(n(k+n/k))$. The optimal choice of $k$ is $\sqrt{n}$, and this gives a complexity of $\Theta(n^{3/2})$. In contrast, the trivial algorithm has $k = 1$ and so complexity $\Theta(n^2)$.
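To make the chunking strategy concrete, here is a minimal sketch (illustrative code, not from Guido's essay; the name concat_chunked and the parameter codes are my own):

import math

def concat_chunked(codes, k=None):
    # Build each chunk of k characters by repeated concatenation, then
    # concatenate the chunks. With k ~ sqrt(n), the chunk-building work is
    # Theta(nk) and the chunk-summing work is Theta(n^2/k), for Theta(n^(3/2))
    # overall under the "concatenating strings of lengths a, b costs a+b" model.
    n = len(codes)
    if k is None:
        k = max(1, math.isqrt(n))
    result = ""
    for i in range(0, n, k):
        chunk = ""
        for character in map(chr, codes[i:i + k]):
            chunk = chunk + character   # cost grows with len(chunk)
        result = result + chunk         # cost grows with len(result)
    return result

With k = 16 and a 256-element list this performs the same concatenations as f5, while k = 1 degenerates to the quadratic behavior of f3.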
More generally, you can consider $d$ levels of recursion. Let $f_d(n)$ denote the optimal complexity of such a scheme, measuring just the merging steps. Clearly $f_1(n) \approx n^2/2$. Given an algorithm with $d$ levels, we can construct a $(d+1)$-level algorithm which cuts the input into chunks of length $k$, runs the $d$-level algorithm on each of them, and concatenates the results. This takes time roughly $(n/k)f_d(k) + n^2/(2k)$. For example,
$$
\begin{align*}
2f_2(n) &\approx \min_k \left( nk + n^2/k \right) \approx 2n^{3/2} & (k=\sqrt{n}), \\
2f_3(n) &\approx \min_k \left( 2nk^{1/2} + n^2/k \right) \approx 3n^{4/3} & (k=n^{2/3}), \\
2f_4(n) &\approx \min_k \left( 3nk^{1/3} + n^2/k \right) \approx 4n^{5/4} & (k=n^{3/4}).
\end{align*}
$$
We get that $2f_d(n) \approx dn^{(d+1)/d}$. Calculus shows that the best choice of $d$ is $d = \log n$, in which case we obtain $2f_{\log n}(n) = \Theta(n\log n)$. However, this scheme is not necessarily achievable.
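Purely as an illustration, the recursion can be evaluated numerically; in the following sketch, merge_cost is a hypothetical helper that applies $f_d(n) \approx (n/k)f_{d-1}(k) + n^2/(2k)$ with the chunk size $k = n^{(d-1)/d}$ suggested by the analysis:

def merge_cost(n, d):
    # Approximate merging cost f_d(n) of a d-level scheme under the a+b
    # cost model, using the optimal top-level chunk size k = n^((d-1)/d).
    if d == 1 or n <= 1:
        return n * n / 2                      # left-to-right concatenation
    k = n ** ((d - 1) / d)
    return (n / k) * merge_cost(k, d - 1) + n * n / (2 * k)

n = 4096
for d in (1, 2, 3, 4):
    # compare against the closed form f_d(n) ~ d * n^((d+1)/d) / 2
    print(d, round(merge_cost(n, d)), round(d * n ** ((d + 1) / d) / 2))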
If we choose a branching factor of 2 then we get an algorithm whose running time is $\Theta(n\log n)$, which resembles merge sort. In fact, the sorting lower bound gives a lower bound of $\Omega(n\log n)$ on any such merging procedure.
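A branching factor of 2 corresponds to the following divide-and-conquer sketch (again my own illustration, not code from the essay):

def concat_binary(codes):
    # Split the list in half, build each half recursively, and concatenate
    # the two halves. Under the a+b cost model this performs Theta(log n)
    # levels of Theta(n) work each, i.e. Theta(n log n) in total, exactly
    # like the merge step of merge sort.
    if len(codes) <= 1:
        return "".join(map(chr, codes))
    mid = len(codes) // 2
    return concat_binary(codes[:mid]) + concat_binary(codes[mid:])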
We can also prove the lower bound directly. For a sequence $\sigma_1,\ldots,\sigma_m$ of chunk lengths summing to $n$, define its potential by
$$
\Phi(\vec\sigma) = nH(X_{\vec\sigma}), \text{ where } \Pr[X_{\vec\sigma} = i] = \frac{\sigma_i}{n}.
$$
Here $H$ is the entropy function.
The potential function starts at $\Phi(1,\ldots,1) = n\log n$ and ends up at $\Phi(n) = 0$. Suppose that $\vec\tau$ is obtained from $\vec\sigma$ by merging $\sigma_i$ and $\sigma_j$ into a new chunk $\tau_k$. We can sample $X_{\vec\sigma}$ by first sampling $X_{\vec\tau}$, and if the result is $k$, choosing $i$ or $j$ with probabilities $\frac{\sigma_i}{\sigma_i+\sigma_j},\frac{\sigma_j}{\sigma_i+\sigma_j}$. Calculation shows that
$$
\begin{align*}
\Phi(\vec\sigma) = nH(X_{\vec\sigma}) &= nH(X_{\vec\tau}) + n\Pr[X_{\vec\tau}=k] h\left(\frac{\sigma_i}{\sigma_i+\sigma_j}\right) \\ &= \Phi(\vec\tau) + (\sigma_i+\sigma_j) h\left(\frac{\sigma_i}{\sigma_i+\sigma_j}\right) \\ &= \Phi(\vec\tau) + O(\sigma_i+\sigma_j).
\end{align*}
$$
Here $h$ is the binary entropy function.
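As a concrete instance of this identity, merging two equal chunks of size $s$ decreases the potential by $2s\,h(1/2) = 2s$, which is exactly the cost of that concatenation under the $a+b$ model.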
This shows that the potential difference is a lower bound on the running time. Since the overall potential difference is $n\log n$, we obtain a lower bound of $\Omega(n\log n)$. Furthermore, the calculation shows that it's best to always merge chunks of identical size.
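The following sketch (my own; the pairwise merge schedule and the helper entropy_potential are illustrative choices, with logarithms taken base 2) tracks the potential alongside the accumulated merge cost:

import math

def entropy_potential(sizes):
    # Phi(sigma) = n * H(X_sigma), where Pr[X_sigma = i] = sigma_i / n.
    n = sum(sizes)
    return sum(s * math.log2(n / s) for s in sizes)

def simulate(n=256):
    sizes = [1] * n                              # n single-character chunks
    cost = 0
    print("initial potential:", entropy_potential(sizes))   # n * log2(n)
    while len(sizes) > 1:
        merged = []
        for i in range(0, len(sizes) - 1, 2):    # merge adjacent pairs
            cost += sizes[i] + sizes[i + 1]      # cost of one concatenation
            merged.append(sizes[i] + sizes[i + 1])
        if len(sizes) % 2:                       # odd chunk carries over
            merged.append(sizes[-1])
        sizes = merged
    print("final potential:", entropy_potential(sizes))     # 0
    print("total merge cost:", cost)

simulate()

With equal-size merges the accumulated cost matches the potential drop of $n\log n$ exactly, since $h(1/2) = 1$.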
Context
StackExchange Computer Science Q#52360, answer score: 10