patternMinor

What is the complexity of a variation of the Coupon collector's problem?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

time-complexity cs probability-theory stackoverflow

problemthecollectorwhatcouponvariationcomplexity

Problem

I need to know the complexity of the following algorithm:

Draw elements from a set of size $m$, one by one, randomly, with replacement, until coming across $n$ different elements from the set ($n\le m$). We do not care what are the numbers drawn, and halt once $n$ different numbers were seen during this process.

This is a variation of the Coupon collector's problem, where one continues picking until seeing all $m$ elements of the set.

Example:

We draw numbers from the set $\{1,2,3,4,5\}$, $n=3$, $m=5$. The output
$\{1,5,2\}$ is produced by the following series of draws:

$1,1,5,1,5,2$

from the pool $\{1,2,3,4,5\}$. In this case, we halt after $6$ operations.

What is the expected time complexity of this algorithm?

Solution

When $m$ is much larger than $n$, the expected number of trials is basically linear in $n$. We can make this more precise, as shown below.

Let $T_n$ be the random variable which counts the number of trials up to seeing $n$ different elements, where the elements are picked uniformly at random from a set of size $m$.

You can write $T_n$ as the sum of geometric random variables

$$T_n=1+G\left(1-\frac{1}{m}\right)+G\left(1-\frac{2}{m}\right)+...+G\left(1-\frac{n-1}{m}\right)$$

where $G(p)$ is a geometric random variable with parameter $p$ (number of trails up to the first success, where success happens with probability $p$ (I leave it to you to prove the equivalence).

Using linearity of expectation:

$$\begin{align*}
\mathbb{E}\left[T_n\right]&=\sum\limits_{i=0}^{n-1}\mathbb{E}\left[G\left(1-\frac{i}{m}\right)\right]\\
&=\sum\limits_{i=0}^{n-1}\frac{m}{m-i}\\
&=m\left(\frac{1}{m}+\frac{1}{m-1}+...+\frac{1}{m-n+1}\right)\\
&=m\left(H_m-H_{m-n}\right)
\end{align*}$$

where $H_k$ is the harmonic sum, which satisfies $\frac{1}{2(k+1)}\le H_k-\ln k - \gamma\le \frac{1}{2k}$, where $\gamma$ is the Euler–Mascheroni constant (see Wolfram's page on harmonic numbers). Using these bounds for harmonic numbers we have:

$$\begin{align*}
\mathbb{E}\left[T_n\right]&\le m\left(\frac{1}{2m}+\ln m +\gamma -\left(\frac{1}{2(m-n+1)}+\ln(m-n)+\gamma\right)\right)\\
&\le m\left(\frac{1}{2m}+\ln\frac{m}{m-n}\right)=\frac{1}{2}+m\ln\frac{m}{m-n}.
\end{align*}$$

Note that $m\ln\frac{m}{m-n}=m\ln\left(1+\frac{n}{m-n}\right)\approx n$ when $n\ll m$, so for $m$ much larger than $n$, the expected number of trials is linear in $n$.

Context

StackExchange Computer Science Q#66236, answer score: 6

Revisions (0)

No revisions yet.