Time-space tradeoff for missing element problem
Problem
Here is a well-known problem.
Given an array $A[1\dots n]$ of positive integers, output the smallest positive integer not in the array.
The problem can be solved in $O(n)$ time and space: read the array, track in $O(n)$ space which of $1, 2, \dots, n + 1$ occurred, then scan for the smallest absent value.
I noticed you can trade space for time. If you have only $O(\frac{n}{k})$ memory, you can do it in $k$ rounds and get time $O(kn)$. In particular, taking $k = n$ gives a constant-space, quadratic-time algorithm.
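The $k$-round tradeoff can be sketched as follows (a minimal Python sketch; the function name and the block layout are illustrative, not from the question):

```python
def smallest_missing_rounds(A, k):
    """Find the smallest positive integer not in A using O(n/k) extra
    space and O(k*n) time: in round i, track occurrences of the i-th
    block of roughly n/k candidate values with a reusable bit table."""
    n = len(A)
    block = (n + 1 + k - 1) // k            # candidates per round, ceil((n+1)/k)
    for lo in range(1, n + 2, block):       # candidate range [lo, lo + block)
        seen = [False] * block              # O(n/k) space, reused each round
        for a in A:                         # one pass over the input per round
            if lo <= a < lo + block:
                seen[a - lo] = True
        for j, s in enumerate(seen):        # first unseen candidate wins
            if not s and lo + j <= n + 1:
                return lo + j
    return n + 1  # unreachable: some value in [1, n+1] is always missing
```

With $k$ rounds over $n$ elements each, the time is $O(kn)$; only the `seen` table of size $\lceil (n+1)/k \rceil$ persists between passes.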
My question is:
Is this the optimal tradeoff, i.e. does $\operatorname{time} \cdot \operatorname{space} = \Omega(n^2)$?
In general, how does one prove such type of bounds?
Assume the RAM model, with bounded arithmetic and $O(1)$ random access to arrays.
Inspiration for this problem: the time-space tradeoff for palindromes in the one-tape model (see, for example, here).
Solution
This can be done in $O(n \log n)$ word operations and $O(1)$ words of memory (respectively $O(n \log^2 n)$ time and $O(\log n)$ memory in bit-level RAM model). Indeed, the solution will be based on the following observation.
Say there are $n_0$ even and $n_1$ odd numbers in the range $[1, n + 1]$ (so $n_0 \approx n_1$ and $n_0 + n_1 = n + 1$). Then there is $b \in \{0, 1\}$ such that there are at most $n_b - 1$ values with parity $b$ in the input. Indeed, otherwise there would be at least $n_0$ even and at least $n_1$ odd values in the input, i.e. at least $n_0 + n_1 = n + 1$ values in total, a contradiction (there are only $n$ of them). This means we can continue searching for a missing number among only the even or only the odd numbers. The same argument applies to the higher bits of the binary representation as well.
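To make the parity observation concrete, here is a quick check on a small input (Python; the variable names are mine):

```python
# Pigeonhole check: A has n values, but [1, n+1] has n + 1 numbers,
# so for some parity the input holds fewer values of that parity
# than the range [1, n+1] does.
A = [1, 2, 4]   # n = 3, so the candidate range is [1, 4]
n = len(A)
range_even = sum(1 for v in range(1, n + 2) if v % 2 == 0)  # n_0
range_odd = sum(1 for v in range(1, n + 2) if v % 2 == 1)   # n_1
in_even = sum(1 for a in A if a % 2 == 0)   # values 2 and 4
in_odd = sum(1 for a in A if a % 2 == 1)    # only the value 1
# Here in_odd (1) < range_odd (2): a missing number exists among the odds.
assert in_even < range_even or in_odd < range_odd
```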
So the algorithm looks like this:
1. Suppose we already know that there are only $x$ values in the input whose remainder modulo $2^b$ equals $r \in [0, 2^b)$, but that there are at least $x + 1$ numbers in the range $[1, n + 1]$ with remainder $r$ modulo $2^b$ (at the start this holds for sure with $b = 0$, $r = 0$).
2. Say there are $x_0$ values in the input with remainder $r$ modulo $2^{b + 1}$ and $x_1$ values in the input with remainder $r + 2^b$ modulo $2^{b + 1}$ (both counts can be found in a single pass through the input). Clearly, $x_0 + x_1 = x$. Moreover, because there are at least $x + 1$ numbers in $[1, n + 1]$ with remainder $r$ modulo $2^b$, at least one of the pairs $(r, b + 1)$, $(r + 2^b, b + 1)$ satisfies the invariant of step 1.
3. We have found the missing number once $2^b \geqslant n + 1$: there is at most one number in the range $[1, n + 1]$ that may have remainder $r$ modulo $2^b$ ($r$ itself, if it is in range), so by the invariant no value in the input has this remainder. Hence $r$ is indeed missing.
Clearly, the algorithm halts after $O(\log n)$ steps, each of which takes $O(n)$ time (a single pass over the input array). Moreover, only $O(1)$ words of memory are required.
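The whole procedure can be sketched in Python as follows (names are illustrative; only the count $x_0$ is compared, since by the pigeonhole argument of step 2 the other class must satisfy the invariant whenever this one fails):

```python
def find_missing(A):
    """Return a number of [1, n+1] absent from A, in O(n log n) time and
    O(1) words of memory, refining one bit of the residue r per pass."""
    n = len(A)
    r, b = 0, 0  # invariant: #(a in A : a % 2^b == r) < #(v in [1, n+1] : v % 2^b == r)

    def count_in_range(res, mod):
        # How many v in [1, n + 1] have v % mod == res.
        if res == 0:
            return (n + 1) // mod
        return (n + 1 - res) // mod + 1 if res <= n + 1 else 0

    while (1 << b) < n + 1:
        half = 1 << b
        mod = half << 1
        x0 = sum(1 for a in A if a % mod == r)  # one pass over the input
        if x0 >= count_in_range(r, mod):
            r += half  # the other class (r + 2^b) must satisfy the invariant
        b += 1
    # Now 2^b >= n + 1: the class of r modulo 2^b contains at most one
    # number of [1, n+1] and, by the invariant, no input value.
    return r if r >= 1 else 1 << b
```

When the loop exits with $r = 0$, the invariant forces $2^b = n + 1$, so $n + 1$ itself is the missing number; otherwise $r$ is.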
Context
StackExchange Computer Science Q#2464, answer score: 2