patternMinor
Determine number of values less than mean in one pass through list
Viewed 0 times
numberpassthanmeanonelessdeterminethroughvalueslist
Problem
The problem statement is as follows:
Can we determine precisely the number of elements less than the mean in a list $A$ of $n$ numbers in only one pass through the array (starting at $A_1$ and ending at $A_n$) with only constant memory?
By "constant memory" I mean a constant number of $O(m \log_2 n)$ bit words where $m$ is the number of bits to represent the largest value in $A$. Some other things to consider:
It's clear with 2 passes through the array we can get this value precisely with no probability involved. With one pass through the array this seems not be possible, but I am not sure. You could create an adversary that waits until the last element to give it an extreme value and throw off all previous calculations. However, with a uniform distribution this is limited. Any thoughts on these variations or an answer under the assumptions provided would be appreciated.
Can we determine precisely the number of elements less than the mean in a list $A$ of $n$ numbers in only one pass through the array (starting at $A_1$ and ending at $A_n$) with only constant memory?
By "constant memory" I mean a constant number of $O(m \log_2 n)$ bit words where $m$ is the number of bits to represent the largest value in $A$. Some other things to consider:
- Assume $m$ is non-constant and $m \geq \log_2 n$. If $m$ were constant, we could not even count $n$ as Yuval mentioned.
- Assume values in $A$ are integers, but does it affect the problem if they are rational, or real valued?
- Assume values in $A$ are uniform random. It helps if they are uniform random because we can get a tight confidence bound with a constant number of elements. How high of a confidence bound can we get?
- Time complexity does not matter. You can spend as much time as you want on a particular $A_i$ if that helps, but once you move to $A_{i+1}$ you cannot go back unless you have stored $A_i$ in one of your constant memory words.
- What if the values in $A$ are non-uniform random with distribution is unknown? What if the distribution is known?
- What if we are not constrained to going from $A_1$ to $A_n$, perhaps we could randomly sample and get a tight confidence bound even if the distribution is non-uniform?
- What if instead of mean, we wanted values less than median? Does this help at all?
It's clear with 2 passes through the array we can get this value precisely with no probability involved. With one pass through the array this seems not be possible, but I am not sure. You could create an adversary that waits until the last element to give it an extreme value and throw off all previous calculations. However, with a uniform distribution this is limited. Any thoughts on these variations or an answer under the assumptions provided would be appreciated.
Solution
Consider an algorithm for arrays of length $n$ consisting of entries from $0$ to $n$, and using space $S$ bits.
Suppose that the first half of the array consists of pairs $a,n-a$ (where $a \leq n/2$). By choosing the second half appropriately, we can make the average of the entire array anything from $n/4$ to $3n/4$ (discretized to multiples of $1/n$). In particular, the state of the memory after reading the first half should store the number of pairs with $n/4 \leq a \leq n/2$. For every non-empty subset of $\{n/4,\ldots,n/2\}$, we can construct a first half of the array which contains exactly these values of $a$. This shows that the number of first halves that are pairwise distinguishable is at least $2^{n/4}$, hence $S = \Omega(n)$.
A very similar construction establishes an $\Omega(n)$ lower bound even when replacing the mean with the median.
Suppose that the first half of the array consists of pairs $a,n-a$ (where $a \leq n/2$). By choosing the second half appropriately, we can make the average of the entire array anything from $n/4$ to $3n/4$ (discretized to multiples of $1/n$). In particular, the state of the memory after reading the first half should store the number of pairs with $n/4 \leq a \leq n/2$. For every non-empty subset of $\{n/4,\ldots,n/2\}$, we can construct a first half of the array which contains exactly these values of $a$. This shows that the number of first halves that are pairwise distinguishable is at least $2^{n/4}$, hence $S = \Omega(n)$.
A very similar construction establishes an $\Omega(n)$ lower bound even when replacing the mean with the median.
Context
StackExchange Computer Science Q#106836, answer score: 4
Revisions (0)
No revisions yet.