HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Determine number of values less than mean in one pass through list

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
numberpassthanmeanonelessdeterminethroughvalueslist

Problem

The problem statement is as follows:


Can we determine precisely the number of elements less than the mean in a list $A$ of $n$ numbers in only one pass through the array (starting at $A_1$ and ending at $A_n$) with only constant memory?

By "constant memory" I mean a constant number of $O(m \log_2 n)$ bit words where $m$ is the number of bits to represent the largest value in $A$. Some other things to consider:

  • Assume $m$ is non-constant and $m \geq \log_2 n$. If $m$ were constant, we could not even count $n$ as Yuval mentioned.



  • Assume values in $A$ are integers, but does it affect the problem if they are rational, or real valued?



  • Assume values in $A$ are uniform random. It helps if they are uniform random because we can get a tight confidence bound with a constant number of elements. How high of a confidence bound can we get?



  • Time complexity does not matter. You can spend as much time as you want on a particular $A_i$ if that helps, but once you move to $A_{i+1}$ you cannot go back unless you have stored $A_i$ in one of your constant memory words.



  • What if the values in $A$ are non-uniform random with distribution is unknown? What if the distribution is known?



  • What if we are not constrained to going from $A_1$ to $A_n$, perhaps we could randomly sample and get a tight confidence bound even if the distribution is non-uniform?



  • What if instead of mean, we wanted values less than median? Does this help at all?



It's clear with 2 passes through the array we can get this value precisely with no probability involved. With one pass through the array this seems not be possible, but I am not sure. You could create an adversary that waits until the last element to give it an extreme value and throw off all previous calculations. However, with a uniform distribution this is limited. Any thoughts on these variations or an answer under the assumptions provided would be appreciated.

Solution

Consider an algorithm for arrays of length $n$ consisting of entries from $0$ to $n$, and using space $S$ bits.

Suppose that the first half of the array consists of pairs $a,n-a$ (where $a \leq n/2$). By choosing the second half appropriately, we can make the average of the entire array anything from $n/4$ to $3n/4$ (discretized to multiples of $1/n$). In particular, the state of the memory after reading the first half should store the number of pairs with $n/4 \leq a \leq n/2$. For every non-empty subset of $\{n/4,\ldots,n/2\}$, we can construct a first half of the array which contains exactly these values of $a$. This shows that the number of first halves that are pairwise distinguishable is at least $2^{n/4}$, hence $S = \Omega(n)$.

A very similar construction establishes an $\Omega(n)$ lower bound even when replacing the mean with the median.

Context

StackExchange Computer Science Q#106836, answer score: 4

Revisions (0)

No revisions yet.