HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Find a weighted median for unsorted array in linear time

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
lineararrayweightedunsortedtimeforfindmedian

Problem

For days, I'm trying to figure out, whether it is possible to find an item in array which would be kind of weighted median in linear time.

It is very simple to do that in exponential time.

So let's say that we have an array, each item $x$ of this array has two attributes - price $c(x)$ and weight $w(x)$. The goal is to find an item $x$ such that
$$
\sum_{y\colon c(y) c(x)} w(y) \leq \frac{1}{2} \sum_y w(y).
$$

If the array was sorted by price, it would be simple:

Go from the first item one by one, count sum and if the sum becomes greater that half the total weight, then you found the desired item.

Could you give me some hint how to find such an item in linear time?

Solution

Let $A$ be an input array containing $n$ elements, $a_i$ the $i$-th element and $w_i$ its corresponding weight. You can determine the weighted median in worst case linear time as follows. If the array length is $\leq 2$, find the weighted median by exhaustive search. Otherwise, find the (lower) median element $a_x$ using the worst case $O(n)$ selection algorithm and then partition the array around it (using the worst case $O(n)$ partition algorithm from QuickSort). Now determine the weight of each partition. If weight of the left partition is $< \frac{1}{2}$ and weight of the right partition is $\leq \frac{1}{2}$ then the weighted (lower) median is $a_x$. If not, then the weighted (lower) median must necessarily lie in the partition with the larger weight. So, you add the weight of the "lighter" partition to the weight of $a_x$ and recursively continue searching into the "heavier" partition. Here is the algorithm's pseudocode (written in Latex).

FIND-WEIGHTED_LOWER_MEDIAN(A)
    if $n$ == 1
       return $a_1$
    elseif $n$ == 2
       if $w_1 \geq w_2$
          return $a_1$ 
       else 
          return $a_2$ 
    else
       determine $a_x$, the (lower) median of A
       partition A around $a_x$
       $W_{low}$ = $\sum\limits_{{a_i}  {a_x}} {{w_i}}$
       if $W_{low} < \frac{1}{2}$ AND $W_{high} \leq \frac{1}{2}$
          return $a_x$
       else if $W_{low} \geq \frac{1}{2}$
               $w_x = w_x + W_{high}$
               B = \{ a_i \in A: a_i \leq a_x \}
               FIND-WEIGHTED_LOWER_MEDIAN(B)
            else
               $w_x = w_x + W_{low}$
               B = \{ a_i \in A: a_i \geq a_x \}
               FIND-WEIGHTED_LOWER_MEDIAN(B)


Now let's analyze the algorithm and derive its complexity in the worst-case. The recurrence of this recursive algorithm is $T(n) = T(\frac{n}{2} + 1) + \Theta(n)$. Indeed, we have no more than a recursive call on half the elements plus the (lower) median. The initial exhaustive search on up to 2 elements costs $O(1)$, determining the (lower) median using the select algorithm requires $O(n)$, and partitioning around the (lower) median requires $O(n)$ as well. Computing $W_{low}$ or $W_{high}$ is again $O(n)$. Solving the recurrence, we get the complexity of the algorithm in the worst case, which is $O(n)$.

Of course, a real implementation should not use the worst case $O(n)$ selection algorithm, since it is well known that the algorithm is not practical (it is of theoretical interest only). However, the randomized $O(n)$ selection algorithm works pretty well in practice and can be used since it's really fast.

Code Snippets

FIND-WEIGHTED_LOWER_MEDIAN(A)
    if $n$ == 1
       return $a_1$
    elseif $n$ == 2
       if $w_1 \geq w_2$
          return $a_1$ 
       else 
          return $a_2$ 
    else
       determine $a_x$, the (lower) median of A
       partition A around $a_x$
       $W_{low}$ = $\sum\limits_{{a_i} < {a_x}} {{w_i}}$
       $W_{high}$ = $\sum\limits_{{a_i} > {a_x}} {{w_i}}$
       if $W_{low} < \frac{1}{2}$ AND $W_{high} \leq \frac{1}{2}$
          return $a_x$
       else if $W_{low} \geq \frac{1}{2}$
               $w_x = w_x + W_{high}$
               B = \{ a_i \in A: a_i \leq a_x \}
               FIND-WEIGHTED_LOWER_MEDIAN(B)
            else
               $w_x = w_x + W_{low}$
               B = \{ a_i \in A: a_i \geq a_x \}
               FIND-WEIGHTED_LOWER_MEDIAN(B)

Context

StackExchange Computer Science Q#56299, answer score: 10

Revisions (0)

No revisions yet.