HiveBrain v1.2.0
Get Started
← Back to all entries
patternrubyMinor

Continuous median in Ruby

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
mediancontinuousruby

Problem

I have written a program in Ruby which calculates the median of a streaming set of numbers dynamically. For each number that arrives, the median is calculated for all the numbers that have arrived so far.

The issue I am facing is that the program times out on some inputs, and, most likely, the reason is that I return a variable that is passed to a routine. I need to do this because this variable changes inside the routine and I can't pass by reference in Ruby.

Is there any workaround for this inability to pass by reference? I have seen the code involving "binding" but I found it clumsy. Why does Ruby not have a simple, straightforward pass-by-reference as do C++, C# and Java? I have pasted my code below.

I maintain two heaps max and min. The value in any node in the max (min) heap is at least as large (small) as the value in any node below it. Any arriving number is inserted into the max heap or min heap with the proviso that the difference in the number of nodes in the two heaps does not exceed 1, and that the heap property be preserved for both heaps. With this proviso, the median can be calculated from the root elements in the two heaps.

```
Node=Struct.new(:v,:l,:r)do
end

max=0
min=0

def count(m)
m==0 ? 0 : 1 + count(m.l) + count(m.r)
end

def delete(d,type)

if d==0
return 0
end
if type=="max"
if d.l==0 && d.r==0
return 0
elsif d.l!=0 && d.r!=0
if d.l.v > d.r.v
d.v= d.l.v
d.l=delete(d.l,type)
else
d.v= d.r.v
d.r= delete(d.r,type)
end
elsif d.l == 0
d.v= d.r.v
d.r=delete(d.r,type)
else
d.v= d.l.v
d.l=delete(d.l,type)
end
else
if d.l==0 && d.r==0
return 0
elsif d.l!=0 && d.r!=0
if d.l.v t.v
tmp

Solution

Median vs. Mean

mean in math is commonly referred to as 'average'. Median means 'the number in the middle'.

The mean for (1, 4, 11) would be (1+4+11)/3.0 == 5.3333...
The median for (1, 4, 11) would be [1,4,11][(3/2.0).floor] == 4

Using your code, when I enter 2 2 3 ( take a list of 2 entries. Give me the value for 2 and 3) I get back 2.5. That's a mean, not a median. I guess there really is no median for 2 and 3, but for the sake of conversation let's say we use the logic above, it'd be 2.

I'm going to assume you want a mean instead of a median, because implementations speak more to intent than technical words.

Return Value Causing Timeout?


The issue I am facing is that the program times out on some inputs,
and, most likely, the reason is that I return a variable that is
passed to a routine. I need to do this because this variable changes
inside the routine and I can't pass by reference in Ruby.

It's not clear to me why you think a timeout is caused when you "return a variable that is passed to a routine." I'm pretty sure that is not the reason.

Passing by Reference doesn't include Immutables


Is there any workaround for this inability to pass by reference? I
have seen the code involving "binding" but I found it clumsy. Why does
Ruby not have a simple, straightforward pass-by-reference as do C++,
C# and Java?

Ruby uses pass-by-reference. However, if you "throw away" the referenced object, it will appear that Ruby used pass-by-value. Immutable items like numbers, symbols, strings, etc will always act this way because if they are changed, their reference will always be lost. If you increment or reassign a number, you aren't changing an object. You are point to a different object.

For a trivial example, just drop into IRB:

>> x = 8
=> 8
>> x.object_id
=> 17
>> x+=1
=> 9
>> x.object_id
=> 19


Without going to far afield of your question, this is because when you change from 8 to 9, you aren't changing the internal state of some object. You're actually pointing to a new object. And if you replaced a mutable object reference in ruby, you'd lose that reference, too.

def change_number(num)
   num = num + 1
 end

 def change_object(obj)
   obj = {}
 end

 def add_to_object(obj)
   obj[:bar] = :baz
 end

>> score = 8

>> score.object_id
=> 17

>> out = change_number(score)
=> 9

>> out.object_id
=> 19

>> score.object_id
=> 17  # number 9 is object 19. A new object.

>> mutable = {fu: 'bar'}
=> {:fu=>"bar"}

>> mutable.object_id
=> 6553100

>> out = change_object(mutable) # this is pass-by-reference
=> {}

>> mutable.object_id
=> 6553100 # same object we passed in

>> out.object_id
=> 7488280 # a new object was created, despite pass-by-reference.

>> out = add_to_object(mutable)
=> :baz

>> mutable.object_id
=> 6553100 # same object

>> mutable
=> {:fu=>"bar", :bar=>:baz}  # object was passed by reference!


Mean Value

This is Code Review. If this were posted on Stack Overflow, I would say, "Don't do it that way. Just keep a single sorted list of the values you've seen. Then use the sum and count."

But since this is a code review site, I'll give you some thoughts on your code.

-
Where are your unit tests?

-
I highly recommend using full variable names instead of v, k, m, d, etc. We live in a time where computers have more than enough memory to remember an entire word. And there's typically autocomplete. So just type out the full word so when you go back 3 months later you won't have to figure out what each letter meant.

-
This is an object-oriented language, so using classes would be helpful. For instance, let's create a NumberList class.

Refactor using OO

(I'm assuming that insert(v,"max",max) should be insert(value,"max",max))

Node=Struct.new(:v,:l,:r)do
end

class NumberList
  def initialize()
    @max = 0
    @min = 0
  end

  def count(m)
    m==0 ? 0 : 1 + count(m.l) + count(m.r)
  end

  def delete(d,type)
    if d==0
      return 0
    end
    if type=="max"
      if d.l==0 && d.r==0
        return 0
      elsif d.l!=0 && d.r!=0
        if  d.l.v > d.r.v
          d.v= d.l.v
          d.l=delete(d.l,type)
        else
          d.v= d.r.v
          d.r= delete(d.r,type)
        end
      elsif d.l == 0
        d.v= d.r.v
        d.r=delete(d.r,type)
      else
        d.v= d.l.v
        d.l=delete(d.l,type)
      end
    else
      if d.l==0 && d.r==0
        return 0
      elsif d.l!=0 && d.r!=0
        if  d.l.v t.v
        tmp=t.v
        t.v=v
        t.l=insert(tmp, type,t.l)
      else
        t.l=insert(v,type,t.l)
      end
    else
      if v@min.v
        tmp=@min.v
        @min=delete(@min,"min")
        @max=insert(tmp,"max",@max)
        @min=insert(value,"min",@min)
     else
       @max=insert(value,"max",@max)
     end

     p (@max.v+@min.v)/2.0
    end
  end
end

keeper = NumberList.new

n=gets.chomp.to_i

n.times do
  keeper.add_value( gets.chomp.to_i )
end


Now that we have it in an object, I could write some test

Code Snippets

>> x = 8
=> 8
>> x.object_id
=> 17
>> x+=1
=> 9
>> x.object_id
=> 19
def change_number(num)
   num = num + 1
 end

 def change_object(obj)
   obj = {}
 end

 def add_to_object(obj)
   obj[:bar] = :baz
 end

>> score = 8

>> score.object_id
=> 17

>> out = change_number(score)
=> 9

>> out.object_id
=> 19

>> score.object_id
=> 17  # number 9 is object 19. A new object.

>> mutable = {fu: 'bar'}
=> {:fu=>"bar"}

>> mutable.object_id
=> 6553100

>> out = change_object(mutable) # this is pass-by-reference
=> {}

>> mutable.object_id
=> 6553100 # same object we passed in

>> out.object_id
=> 7488280 # a new object was created, despite pass-by-reference.

>> out = add_to_object(mutable)
=> :baz

>> mutable.object_id
=> 6553100 # same object

>> mutable
=> {:fu=>"bar", :bar=>:baz}  # object was passed by reference!
Node=Struct.new(:v,:l,:r)do
end

class NumberList
  def initialize()
    @max = 0
    @min = 0
  end

  def count(m)
    m==0 ? 0 : 1 + count(m.l) + count(m.r)
  end


  def delete(d,type)
    if d==0
      return 0
    end
    if type=="max"
      if d.l==0 && d.r==0
        return 0
      elsif d.l!=0 && d.r!=0
        if  d.l.v > d.r.v
          d.v= d.l.v
          d.l=delete(d.l,type)
        else
          d.v= d.r.v
          d.r= delete(d.r,type)
        end
      elsif d.l == 0
        d.v= d.r.v
        d.r=delete(d.r,type)
      else
        d.v= d.l.v
        d.l=delete(d.l,type)
      end
    else
      if d.l==0 && d.r==0
        return 0
      elsif d.l!=0 && d.r!=0
        if  d.l.v < d.r.v
            d.v= d.l.v
            d.l=delete(d.l,type)
        else
            d.v= d.r.v
            d.r= delete(d.r,type)
        end
      elsif d.l == 0
        d.v= d.r.v
        d.r=delete(d.r,type)
      else
        d.v= d.l.v
        d.l=delete(d.l,type)
      end
    end

    return d
  end

  def insert(v,type,t)
    if t==0
      return t=Node.new(v,0,0)
    end
      if type=="max"
        if v>t.v
        tmp=t.v
        t.v=v
        t.l=insert(tmp, type,t.l)
      else
        t.l=insert(v,type,t.l)
      end
    else
      if v<t.v
        tmp=t.v
        t.v=v
        t.l=insert(tmp, type,t.l)
      else
        t.l=insert(v,type,t.l)
      end
    end

    return t
  end

  def add_value( value )
    if (k=count(@max)-count(@min))==1
      if value<@max.v
        tmp=@max.v
        @max=delete(@max,"max")
        @min=insert(tmp,"min",@min)
        @max=insert(value,"max",@max)
      else
        @min=insert(value,"min",@min)
      end

      p (@max.v+@min.v)/2.0
    elsif  k==0
      if @max==0 || value<=@max.v
        @max=insert(value,"max",@max)
        p @max.v/1.0
      else
        @min=insert(value,"min",@min)
        p @min.v/1.0
      end
    else
      if value>@min.v
        tmp=@min.v
        @min=delete(@min,"min")
        @max=insert(tmp,"max",@max)
        @min=insert(value,"min",@min)
     else
       @max=insert(value,"max",@max)
     end

     p (@max.v+@min.v)/2.0
    end
  end
end

keeper = NumberList.new

n=gets.chomp.to_i

n.times do
  keeper.add_value( gets.chomp.to_i )
end
require 'bigdecimal'

class Array
  def sum
    self.inject(0){|sum,x| sum + x }
  end

  def mean
    BigDecimal.new(self.sum) / self.count
  end
end
>> [2,4,66,32,3].mean.to_f
=> 21.4

Context

StackExchange Code Review Q#143170, answer score: 7

Revisions (0)

No revisions yet.