HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Heap implementation of the "median maintenance" algorithm

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
theheapalgorithmmaintenanceimplementationmedian

Problem

I have implemented a heap data structure for Coursera's algorithms course. The problem I have to solve is:


The goal of this problem is to implement the "Median Maintenance"
algorithm. The text file contains a list of the integers from 1 to
10000 in unsorted order; you should treat this as a stream of numbers,
arriving one by one. Letting xi denote the ith number of the file, the
kth median mk is defined as the median of the numbers x1,…,xk. (So, if
k is odd, then mk is ((k+1)/2)th smallest number among x1,…,xk; if k
is even, then mk is the (k/2)th smallest number among x1,…,xk.)


Find the sum of the 1000 medians.

Header

#include 
#include 
#include 

using namespace std;

class Heap{
public:
    vector heap;

    int parent(int i);
    int* children(int i);
    int insert(int key);
    int bubbleUp(int i);    
    int extractMin();
    int bubbleDown(int i);
    int findMin(int a, int b);
};


Source

```
#include "medmain.h"

int Heap::parent(int i){
if (i%2 == 0) return i/2-1;
else return (i-1)/2;
}

int* Heap::children(int i){
int* child = new int[3];
child[0] = 2*i+1;
child[1] = 2*i+2;
if (heap[child[0]] 1) bubbleUp(heap.size()-1);
return 0;
}

int Heap::bubbleUp(int i){
if (i==0) return 1;
if (heap[parent(i)] > heap[i]){
iter_swap(heap.begin() + parent(i), heap.begin() + i);
bubbleUp(parent(i));
}
return 1;
}

int Heap::extractMin(){
iter_swap(heap.begin(), heap.end()-1);
int minValue = heap.back();
heap.pop_back();
bubbleDown(0);
return minValue;
}

int Heap::findMin(int a, int b){
if (heap[a] heap[candidate]){
iter_swap(heap.begin() + i, heap.begin() + candidate);
bubbleDown(candidate);
}
delete[] child;
return 1;
}

int main(){
Heap heapL,heapH;

ifstream ipstream;
ipstream.open("Median.txt");
int output1, output2, output;
vector median;

ipstream >> output1;
ipstream >> output2;

Solution

min heap, max heap

To keep track of the running median, an efficient solution is to store the higher half of numbers in a min heap, and the lower half of numbers in a max heap. When adding a new number, keep the size of the heaps balanced, and then the median is one of the top elements.

You did essentially this, but with a little bit unintuitive twist: instead of a min heap and a max heap, you use two min heaps, and emulate the max heap by inserting numbers negated. This may have seemed easier than implementing a heap that can be configured as min or max, but in the end you have a lot of extra logic related to negating elements for the fake max heap, and reversing the negation when you take elements out.

Think of it this way: the heap is a tool to make your job easier. A tool like this should reduce the complexity of the rest of the program. The tool is best if it let's the rest of the program focus on the main logic (keeping the heaps balanced, finding the median), without worrying about all the negations.

Heap interface

All methods in your heap are declared public, and they shouldn't be. The essential methods of a heap that your main implementation needs:

  • insert



  • removeTop



  • size



Methods like bubbleUp, bubbleDown are low level implementation details that should be hidden from clients.

Unnecessary storage

You store the medians in a vector and then iterate over it to calculate the sum. It would be better to calculate the sum a you find the medians, no need for the extra storage.

Context

StackExchange Code Review Q#95058, answer score: 2

Revisions (0)

No revisions yet.