patterncppMinor
Heap implementation of the "median maintenance" algorithm
Viewed 0 times
theheapalgorithmmaintenanceimplementationmedian
Problem
I have implemented a heap data structure for Coursera's algorithms course. The problem I have to solve is:
The goal of this problem is to implement the "Median Maintenance"
algorithm. The text file contains a list of the integers from 1 to
10000 in unsorted order; you should treat this as a stream of numbers,
arriving one by one. Letting xi denote the ith number of the file, the
kth median mk is defined as the median of the numbers x1,…,xk. (So, if
k is odd, then mk is ((k+1)/2)th smallest number among x1,…,xk; if k
is even, then mk is the (k/2)th smallest number among x1,…,xk.)
Find the sum of the 1000 medians.
Header
Source
```
#include "medmain.h"
int Heap::parent(int i){
if (i%2 == 0) return i/2-1;
else return (i-1)/2;
}
int* Heap::children(int i){
int* child = new int[3];
child[0] = 2*i+1;
child[1] = 2*i+2;
if (heap[child[0]] 1) bubbleUp(heap.size()-1);
return 0;
}
int Heap::bubbleUp(int i){
if (i==0) return 1;
if (heap[parent(i)] > heap[i]){
iter_swap(heap.begin() + parent(i), heap.begin() + i);
bubbleUp(parent(i));
}
return 1;
}
int Heap::extractMin(){
iter_swap(heap.begin(), heap.end()-1);
int minValue = heap.back();
heap.pop_back();
bubbleDown(0);
return minValue;
}
int Heap::findMin(int a, int b){
if (heap[a] heap[candidate]){
iter_swap(heap.begin() + i, heap.begin() + candidate);
bubbleDown(candidate);
}
delete[] child;
return 1;
}
int main(){
Heap heapL,heapH;
ifstream ipstream;
ipstream.open("Median.txt");
int output1, output2, output;
vector median;
ipstream >> output1;
ipstream >> output2;
The goal of this problem is to implement the "Median Maintenance"
algorithm. The text file contains a list of the integers from 1 to
10000 in unsorted order; you should treat this as a stream of numbers,
arriving one by one. Letting xi denote the ith number of the file, the
kth median mk is defined as the median of the numbers x1,…,xk. (So, if
k is odd, then mk is ((k+1)/2)th smallest number among x1,…,xk; if k
is even, then mk is the (k/2)th smallest number among x1,…,xk.)
Find the sum of the 1000 medians.
Header
#include
#include
#include
using namespace std;
class Heap{
public:
vector heap;
int parent(int i);
int* children(int i);
int insert(int key);
int bubbleUp(int i);
int extractMin();
int bubbleDown(int i);
int findMin(int a, int b);
};Source
```
#include "medmain.h"
int Heap::parent(int i){
if (i%2 == 0) return i/2-1;
else return (i-1)/2;
}
int* Heap::children(int i){
int* child = new int[3];
child[0] = 2*i+1;
child[1] = 2*i+2;
if (heap[child[0]] 1) bubbleUp(heap.size()-1);
return 0;
}
int Heap::bubbleUp(int i){
if (i==0) return 1;
if (heap[parent(i)] > heap[i]){
iter_swap(heap.begin() + parent(i), heap.begin() + i);
bubbleUp(parent(i));
}
return 1;
}
int Heap::extractMin(){
iter_swap(heap.begin(), heap.end()-1);
int minValue = heap.back();
heap.pop_back();
bubbleDown(0);
return minValue;
}
int Heap::findMin(int a, int b){
if (heap[a] heap[candidate]){
iter_swap(heap.begin() + i, heap.begin() + candidate);
bubbleDown(candidate);
}
delete[] child;
return 1;
}
int main(){
Heap heapL,heapH;
ifstream ipstream;
ipstream.open("Median.txt");
int output1, output2, output;
vector median;
ipstream >> output1;
ipstream >> output2;
Solution
min heap, max heap
To keep track of the running median, an efficient solution is to store the higher half of numbers in a min heap, and the lower half of numbers in a max heap. When adding a new number, keep the size of the heaps balanced, and then the median is one of the top elements.
You did essentially this, but with a little bit unintuitive twist: instead of a min heap and a max heap, you use two min heaps, and emulate the max heap by inserting numbers negated. This may have seemed easier than implementing a heap that can be configured as min or max, but in the end you have a lot of extra logic related to negating elements for the fake max heap, and reversing the negation when you take elements out.
Think of it this way: the heap is a tool to make your job easier. A tool like this should reduce the complexity of the rest of the program. The tool is best if it let's the rest of the program focus on the main logic (keeping the heaps balanced, finding the median), without worrying about all the negations.
Heap interface
All methods in your heap are declared public, and they shouldn't be. The essential methods of a heap that your main implementation needs:
Methods like bubbleUp, bubbleDown are low level implementation details that should be hidden from clients.
Unnecessary storage
You store the medians in a vector and then iterate over it to calculate the sum. It would be better to calculate the sum a you find the medians, no need for the extra storage.
To keep track of the running median, an efficient solution is to store the higher half of numbers in a min heap, and the lower half of numbers in a max heap. When adding a new number, keep the size of the heaps balanced, and then the median is one of the top elements.
You did essentially this, but with a little bit unintuitive twist: instead of a min heap and a max heap, you use two min heaps, and emulate the max heap by inserting numbers negated. This may have seemed easier than implementing a heap that can be configured as min or max, but in the end you have a lot of extra logic related to negating elements for the fake max heap, and reversing the negation when you take elements out.
Think of it this way: the heap is a tool to make your job easier. A tool like this should reduce the complexity of the rest of the program. The tool is best if it let's the rest of the program focus on the main logic (keeping the heaps balanced, finding the median), without worrying about all the negations.
Heap interface
All methods in your heap are declared public, and they shouldn't be. The essential methods of a heap that your main implementation needs:
- insert
- removeTop
- size
Methods like bubbleUp, bubbleDown are low level implementation details that should be hidden from clients.
Unnecessary storage
You store the medians in a vector and then iterate over it to calculate the sum. It would be better to calculate the sum a you find the medians, no need for the extra storage.
Context
StackExchange Code Review Q#95058, answer score: 2
Revisions (0)
No revisions yet.