HiveBrain v1.2.0
Get Started
← Back to all entries
patternMajor

Explaination for Variation of Boyer-Moore Majority voting algorithm

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
votingmajorityexplainationalgorithmforvariationboyermoore

Problem

Boyer-Moore's majority vote algorithms can be used to determine the majority element in a linear time and constant space.

The intuition behind finding the majority element is understandable as it has to be greater than the count of other elements in the input sequence if a majority element exists.

However, there is a variation of this algorithm to find the elements occurring more than [n / 3] where n is the length of the sequence.

The algorithm goes like this.



  • Start with two empty candidate slots and two counters set to 0.



  • for each item:





  • if it is equal to either candidate, increment the corresponding count



  • else if there is an empty slot (i.e. a slot with count 0), put it in that


slot and set the count to 1

  • else reduce both counters by 1





I can understand there will be at most (n /3) - 1 entries so we keep two containers and their counts.

But I'm not sure why the last reduce both by 1 is pivotal to this algorithm. I would be very helpful if you explain the intuition behind this.

Code of the above algorithm

vector majorityElement(vector& nums) {
    int cnt1=0, cnt2=0;
    int a,b;
    for(int n: nums){
        if (cnt1 == 0 || n == a){
            cnt1++;
            a = n;
        }
        else if (cnt2 == 0 || n==b){
            cnt2++;
            b = n;
        }
        else{ // This part
            cnt1--;
            cnt2--;
        }
    }
    cnt1=cnt2=0;
    for(int n: nums){
        if (n==a) cnt1++;
        else if (n==b) cnt2++;
    }
    vector result;
    if (cnt1 > nums.size()/3) result.push_back(a);
    if (cnt2 > nums.size()/3) result.push_back(b);
    return result;
}


Algorithm and code source

Solution

Here is a different way to consider the same algorithm, or rather its general form with $k$ counters (in your case, $k=2$). There are $k$ "containers" and a trash bag. Each container will contain one type of element at a time. Initially everything is empty. When we read a new element $x$, we consider the following cases:

  • If there is a container containing $x$'s, add $x$ to that container.



  • If there is no container containing $x$ and there is an empty container, put $x$ in the empty container.



  • If there is no container containing $x$ and there are no empty containers, remove one element from each container, and put $x$ in the trash.



When the algorithm terminates, each of the $n$ elements is either found in one of the $k$ containers, or in the trash bag.

Suppose that element $x$, not found in any of the containers, is found $m$ times in the input. Thus all $m$ copies of $x$ are in the trash bag. Each time an element is put in the trash bag, $k$ other elements are also put in the trash bag together with it. This means that the trash bag contains at least $(k+1)m$ elements, and so $(k+1)m \leq n$ or $m \leq n/(k+1)$.

Consequently, if an element appears more than $n/(k+1)$ times in the input then it will necessarily be found in one the $k$ containers. In other words, the algorithm correctly identifies all elements appearing more than $n/(k+1)$ times.

Note how this analysis crucially depends on the fact that when trashing $x$ in the third case above, we also trash $k$ additional elements together with it.

Context

StackExchange Computer Science Q#91803, answer score: 20

Revisions (0)

No revisions yet.