HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

Counting Inversions Using Merge Sort

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
countinginversionsmergeusingsort

Problem

In Corman, Introduction To Algorithms, 3rd edition, question 2-4 it asks to count the number of inversions in a list of numbers in $\theta( n \lg n )$ time. He uses a modified Merge Sort to accomplish this. However, there is something in his algorithm which seems redundant / unnecessary to me:

MERGE-INVERSIONS(A, p, q, r)
n1 = q - p + 1
n2 = r - q
let L[1 ... n1 + 1] and R[1 ... n2 + 1] be new arrays
for i = 1 to n1
    L[i] = A[p + i - 1]
for j = 1 to n2
    R[j] = A[q + j] 
L[n1 + 1] = infinity
R[n2 + 1] = infinity
i = 1
j = 1
inversions = 0
counted = FALSE
for k = p to r
    if counted == FALSE and R[j]  < L[i]
        inversions = inversions + n1 - i + 1
        counted = TRUE
    if L[i] <= R[j] 
        A[k] = L[i]
        i++
    else A[k] = R[j] 
        j++
        counted = FALSE
return inversions


The counted variable seems redundant to me and I would have written the last for loop as follows:

inversions = 0
for k = p to r
    if L[i] <= R[j] 
        A[k] = L[i]
        i++
    else A[k] = R[j] 
        inversions = inversions + n1 - i + 1
        j++
return inversions


What am I missing, or is counted really unnecessary?

Solution

We can show that after every iteration of the for-loop in question, counted is FALSE. Therefore, inversions = inversions + n1 - i + 1 is executed if and only if j++ is executed in the same iteration (both are guarded by R[j] < L[i]). Since neither i nor j is changed between evaluation of the two if conditions, this implies that your version is equivalent.


We perform an induction over k. Before the loop, counted is explicitly set to FALSE. In any given iteration, we start (by induction hypothesis) with counted == FALSE. If L[i] <= R[j], counted is not changed throughout the current iteration. If the opposite is the case, counted is first set to TRUE, but reset to FALSE in the else block. For this, we note that the truth value of R[j] < L[i] remains unchanged from the first to the second if.

I guess the extra block is there for didactic reasons.

Context

StackExchange Computer Science Q#9858, answer score: 2

Revisions (0)

No revisions yet.