HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Inversion count via divide and conquer

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
divideinversionviaandcountconquer

Problem

I'm happy to hear thoughts and ideas on structure/performance/testing/whatever and multi-threading, which I haven't gotten into yet with Python.

Full code here. Assignment file and a test file included in the folder.

Excerpt:

```
import unittest
import sys

def Merge_and_CountSplitInv(A,start_index,len_1st,len_2nd):
"""
Merges two similar length pre-sorted sub-slices of array A while counting
inversions.
Input:
array A, start_index (location of first sub-slice)
len_1st,len_2nd (subslice_lengths).
Output: inversions (values where iA[j])
Side Effect: two consecutive A subsections are sorted.
"""
inversions=0
temp_array=[]
index_1st=start_index
index_2nd=start_index+len_1st
#while both indices are in range
while ((index_1st A[j])
Side Effect: A is sorted.
"""
if n==1:
return 0 # base case, array of length 1
else:
# Input: 1st and 2nd halves of current sub-array
len_1st=n//2
len_2nd=n//2+n%2
x=Sort_and_Count(A, len_1st, start_index)
y=Sort_and_Count(A, len_2nd, start_index+len_1st)
# merge the (newly sorted) half-sized sub-arrays
z=Merge_and_CountSplitInv(A,start_index,len_1st,len_2nd)
return x+y+z

class TestSort_and_Count(unittest.TestCase):
"""
Basic test class
"""

def test_Sort_and_Count(self):
A=[1]
res1 = Sort_and_Count(A,len(A)) # single element
self.assertEqual(res1, 0)
B=[1,3,5,2,4,6]
res2 = Sort_and_Count(B,len(B)) # even length
self.assertEqual(res2, 3)
C=[1,3,5,2,4,6,3]
res3 = Sort_and_Count(C,len(C)) # odd length, duplicate value
self.assertEqual(res3, 6)

def main(file_name):
#TODO: take values from file and run sort_and_count
with open(file_name) as fh:
if file_name[:4] == 'test':
print(fh.readline()) # get rid of first answer line from debug file
A = list(map(int, [line.strip() f

Solution


  1. Bug



If you pass the empty list to Sort_and_Count, it recurses until it runs out of stack:

>>> Sort_and_Count([], 0)
Traceback (most recent call last):
  File "cr107928.py", line 67, in test_Sort_and_Count
    Sort_and_Count([],0)
  ... many lines omitted ...
  File "cr107928.py", line 49, in Sort_and_Count
    if n==1:
RecursionError: maximum recursion depth exceeded in comparison


  1. Review



-
The code omits spaces around operators. This makes it hard to read. It is better to follow the Python style guide (PEP8):


Always surround these binary operators with a single space on either side: assignment (=), augmented assignment (+=, -= etc.), comparisons (==, `, !=, <>, =, in, not in, is, is not), Booleans (and, or, not). [...] If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies).

-
The names do not follow the Python style guide. This recommends:


Class names should normally use the CapWords convention.

so
TestSortAndCount would be better than TestSort_and_Count.


Function names should be lowercase, with words separated by underscores as necessary to improve readability.

so
merge_and_count_inversions would be better than Merge_and_CountSplitInv.

It's not compulsory to follow the Python style guide, but it makes it easier for other Python programmers to read your code.

-
Python's documentation uses the term "sequence" to refer to indexable collections, not "array". There's less risk of confusion if you stick to the standard term.

-
The docstring says, "Merges two similar length pre-sorted sub-slices", but in fact the function works fine even if the subsequences are not of similar length. I know that when called from
Sort_and_Count the subsequences will be equal in length or differ by one, but it's better to document the actual behaviour of the function.

-
Instead of
index_1st and index_2nd, I would use i and j. These variables are used many times, so using short names reduces the amount of code and make it easier to read.

-
The arguments
len_1st and len_2nd are only used within the expressions start_index+len_1st and start_index+len_1st+len_2nd. This suggests that the arguments to the function should be start, middle and end instead.

-
Instead of updating
A one element at a time:

write_index=start_index
for value in temp_array:
    A[write_index]=value
    write_index +=1


update all elements at once using a slice assignment:

A[start_index:start_index + len(temp_array)] = temp_array


This is efficient (in CPython, anyway) because the replacement is the same length as the slice being assigned.

-
In
Sort_and_Count, the caller is expected to pass in the length of the sequence. It would be simpler if Sort_and_Count called len instead. This could easily be done using a helper function; see updated code below.

-
The test case doesn't try the empty list (see "Bug" above).

-
The test cases only test that the inversion count is correct, but don't actually test that the sequence is sorted! You need something like:

self.assertEqual(A, sorted(A))


-
The test cases have hard-coded counts of inversions. But it is easy to count the inversions in a sequence, using
itertools.combinations:

sum(j < i for i, j in combinations(seq, 2))


This is inefficient (it's \$ Θ(n^2) \$), but clearly correct, and so a good comparison for the efficient, but not clearly correct, counts produced by the function under test.

-
The test cases are very repetitive. This could be avoided with a helper method, something like this:

def check(self, seq):
    expected = sum(j < i for i, j in combinations(seq, 2))
    found = sort_and_count_inversions(seq)
    self.assertEqual(expected, found)
    self.assertEqual(seq, sorted(seq))


-
The test cases are not very thorough. You could easily test many more cases using random test generation:

for _ in range(100):
    self.check([randrange(100) for _ in range(randrange(100))])


  1. Revised code



``
from itertools import combinations
from random import randrange
from unittest import TestCase

def merge_and_count_inversions(seq, start, middle, end):
"""Merge two adjacent sorted sub-sequences of seq and count
inversions.

Arguments:
seq -- sequence to sort
start -- index of beginning of first sorted subsequence
middle -- end of first, and beginning of second, sorted subsequence
end -- end of second sorted subsequence

Result: number of inversions (cases where i seq[j]).

Side effect: seq is sorted between start and end.

"""
assert 0 seq[j]).

Side effect: seq is sorted.

"""
def sort_and_count(seq, start, end):
if end - start < 2:
return 0
middle = (start + end) // 2
return (sort_and_count(seq, start, middle)
+ sort_and_count(seq, middle, end)

Code Snippets

>>> Sort_and_Count([], 0)
Traceback (most recent call last):
  File "cr107928.py", line 67, in test_Sort_and_Count
    Sort_and_Count([],0)
  ... many lines omitted ...
  File "cr107928.py", line 49, in Sort_and_Count
    if n==1:
RecursionError: maximum recursion depth exceeded in comparison
write_index=start_index
for value in temp_array:
    A[write_index]=value
    write_index +=1
A[start_index:start_index + len(temp_array)] = temp_array
self.assertEqual(A, sorted(A))
sum(j < i for i, j in combinations(seq, 2))

Context

StackExchange Code Review Q#107928, answer score: 9

Revisions (0)

No revisions yet.