patternpythonMinor
Inversion count via divide and conquer
Viewed 0 times
divideinversionviaandcountconquer
Problem
I'm happy to hear thoughts and ideas on structure/performance/testing/whatever and multi-threading, which I haven't gotten into yet with Python.
Full code here. Assignment file and a test file included in the folder.
Excerpt:
```
import unittest
import sys
def Merge_and_CountSplitInv(A,start_index,len_1st,len_2nd):
"""
Merges two similar length pre-sorted sub-slices of array A while counting
inversions.
Input:
array A, start_index (location of first sub-slice)
len_1st,len_2nd (subslice_lengths).
Output: inversions (values where iA[j])
Side Effect: two consecutive A subsections are sorted.
"""
inversions=0
temp_array=[]
index_1st=start_index
index_2nd=start_index+len_1st
#while both indices are in range
while ((index_1st A[j])
Side Effect: A is sorted.
"""
if n==1:
return 0 # base case, array of length 1
else:
# Input: 1st and 2nd halves of current sub-array
len_1st=n//2
len_2nd=n//2+n%2
x=Sort_and_Count(A, len_1st, start_index)
y=Sort_and_Count(A, len_2nd, start_index+len_1st)
# merge the (newly sorted) half-sized sub-arrays
z=Merge_and_CountSplitInv(A,start_index,len_1st,len_2nd)
return x+y+z
class TestSort_and_Count(unittest.TestCase):
"""
Basic test class
"""
def test_Sort_and_Count(self):
A=[1]
res1 = Sort_and_Count(A,len(A)) # single element
self.assertEqual(res1, 0)
B=[1,3,5,2,4,6]
res2 = Sort_and_Count(B,len(B)) # even length
self.assertEqual(res2, 3)
C=[1,3,5,2,4,6,3]
res3 = Sort_and_Count(C,len(C)) # odd length, duplicate value
self.assertEqual(res3, 6)
def main(file_name):
#TODO: take values from file and run sort_and_count
with open(file_name) as fh:
if file_name[:4] == 'test':
print(fh.readline()) # get rid of first answer line from debug file
A = list(map(int, [line.strip() f
Full code here. Assignment file and a test file included in the folder.
Excerpt:
```
import unittest
import sys
def Merge_and_CountSplitInv(A,start_index,len_1st,len_2nd):
"""
Merges two similar length pre-sorted sub-slices of array A while counting
inversions.
Input:
array A, start_index (location of first sub-slice)
len_1st,len_2nd (subslice_lengths).
Output: inversions (values where iA[j])
Side Effect: two consecutive A subsections are sorted.
"""
inversions=0
temp_array=[]
index_1st=start_index
index_2nd=start_index+len_1st
#while both indices are in range
while ((index_1st A[j])
Side Effect: A is sorted.
"""
if n==1:
return 0 # base case, array of length 1
else:
# Input: 1st and 2nd halves of current sub-array
len_1st=n//2
len_2nd=n//2+n%2
x=Sort_and_Count(A, len_1st, start_index)
y=Sort_and_Count(A, len_2nd, start_index+len_1st)
# merge the (newly sorted) half-sized sub-arrays
z=Merge_and_CountSplitInv(A,start_index,len_1st,len_2nd)
return x+y+z
class TestSort_and_Count(unittest.TestCase):
"""
Basic test class
"""
def test_Sort_and_Count(self):
A=[1]
res1 = Sort_and_Count(A,len(A)) # single element
self.assertEqual(res1, 0)
B=[1,3,5,2,4,6]
res2 = Sort_and_Count(B,len(B)) # even length
self.assertEqual(res2, 3)
C=[1,3,5,2,4,6,3]
res3 = Sort_and_Count(C,len(C)) # odd length, duplicate value
self.assertEqual(res3, 6)
def main(file_name):
#TODO: take values from file and run sort_and_count
with open(file_name) as fh:
if file_name[:4] == 'test':
print(fh.readline()) # get rid of first answer line from debug file
A = list(map(int, [line.strip() f
Solution
- Bug
If you pass the empty list to
Sort_and_Count, it recurses until it runs out of stack:>>> Sort_and_Count([], 0)
Traceback (most recent call last):
File "cr107928.py", line 67, in test_Sort_and_Count
Sort_and_Count([],0)
... many lines omitted ...
File "cr107928.py", line 49, in Sort_and_Count
if n==1:
RecursionError: maximum recursion depth exceeded in comparison- Review
-
The code omits spaces around operators. This makes it hard to read. It is better to follow the Python style guide (PEP8):
Always surround these binary operators with a single space on either side: assignment (
=), augmented assignment (+=, -= etc.), comparisons (==, `, !=, <>, =, in, not in, is, is not), Booleans (and, or, not). [...] If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies).
-
The names do not follow the Python style guide. This recommends:
Class names should normally use the CapWords convention.
so TestSortAndCount would be better than TestSort_and_Count.
Function names should be lowercase, with words separated by underscores as necessary to improve readability.
so merge_and_count_inversions would be better than Merge_and_CountSplitInv.
It's not compulsory to follow the Python style guide, but it makes it easier for other Python programmers to read your code.
-
Python's documentation uses the term "sequence" to refer to indexable collections, not "array". There's less risk of confusion if you stick to the standard term.
-
The docstring says, "Merges two similar length pre-sorted sub-slices", but in fact the function works fine even if the subsequences are not of similar length. I know that when called from Sort_and_Count the subsequences will be equal in length or differ by one, but it's better to document the actual behaviour of the function.
-
Instead of index_1st and index_2nd, I would use i and j. These variables are used many times, so using short names reduces the amount of code and make it easier to read.
-
The arguments len_1st and len_2nd are only used within the expressions start_index+len_1st and start_index+len_1st+len_2nd. This suggests that the arguments to the function should be start, middle and end instead.
-
Instead of updating A one element at a time:
write_index=start_index
for value in temp_array:
A[write_index]=value
write_index +=1
update all elements at once using a slice assignment:
A[start_index:start_index + len(temp_array)] = temp_array
This is efficient (in CPython, anyway) because the replacement is the same length as the slice being assigned.
-
In Sort_and_Count, the caller is expected to pass in the length of the sequence. It would be simpler if Sort_and_Count called len instead. This could easily be done using a helper function; see updated code below.
-
The test case doesn't try the empty list (see "Bug" above).
-
The test cases only test that the inversion count is correct, but don't actually test that the sequence is sorted! You need something like:
self.assertEqual(A, sorted(A))
-
The test cases have hard-coded counts of inversions. But it is easy to count the inversions in a sequence, using itertools.combinations:
sum(j < i for i, j in combinations(seq, 2))
This is inefficient (it's \$ Θ(n^2) \$), but clearly correct, and so a good comparison for the efficient, but not clearly correct, counts produced by the function under test.
-
The test cases are very repetitive. This could be avoided with a helper method, something like this:
def check(self, seq):
expected = sum(j < i for i, j in combinations(seq, 2))
found = sort_and_count_inversions(seq)
self.assertEqual(expected, found)
self.assertEqual(seq, sorted(seq))
-
The test cases are not very thorough. You could easily test many more cases using random test generation:
for _ in range(100):
self.check([randrange(100) for _ in range(randrange(100))])
- Revised code
``from itertools import combinations
from random import randrange
from unittest import TestCase
def merge_and_count_inversions(seq, start, middle, end):
"""Merge two adjacent sorted sub-sequences of seq and count
inversions.
Arguments:
seq -- sequence to sort
start -- index of beginning of first sorted subsequence
middle -- end of first, and beginning of second, sorted subsequence
end -- end of second sorted subsequence
Result: number of inversions (cases where i seq[j]).
Side effect: seq is sorted between start and end.
"""
assert 0 seq[j]).
Side effect: seq is sorted.
"""
def sort_and_count(seq, start, end):
if end - start < 2:
return 0
middle = (start + end) // 2
return (sort_and_count(seq, start, middle)
+ sort_and_count(seq, middle, end)
Code Snippets
>>> Sort_and_Count([], 0)
Traceback (most recent call last):
File "cr107928.py", line 67, in test_Sort_and_Count
Sort_and_Count([],0)
... many lines omitted ...
File "cr107928.py", line 49, in Sort_and_Count
if n==1:
RecursionError: maximum recursion depth exceeded in comparisonwrite_index=start_index
for value in temp_array:
A[write_index]=value
write_index +=1A[start_index:start_index + len(temp_array)] = temp_arrayself.assertEqual(A, sorted(A))sum(j < i for i, j in combinations(seq, 2))Context
StackExchange Code Review Q#107928, answer score: 9
Revisions (0)
No revisions yet.