HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Count the ways to partition an array into two equal sets

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thepartitionarrayintowaysequaltwocountsets

Problem

I need to count the number of ways to partition an array into two contiguous arrays such that those arrays contain exactly same set of items.

Suppose A = [4, 1, 1, 2, 4, 2, 4, 1]. If you pass this array to the function, it needs to return 2. (Those subarrays are [4, 1, 1, 2] & [4, 2, 4, 1] and [4, 1, 1, 2, 4] & [2, 4, 1].)

Suppose A = [1, 5, 1]. If you pass this array to the function, it needs to return 0.

My code given below. Is there any better way to do this?

def solve(A):    
    count = 0
    for i in range(1, len(A)-1/2):

        left = A[:i]
        right = A[i:]

        left[:i].sort()
        right[i:].sort()

        if set(left) == set(right):
            count += 1
    return count

Solution

Your current approach is going to be very slow because you're slicing the each time and then that is followed by an unnecessary sort calls and ultimately you call set() on both of the lists.

Slicing a list is very slow, because it requires creating an array of new references(pointers) to the items of the original array. To fix that we can use collections.deque here, ie. start with two deques first. First one is going be a deque(say full_deque) of the list passed to the array and second one will be empty(say current_deque). And now during the loop we simply pop the leftmost item from the full_deque and append it to the current_deque, these are O(1) operations for a deque so we will save a lot of time. And then simply call set() on both deques and compare those sets:

from collections import deque

def solve_deque(seq):
    count = 0
    full_deque = deque(seq)
    current_deque = deque()
    for _ in xrange(len(seq)):
        current_deque.append(full_deque.popleft())
        if set(current_deque) == set(full_deque):
            count += 1
    return count


The above operation is around 7 times faster than your version:

>>> A = [random.randint(0, 100) for _ in xrange(1000)]
>>> %timeit solve(A)
10 loops, best of 3: 74.4 ms per loop
>>> %timeit solve_deque(A)
100 loops, best of 3: 11.3 ms per loop


But for bigger list it is still not good enough as the set() calls in each loop are expensive.

>>> A = [random.randint(0, 100) for _ in xrange(10**5)]
>>> %timeit solve_deque(A)
1 loops, best of 3: 1min 26s per loop


To improve this even further we need to approach this differently, and one such approach is to use a Counter. Similar to deque we maintain two Counters here: 1. full_counter: Contains count of the initial list items 2. current_counter: Contains counts of items seen so far. Apart from these we are also going to maintain two sets here: 1. full_keys: Initial unique items from the list 2. current_keys: Unique items seen so far.

Here during the loop we are going to increment the count of current item in current_counter and decrease its count from the full_counter. But we can't compare these counters, so that's where the 2 sets come in. Firstly add current item to the current_keys set and then check if the count of current item is equal to 0 in full_counter and if it is then we can remove this item from the full_keys set and then simply compare these two sets:

from collections import Counter

def solve_counter(seq):
    count = 0
    full_counter = Counter(seq)
    current_counter = Counter()
    full_keys = set(seq)
    current_keys = set()

    for item in seq:
        current_counter[item] += 1
        full_counter[item] -= 1
        current_keys.add(item)
        if full_counter[item] == 0:
            full_keys.remove(item)
        count += full_keys == current_keys
    return count


Timings:

>>> A = [random.randint(0, 100) for _ in xrange(1000)]
>>> %timeit solve_counter(A)
1000 loops, best of 3: 764 µs per loop
>>> A = [random.randint(0, 100) for _ in xrange(10**5)]
>>> %timeit solve_counter(A)
10 loops, best of 3: 109 ms per loop


Few other points:

-
The recommended way to get both item and the index is to use enumerate() rather than using range() with indexing. Also in Python 2 range() returns a list, so unless you actually need a list always use xrange().

-
Use upper case variable names(A) only for module level constants, that applies to the function arguments as well, we can use num_list or seq instead of A here.

-
left[:i].sort() and right[i:].sort() are useless, slice on Python lists returns a new shallow copy. So, here you created two new lists first, sorted them and then threw them away. To sort left and right, simply do left.sort() and rigth.sort().

Code Snippets

from collections import deque

def solve_deque(seq):
    count = 0
    full_deque = deque(seq)
    current_deque = deque()
    for _ in xrange(len(seq)):
        current_deque.append(full_deque.popleft())
        if set(current_deque) == set(full_deque):
            count += 1
    return count
>>> A = [random.randint(0, 100) for _ in xrange(1000)]
>>> %timeit solve(A)
10 loops, best of 3: 74.4 ms per loop
>>> %timeit solve_deque(A)
100 loops, best of 3: 11.3 ms per loop
>>> A = [random.randint(0, 100) for _ in xrange(10**5)]
>>> %timeit solve_deque(A)
1 loops, best of 3: 1min 26s per loop
from collections import Counter

def solve_counter(seq):
    count = 0
    full_counter = Counter(seq)
    current_counter = Counter()
    full_keys = set(seq)
    current_keys = set()

    for item in seq:
        current_counter[item] += 1
        full_counter[item] -= 1
        current_keys.add(item)
        if full_counter[item] == 0:
            full_keys.remove(item)
        count += full_keys == current_keys
    return count
>>> A = [random.randint(0, 100) for _ in xrange(1000)]
>>> %timeit solve_counter(A)
1000 loops, best of 3: 764 µs per loop
>>> A = [random.randint(0, 100) for _ in xrange(10**5)]
>>> %timeit solve_counter(A)
10 loops, best of 3: 109 ms per loop

Context

StackExchange Code Review Q#79942, answer score: 4

Revisions (0)

No revisions yet.