HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Creating a table using given items

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
creatingitemsusinggiventable

Problem

This simply creates a table using the items passed in. I would like to make this function shorter and/or more efficient. What I have seems pretty redundant.

def apriori_creator(items, k):
    """Function that creates apriori table using supported candidates"""
    table = []
    the_length = len(items)
    new_k = k-2 # first k-2 items as described in algorithm in book for 'apriori gen'
    for x in range(the_length):
        next = x + 1
        for y in range(next, the_length):
            combined = items[x].union(items[y]) # merge
            l_sub_one = list(items[x])
            l_sub_one.sort()
            l_sub_two = list(items[y])
            l_sub_two.sort()
            first = l_sub_one[:new_k] # getting the first k-2 items of the set
            second = l_sub_two[:new_k]
            if first == second: 
                table.append(combined)
    return table

Solution

For efficiency, you can take take some things out of the inner loop:

  • The computations that depend only on x should be in the outer loop.



  • The computation of combined should be under the if where it is used.



  • Using enumerate and direct iteration instead of range is simpler and a little faster.



-
l_sub_one = sorted(items[x]) does the same as these two lines:

l_sub_one = list(items[x])
    l_sub_one.sort()


Revised code:

def apriori_creator(items, k):
    """Function that creates apriori table using supported candidates"""
    table = []
    new_k = k-2 # first k-2 items as described in algorithm in book for 'apriori gen'
    for x, item_x in enumerate(items):
        l_sub_one = sorted(item_x)
        first = l_sub_one[:new_k] # getting the first k-2 items of the set
        for item_y in items[x+1:]:
            l_sub_two = sorted(item_y)
            second = l_sub_two[:new_k]
            if first == second: 
                combined = item_x.union(item_y) # merge
                table.append(combined)
    return table


A further improvement is possible.
One way to describe what you are doing is that you are generating all different pairs of items that share the same value of sorted(item)[:new_k]. Your code does this by considering all possible pairs. A more efficient approach would begin by grouping the items by that value. With the help of a dictionary this only requires one pass over the items. Then you only need to generate pairs withing each group.

Here's again revised code. I'm using collections.defaultdict for grouping, and itertools.combinations for pairing.

from collections import defaultdict
from itertools import combinations

def apriori_creator(items, k):
    """Function that creates apriori table using supported candidates"""
    new_k = k-2 # first k-2 items as described in algorithm in book for 'apriori gen'
    groups = defaultdict(list)  # group by the first k-2 items of the set
    for item in items:
        l_sub_one = sorted(item)
        key = tuple(l_sub_one[:new_k])
        groups[key].append(item)

    table = []
    for group in groups.values():
        if len(group) > 1:
            for item_x, item_y in combinations(group, 2): # all pairs
                combined = item_x.union(item_y) # merge
                table.append(combined)
    return table

Code Snippets

l_sub_one = list(items[x])
    l_sub_one.sort()
def apriori_creator(items, k):
    """Function that creates apriori table using supported candidates"""
    table = []
    new_k = k-2 # first k-2 items as described in algorithm in book for 'apriori gen'
    for x, item_x in enumerate(items):
        l_sub_one = sorted(item_x)
        first = l_sub_one[:new_k] # getting the first k-2 items of the set
        for item_y in items[x+1:]:
            l_sub_two = sorted(item_y)
            second = l_sub_two[:new_k]
            if first == second: 
                combined = item_x.union(item_y) # merge
                table.append(combined)
    return table
from collections import defaultdict
from itertools import combinations

def apriori_creator(items, k):
    """Function that creates apriori table using supported candidates"""
    new_k = k-2 # first k-2 items as described in algorithm in book for 'apriori gen'
    groups = defaultdict(list)  # group by the first k-2 items of the set
    for item in items:
        l_sub_one = sorted(item)
        key = tuple(l_sub_one[:new_k])
        groups[key].append(item)

    table = []
    for group in groups.values():
        if len(group) > 1:
            for item_x, item_y in combinations(group, 2): # all pairs
                combined = item_x.union(item_y) # merge
                table.append(combined)
    return table

Context

StackExchange Code Review Q#73629, answer score: 2

Revisions (0)

No revisions yet.