HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Set theory operations (union, intersection etc.) for one dimensional integer intervals

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
operationstheoryuniononeintervalsforintersectiondimensionalintegerset

Problem

For my first not-just-a-few-days-long Python project, I needed something to handle basic set theory operations (union, intersection etc.) for one dimensional integer intervals, so I came up with this module.

It would be cool if you could tell me if the code is OK or something you would frown upon if you had to continue my project, and if so, what I can do better.

```
"""intervals

Union, intersection, set difference and symmetric difference
of possibly overlapping or touching integer intervals.
Intervals are defined right-open. (1, 4) -> 1, 2, 3

e.g.
union([(1, 4), (7, 9)], (3, 5)) -> [(1, 5), (7, 9)]
intersection([(1, 4), (7, 9)], (3, 5)) -> [(3, 4)]
set_difference([(1, 4), (7, 9)], (3, 5)) -> [(1, 3), (7, 9)]
set_difference([(3, 5)], [(1, 4), (7, 9)]) -> [(4, 5)]
symmetric_difference([(1, 4), (7, 9)], (3, 5)) -> [(1, 3), (4, 5), (7, 9)]

see: http://en.wikipedia.org/wiki/Set_theory#Basic_concepts_and_notation
"""

import copy
from itertools import accumulate, chain, islice, repeat
from operator import itemgetter
import unittest

class Intervals():
"""Holds a non overlapping list of intervals.
One single interval is just a pair.
Overlapping or touching intervals are automatically merged.
"""

def __init__(self, interval_list=[]):
"""Raises a ValueError if the length of one of the
intervals in the list is negative.
"""
if any(begin > end for begin, end in interval_list):
raise ValueError('Invalid interval')
self._interval_list = _merge_interval_lists(
interval_list, [])

def __repr__(self):
"""Just write out all included intervals.
"""
return 'Intervals ' + str(self._interval_list)

def get(self, copy_content=True):
"""Return the list of intervals.
"""
return copy.copy(self._interval_list) if copy_content\
else self._interval_list

def union(a, b):
"""Merge a and b (union).
"""
return Inter

Solution

IMHO, having a function which accepts the operation as a string is an atrocity, and I would disallow it in a code review.

Why not have separate functions? That’s much cleaner and makes the code easier to read. If you’re concerned about code duplication, take a look at the strategy pattern. You could for instance pass (named) sets of callbacks rather than strings. Like so:

id = lambda x: x
OP_UNION = (
    id, # Only needed for symmetric difference
    lambda change: change == (0, 1),
    lambda change: change == (1, 0))
…


Or alternatively using namedtuple:

set_op = collections.namedtuple('set_op', ['transform', 'begin', 'end'])

# And here’s a handy helper to avoid lots of repetitive lambdas
def check_for(cond):
    return lambda change: change == cond

OP_UNION = set_op(
    transform = id,
    begin = check_for((0, 1)),
    end = check_for((1, 0)))
…


This would also shorten your – otherwise quite nice – _merge_interval_lists implementation (here without comments for brevity, though your comments were very good):

# Removing the default for merge_type – why have this?
def _merge_interval_lists(a, b, merge_type):
    b = merge_type.transform(b)

    both = list(chain(a, b))
    begins = zip(sorted(map(itemgetter(0), both)),
            repeat(1))
    ends = zip(sorted(map(itemgetter(1), both)),
            repeat(-1))

    edges = sorted(chain(begins, ends), key=lambda x: (x[0], -x[1]))

    counts = list(accumulate(map(itemgetter(1), edges)))
    changes = zip(chain([0], counts), counts)
    xs = map(itemgetter(0), edges)
    xs_and_changes = list(zip(xs, changes))

    res_begins = map(itemgetter(0),
            starfilter(lambda x, change: merge_type.begin(change),
                xs_and_changes))
    res_ends = map(itemgetter(0),
            starfilter(lambda x, change: merge_type.end(change),
                xs_and_changes))

    result = pairwise(sorted(chain(res_begins, res_ends)), False)

    def length_greater_than_zero(interval):
        return interval[0] < interval[1]

    return list(filter(length_greater_than_zero, result))

Code Snippets

id = lambda x: x
OP_UNION = (
    id, # Only needed for symmetric difference
    lambda change: change == (0, 1),
    lambda change: change == (1, 0))
…
set_op = collections.namedtuple('set_op', ['transform', 'begin', 'end'])

# And here’s a handy helper to avoid lots of repetitive lambdas
def check_for(cond):
    return lambda change: change == cond

OP_UNION = set_op(
    transform = id,
    begin = check_for((0, 1)),
    end = check_for((1, 0)))
…
# Removing the default for merge_type – why have this?
def _merge_interval_lists(a, b, merge_type):
    b = merge_type.transform(b)

    both = list(chain(a, b))
    begins = zip(sorted(map(itemgetter(0), both)),
            repeat(1))
    ends = zip(sorted(map(itemgetter(1), both)),
            repeat(-1))

    edges = sorted(chain(begins, ends), key=lambda x: (x[0], -x[1]))

    counts = list(accumulate(map(itemgetter(1), edges)))
    changes = zip(chain([0], counts), counts)
    xs = map(itemgetter(0), edges)
    xs_and_changes = list(zip(xs, changes))

    res_begins = map(itemgetter(0),
            starfilter(lambda x, change: merge_type.begin(change),
                xs_and_changes))
    res_ends = map(itemgetter(0),
            starfilter(lambda x, change: merge_type.end(change),
                xs_and_changes))

    result = pairwise(sorted(chain(res_begins, res_ends)), False)

    def length_greater_than_zero(interval):
        return interval[0] < interval[1]

    return list(filter(length_greater_than_zero, result))

Context

StackExchange Code Review Q#27862, answer score: 2

Revisions (0)

No revisions yet.