patternpythonMinor
Set theory operations (union, intersection etc.) for one dimensional integer intervals
Viewed 0 times
operationstheoryuniononeintervalsforintersectiondimensionalintegerset
Problem
For my first not-just-a-few-days-long Python project, I needed something to handle basic set theory operations (union, intersection etc.) for one dimensional integer intervals, so I came up with this module.
It would be cool if you could tell me if the code is OK or something you would frown upon if you had to continue my project, and if so, what I can do better.
```
"""intervals
Union, intersection, set difference and symmetric difference
of possibly overlapping or touching integer intervals.
Intervals are defined right-open. (1, 4) -> 1, 2, 3
e.g.
union([(1, 4), (7, 9)], (3, 5)) -> [(1, 5), (7, 9)]
intersection([(1, 4), (7, 9)], (3, 5)) -> [(3, 4)]
set_difference([(1, 4), (7, 9)], (3, 5)) -> [(1, 3), (7, 9)]
set_difference([(3, 5)], [(1, 4), (7, 9)]) -> [(4, 5)]
symmetric_difference([(1, 4), (7, 9)], (3, 5)) -> [(1, 3), (4, 5), (7, 9)]
see: http://en.wikipedia.org/wiki/Set_theory#Basic_concepts_and_notation
"""
import copy
from itertools import accumulate, chain, islice, repeat
from operator import itemgetter
import unittest
class Intervals():
"""Holds a non overlapping list of intervals.
One single interval is just a pair.
Overlapping or touching intervals are automatically merged.
"""
def __init__(self, interval_list=[]):
"""Raises a ValueError if the length of one of the
intervals in the list is negative.
"""
if any(begin > end for begin, end in interval_list):
raise ValueError('Invalid interval')
self._interval_list = _merge_interval_lists(
interval_list, [])
def __repr__(self):
"""Just write out all included intervals.
"""
return 'Intervals ' + str(self._interval_list)
def get(self, copy_content=True):
"""Return the list of intervals.
"""
return copy.copy(self._interval_list) if copy_content\
else self._interval_list
def union(a, b):
"""Merge a and b (union).
"""
return Inter
It would be cool if you could tell me if the code is OK or something you would frown upon if you had to continue my project, and if so, what I can do better.
```
"""intervals
Union, intersection, set difference and symmetric difference
of possibly overlapping or touching integer intervals.
Intervals are defined right-open. (1, 4) -> 1, 2, 3
e.g.
union([(1, 4), (7, 9)], (3, 5)) -> [(1, 5), (7, 9)]
intersection([(1, 4), (7, 9)], (3, 5)) -> [(3, 4)]
set_difference([(1, 4), (7, 9)], (3, 5)) -> [(1, 3), (7, 9)]
set_difference([(3, 5)], [(1, 4), (7, 9)]) -> [(4, 5)]
symmetric_difference([(1, 4), (7, 9)], (3, 5)) -> [(1, 3), (4, 5), (7, 9)]
see: http://en.wikipedia.org/wiki/Set_theory#Basic_concepts_and_notation
"""
import copy
from itertools import accumulate, chain, islice, repeat
from operator import itemgetter
import unittest
class Intervals():
"""Holds a non overlapping list of intervals.
One single interval is just a pair.
Overlapping or touching intervals are automatically merged.
"""
def __init__(self, interval_list=[]):
"""Raises a ValueError if the length of one of the
intervals in the list is negative.
"""
if any(begin > end for begin, end in interval_list):
raise ValueError('Invalid interval')
self._interval_list = _merge_interval_lists(
interval_list, [])
def __repr__(self):
"""Just write out all included intervals.
"""
return 'Intervals ' + str(self._interval_list)
def get(self, copy_content=True):
"""Return the list of intervals.
"""
return copy.copy(self._interval_list) if copy_content\
else self._interval_list
def union(a, b):
"""Merge a and b (union).
"""
return Inter
Solution
IMHO, having a function which accepts the operation as a string is an atrocity, and I would disallow it in a code review.
Why not have separate functions? That’s much cleaner and makes the code easier to read. If you’re concerned about code duplication, take a look at the strategy pattern. You could for instance pass (named) sets of callbacks rather than strings. Like so:
Or alternatively using
This would also shorten your – otherwise quite nice –
Why not have separate functions? That’s much cleaner and makes the code easier to read. If you’re concerned about code duplication, take a look at the strategy pattern. You could for instance pass (named) sets of callbacks rather than strings. Like so:
id = lambda x: x
OP_UNION = (
id, # Only needed for symmetric difference
lambda change: change == (0, 1),
lambda change: change == (1, 0))
…Or alternatively using
namedtuple:set_op = collections.namedtuple('set_op', ['transform', 'begin', 'end'])
# And here’s a handy helper to avoid lots of repetitive lambdas
def check_for(cond):
return lambda change: change == cond
OP_UNION = set_op(
transform = id,
begin = check_for((0, 1)),
end = check_for((1, 0)))
…This would also shorten your – otherwise quite nice –
_merge_interval_lists implementation (here without comments for brevity, though your comments were very good):# Removing the default for merge_type – why have this?
def _merge_interval_lists(a, b, merge_type):
b = merge_type.transform(b)
both = list(chain(a, b))
begins = zip(sorted(map(itemgetter(0), both)),
repeat(1))
ends = zip(sorted(map(itemgetter(1), both)),
repeat(-1))
edges = sorted(chain(begins, ends), key=lambda x: (x[0], -x[1]))
counts = list(accumulate(map(itemgetter(1), edges)))
changes = zip(chain([0], counts), counts)
xs = map(itemgetter(0), edges)
xs_and_changes = list(zip(xs, changes))
res_begins = map(itemgetter(0),
starfilter(lambda x, change: merge_type.begin(change),
xs_and_changes))
res_ends = map(itemgetter(0),
starfilter(lambda x, change: merge_type.end(change),
xs_and_changes))
result = pairwise(sorted(chain(res_begins, res_ends)), False)
def length_greater_than_zero(interval):
return interval[0] < interval[1]
return list(filter(length_greater_than_zero, result))Code Snippets
id = lambda x: x
OP_UNION = (
id, # Only needed for symmetric difference
lambda change: change == (0, 1),
lambda change: change == (1, 0))
…set_op = collections.namedtuple('set_op', ['transform', 'begin', 'end'])
# And here’s a handy helper to avoid lots of repetitive lambdas
def check_for(cond):
return lambda change: change == cond
OP_UNION = set_op(
transform = id,
begin = check_for((0, 1)),
end = check_for((1, 0)))
…# Removing the default for merge_type – why have this?
def _merge_interval_lists(a, b, merge_type):
b = merge_type.transform(b)
both = list(chain(a, b))
begins = zip(sorted(map(itemgetter(0), both)),
repeat(1))
ends = zip(sorted(map(itemgetter(1), both)),
repeat(-1))
edges = sorted(chain(begins, ends), key=lambda x: (x[0], -x[1]))
counts = list(accumulate(map(itemgetter(1), edges)))
changes = zip(chain([0], counts), counts)
xs = map(itemgetter(0), edges)
xs_and_changes = list(zip(xs, changes))
res_begins = map(itemgetter(0),
starfilter(lambda x, change: merge_type.begin(change),
xs_and_changes))
res_ends = map(itemgetter(0),
starfilter(lambda x, change: merge_type.end(change),
xs_and_changes))
result = pairwise(sorted(chain(res_begins, res_ends)), False)
def length_greater_than_zero(interval):
return interval[0] < interval[1]
return list(filter(length_greater_than_zero, result))Context
StackExchange Code Review Q#27862, answer score: 2
Revisions (0)
No revisions yet.