Deep/recursive join, all, any, sum, len

Submitted by: @import:stackexchange-codereview

Problem

I keep forgetting that the standard join() can only take a single iterable, so I made a few functions that act recursively on any passed arguments. Somewhat ironically, the deep version of join() isn't called by the user, but only within the deep version of sum() - the standard sum() breaks and tells you to use join() if you try to sum() strings, but this one just calls the deep join() automatically.

Several questions prompted me to post this here:

  • Is this a good idea? Why didn't Google turn up any results for such a thing (i.e., why doesn't anyone seem to have tried it)?
  • Could anything particularly bad happen with these, like system instability or silent failures?
  • And, of course, how could they be improved? I'd like equal parts readability and speed.

It's 100 lines total, including comments, but each function is between 4 and 12 lines (plus the definition). Feel free to comment on all or any of them.

```
def _djoin(*args, s=''):
    """
    Executes a recursive string join on all passed arguments and their contents.
    Parameters:
        *args (tuple): An unrolled tuple of arguments.
        s (string): Optional. Separates each element with the given string.
    """
    if len(args) == 1:
        try:
            iter(args[0])
            if type(args[0]) == str:
                raise TypeError
            return s.join(_djoin(arg, s=s) for arg in args[0])
        except TypeError:
            return str(args[0])
    return s.join(_djoin(arg, s=s) for arg in args)

def dall(*args):
    """
    Executes a recursive all() on all passed arguments and their contents.
    Parameter:
        *args (tuple): An unrolled tuple of arguments.
    """
    if len(args) == 1:
        try:
            iter(args[0])
            if type(args[0]) == str or not len(args[0]):
                raise TypeError
            return all(dall(arg) for arg in args[0])
        except TypeError:
            return bool(args[0])
    return all(dall(arg) for arg in args)
```

Solution

You have a lot of duplication here - the code for flattening the *args tuple appears multiple times. I would factor that out to a single function, _flatten, which could be a generator to deal with large inputs:

def _flatten(iter_):
    if isinstance(iter_, str):
        yield iter_
    else:
        try:
            for obj in iter_:
                yield from _flatten(obj)
        except TypeError:
            yield iter_


(Note that yield from is only available in Python 3.3 and later.) This will neatly unroll your tuple of arguments:

>>> list(_flatten(('foo', 'bar', [123,456,789,'baz'])))
['foo', 'bar', 123, 456, 789, 'baz']
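
One thing to be aware of in the generator above: the try wraps the whole loop, so a TypeError raised partway through iterating an object would also be caught, and that object would then be yielded after some of its items. A minimal sketch of a variant that guards only the iter() call follows. The name flatten_strict and the choice to treat bytes as a leaf alongside str are assumptions made for illustration, not part of the original answer.

```python
def flatten_strict(obj):
    """Yield the leaves of arbitrarily nested iterables.

    Hypothetical variant of _flatten: str and bytes are treated as leaves,
    and only the iter() call is guarded against TypeError.
    """
    if isinstance(obj, (str, bytes)):
        yield obj
        return
    try:
        iterator = iter(obj)
    except TypeError:
        # Not iterable at all, so it is a leaf.
        yield obj
        return
    for item in iterator:
        yield from flatten_strict(item)


print(list(flatten_strict(('foo', 'bar', [123, 456, 789, 'baz']))))
# ['foo', 'bar', 123, 456, 789, 'baz']
print(list(flatten_strict([1, [2, (3, [4])], b'xy'])))
# [1, 2, 3, 4, b'xy']
```

Because it is still a generator, nothing is materialized until the caller consumes it.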


Now e.g. _djoin becomes:

def _djoin(*args, s=''):
    return s.join(map(str, _flatten(args)))


and works just the same:

>>> _djoin('foo', 'bar', [123,456,789,'baz'], s=' ')
'foo bar 123 456 789 baz'


Similarly, the body of dall becomes return all(_flatten(args)).
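
As a rough sketch of that rewrite, the block below defines dall on top of _flatten. The companion dany is a hypothetical name added here by analogy and is not shown in the question. One behavioural difference is worth knowing: the question's dall returns False for an empty container (because of its not len(args[0]) check), whereas the flattened version returns True, since an empty container contributes no leaves.

```python
def _flatten(iter_):
    """Yield the leaves of nested iterables, keeping strings whole."""
    if isinstance(iter_, str):
        yield iter_
    else:
        try:
            for obj in iter_:
                yield from _flatten(obj)
        except TypeError:
            yield iter_


def dall(*args):
    """Return True if every leaf in the nested arguments is truthy."""
    return all(_flatten(args))


def dany(*args):
    """Hypothetical companion: True if any leaf is truthy."""
    return any(_flatten(args))


print(dall(1, [2, [3, 'x']]))    # True
print(dall(1, [2, [0]]))         # False
print(dany(0, ['', [None, 5]]))  # True
print(dall([]))                  # True: an empty list has no leaves
```

If the old treatment of empty containers matters to you, it needs an explicit check rather than relying on the flattening.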

Note that in the above _flatten implementation I've used isinstance, rather than type(iter) == str. This will deal appropriately with inheritance (i.e. subclasses of str will also be handled correctly). dsum should also use this:

def dsum(*args, s=0):
    if isinstance(s, str):
        return _djoin(*args, s=s)
    ...


See e.g. "Differences between isinstance() and type() in python"
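
To make the difference concrete, here is a small demonstration. The Tag class is a hypothetical str subclass invented only for this example.

```python
class Tag(str):
    """Hypothetical str subclass used only for this demonstration."""


value = Tag('foo')

# An exact type comparison rejects the subclass...
print(type(value) == str)      # False
# ...while isinstance accepts it, because a Tag is a str.
print(isinstance(value, str))  # True
```

With the type() comparison, a Tag passed to the question's functions would not be recognised as a string and would be iterated character by character instead.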

Your current test suite requires you to read each line to validate whether the outputs were as expected. Life would be much simpler if you used assert for this, for example:

assert _djoin('foo', 'bar', 123, s=' ') == 'foo bar 123'


This will give no output if everything is OK, but raise an error if a test fails:

>>> assert _djoin('foo', 'bar', 123, s=' ') == 'foo bar 123'
>>> assert _djoin('foo', 'bar', 123, s=' ') == 'derp'
Traceback (most recent call last):
  File "", line 1, in <module>
    assert _djoin('foo', 'bar', 123, s=' ') == 'derp'
AssertionError


Alternatively, you could consider implementing doctests, e.g.:

def _djoin(*args, s=''):
    """Flatten the arguments and join them together as strings.

        >>> _djoin('foo', 'bar', 123, s=' ')
        'foo bar 123'

    """
    ...


Then at the bottom of deep.py you can easily run all tests with:

if __name__ == '__main__':
    import doctest
    doctest.testmod(verbose=True)


and you will get useful outputs on what was tested, what worked and what didn't. For example, a failing output from my development of ssum below:

...
Trying:
    ssum(1, 2, 3)
Expecting:
    6
ok
Trying:
    ssum('foo', 'bar', 'baz')
Expecting:
    'foobarbaz'
**********************************************************************
File "C:/Python34/deep.py", line 49, in __main__.ssum
Failed example:
    ssum('foo', 'bar', 'baz')
Expected:
    'foobarbaz'
Got:
    'foofoobarbaz'
1 items had no tests:
    __main__
3 items passed all tests:
   2 tests in __main__._djoin
   3 tests in __main__._flatten
   1 tests in __main__.dsum
**********************************************************************
1 items had failures:
   1 of   2 in __main__.ssum
8 tests in 5 items.
7 passed and 1 failed.
***Test Failed*** 1 failures.


(I had passed seq instead of iter_ to _djoin - d'oh!)
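
If you want to check the doctests of a single function and get the counts back as values rather than printed output, the doctest module's parser and runner classes can be used directly. This is a sketch of one way to do it, not a replacement for doctest.testmod; the second doctest example is an addition made here for illustration.

```python
import doctest


def _flatten(iter_):
    """Yield the leaves of nested iterables, keeping strings whole."""
    if isinstance(iter_, str):
        yield iter_
    else:
        try:
            for obj in iter_:
                yield from _flatten(obj)
        except TypeError:
            yield iter_


def _djoin(*args, s=''):
    """Flatten the arguments and join them together as strings.

    >>> _djoin('foo', 'bar', 123, s=' ')
    'foo bar 123'
    >>> _djoin(['a', ['b']], 'c')
    'abc'
    """
    return s.join(map(str, _flatten(args)))


# Build a DocTest from the docstring and run it without printing a report.
parser = doctest.DocTestParser()
test = parser.get_doctest(_djoin.__doc__, {'_djoin': _djoin}, '_djoin', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)  # 0 2
```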

The ssum implementation seems a bit odd; the repeated use of iter and next makes the code difficult to read and is unlikely to be efficient. Instead, consider something like:

def ssum(*seq):
    """Sum over the sequence, determining a sensible start value.

        >>> ssum(1, 2, 3)
        6
        >>> ssum('foo', 'bar', 'baz')
        'foobarbaz'

    """
    iter_ = _flatten(seq)
    first = next(iter_)
    if isinstance(first, str):
        return _djoin(first, iter_)
    return sum(iter_, first)


This makes it clear that the logic is based on evaluating the type of the first object to determine a "sensible start value".
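
One edge case in the version above: if the arguments contain no leaves at all, next(iter_) raises StopIteration. A possible guard is sketched below. The name ssum_safe, the _MISSING sentinel, and the decision to return 0 for empty input (mirroring sum([])) are all assumptions, so pick whatever default suits your use.

```python
def _flatten(iter_):
    """Yield the leaves of nested iterables, keeping strings whole."""
    if isinstance(iter_, str):
        yield iter_
    else:
        try:
            for obj in iter_:
                yield from _flatten(obj)
        except TypeError:
            yield iter_


def _djoin(*args, s=''):
    return s.join(map(str, _flatten(args)))


_MISSING = object()  # hypothetical sentinel marking "no first element"


def ssum_safe(*seq):
    """Like ssum, but return 0 when there is nothing to add (an assumption)."""
    iter_ = _flatten(seq)
    first = next(iter_, _MISSING)
    if first is _MISSING:
        return 0
    if isinstance(first, str):
        # Join the first leaf with whatever the generator still holds.
        return _djoin(first, iter_)
    return sum(iter_, first)


print(ssum_safe(1, 2, 3))                # 6
print(ssum_safe('foo', ['bar', 'baz']))  # foobarbaz
print(ssum_safe())                       # 0
```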

Context

StackExchange Code Review Q#86842, answer score: 6
