HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonModerate

Filtering a list for certain strings

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
forlistcertainstringsfiltering

Problem

I want to create a function that filters a list with the following criteria:

  • Strings containing numbers should be removed



  • Strings containing only one word and first letter of the word is not uppercase should also be removed.



  • Strings containing more than one word and first letter of each words except first word is not uppercase should also be removed.



The amount of code I am using for this task seems excessive. Can I produce the same results with something more compact? If yes, how?

def list_filter(str_list):
    filtered = []
    is_end = False
    for string in str_list:
        words = string.split(' ')
        for i, word in enumerate(words):
            if i > 0:
                if word[0].isupper():
                    continue
                else:
                    is_end = True
                    break
            if word.isdigit():
                is_end = True
                break
            if len(words) == 1 and not word[0].isupper():
                is_end = True
                break
        if is_end:
            break
        else:
            filtered.append(string)
    return filtered

Solution

I took the liberty of rewriting most of this from scratch for a few reasons.

First, this very difficult to actually test. All your embedded if statements... oh my! Second it's also difficult to add new checks into if you want other filter conditions.

Code like your original code starts out well intentioned but becomes a giant spaghetti mess fairly quickly - you only have three conditions and it's already a complex and hard to read and quickly understand function.

I moved each conditional check into a separate, small function:

# Convention: returns true if should be removed, based on criteria
def contains_numbers(s):
    return any(char.isdigit() for char in s)


Any is a great use for things like this. It will return True if any condition in the list matches.

def single_word_is_not_all_caps(s):
    if len(s.split(' ')) == 1:
        return s.upper() != s
    return False

def contains_multiple_words_not_uppercase(s):
    words = s.split(' ')
    if len(words) > 1:
        return words[0][0].upper() != words[0][0]
    return False


Fairly straightforward here. Both these test specific conditions that can be optimized a bit better in single functions.

This can streamline your main logic significantly (using any again):

filters = [contains_numbers,
       contains_multiple_words_not_uppercase,
       single_word_is_not_all_caps]

def filter_strings(strs):

    filtered = []
    for s in strs:
        if not any(foo(s) for foo in filters):
            filtered.append(s)

    return filtered


Notice how easy it is now to add new check conditions, you just need to make/test a small function and add it to your filter list. Then iterate over that list, checking if any of them return true for the current string.

You could make this a list comprehension and even simpler if you want:

def filter_strings_list_comp(strs):
    return [s for s in strs if not any(f(s) for f in filters)]


Adding new logical checks is suddenly trivial. Changing existing ones, similarly. With one main "if tree to end all if trees" you will over time end up with a complete mess.

This approach also has the added feature/benefit of making testing much more trivial (I used basic assert statements here):

assert(True == contains_multiple_words_not_uppercase('foo bar'))
assert(False == contains_multiple_words_not_uppercase('Foo bar'))
assert(False == contains_multiple_words_not_uppercase('Foo'))

assert(True == contains_numbers('123'))
assert(True == contains_numbers('1asdf23'))
assert(False == contains_numbers('Foo bar'))
assert(False == contains_numbers(''))

assert(False == single_word_is_not_all_caps('FOO'))
assert(False == single_word_is_not_all_caps('foo bar'))
assert(False == single_word_is_not_all_caps('Foo bar'))
assert(True == single_word_is_not_all_caps('Foo'))

assert(['FOO'] == filter_strings(['Foo', 'FOO']))
assert(['BAR', 'Foo bar'] == filter_strings(['Foo', 'foo', 'BAR', 'Foo bar', '123']))

assert(['FOO'] == filter_strings_list_comp(['Foo', 'FOO']))
assert(['BAR', 'Foo bar'] == filter_strings_list_comp(['Foo', 'foo', 'BAR', 'Foo bar', '123']))


This is a lot easier to isolate problems, particularly if this ends up with many filters.

I would recommend adding more tests here, there are conditions I'm not sure how you want to test or what they should return.

Code Snippets

# Convention: returns true if should be removed, based on criteria
def contains_numbers(s):
    return any(char.isdigit() for char in s)
def single_word_is_not_all_caps(s):
    if len(s.split(' ')) == 1:
        return s.upper() != s
    return False

def contains_multiple_words_not_uppercase(s):
    words = s.split(' ')
    if len(words) > 1:
        return words[0][0].upper() != words[0][0]
    return False
filters = [contains_numbers,
       contains_multiple_words_not_uppercase,
       single_word_is_not_all_caps]

def filter_strings(strs):

    filtered = []
    for s in strs:
        if not any(foo(s) for foo in filters):
            filtered.append(s)

    return filtered
def filter_strings_list_comp(strs):
    return [s for s in strs if not any(f(s) for f in filters)]
assert(True == contains_multiple_words_not_uppercase('foo bar'))
assert(False == contains_multiple_words_not_uppercase('Foo bar'))
assert(False == contains_multiple_words_not_uppercase('Foo'))

assert(True == contains_numbers('123'))
assert(True == contains_numbers('1asdf23'))
assert(False == contains_numbers('Foo bar'))
assert(False == contains_numbers(''))

assert(False == single_word_is_not_all_caps('FOO'))
assert(False == single_word_is_not_all_caps('foo bar'))
assert(False == single_word_is_not_all_caps('Foo bar'))
assert(True == single_word_is_not_all_caps('Foo'))


assert(['FOO'] == filter_strings(['Foo', 'FOO']))
assert(['BAR', 'Foo bar'] == filter_strings(['Foo', 'foo', 'BAR', 'Foo bar', '123']))

assert(['FOO'] == filter_strings_list_comp(['Foo', 'FOO']))
assert(['BAR', 'Foo bar'] == filter_strings_list_comp(['Foo', 'foo', 'BAR', 'Foo bar', '123']))

Context

StackExchange Code Review Q#133390, answer score: 12

Revisions (0)

No revisions yet.