patternpythonModerate
Filtering a list for certain strings
Viewed 0 times
forlistcertainstringsfiltering
Problem
I want to create a function that filters a list with the following criteria:
The amount of code I am using for this task seems excessive. Can I produce the same results with something more compact? If yes, how?
- Strings containing numbers should be removed
- Strings containing only one word and first letter of the word is not uppercase should also be removed.
- Strings containing more than one word and first letter of each words except first word is not uppercase should also be removed.
The amount of code I am using for this task seems excessive. Can I produce the same results with something more compact? If yes, how?
def list_filter(str_list):
filtered = []
is_end = False
for string in str_list:
words = string.split(' ')
for i, word in enumerate(words):
if i > 0:
if word[0].isupper():
continue
else:
is_end = True
break
if word.isdigit():
is_end = True
break
if len(words) == 1 and not word[0].isupper():
is_end = True
break
if is_end:
break
else:
filtered.append(string)
return filteredSolution
I took the liberty of rewriting most of this from scratch for a few reasons.
First, this very difficult to actually test. All your embedded if statements... oh my! Second it's also difficult to add new checks into if you want other filter conditions.
Code like your original code starts out well intentioned but becomes a giant spaghetti mess fairly quickly - you only have three conditions and it's already a complex and hard to read and quickly understand function.
I moved each conditional check into a separate, small function:
Fairly straightforward here. Both these test specific conditions that can be optimized a bit better in single functions.
This can streamline your main logic significantly (using
Notice how easy it is now to add new check conditions, you just need to make/test a small function and add it to your filter list. Then iterate over that list, checking if any of them return true for the current string.
You could make this a list comprehension and even simpler if you want:
Adding new logical checks is suddenly trivial. Changing existing ones, similarly. With one main "if tree to end all if trees" you will over time end up with a complete mess.
This approach also has the added feature/benefit of making testing much more trivial (I used basic assert statements here):
This is a lot easier to isolate problems, particularly if this ends up with many filters.
I would recommend adding more tests here, there are conditions I'm not sure how you want to test or what they should return.
First, this very difficult to actually test. All your embedded if statements... oh my! Second it's also difficult to add new checks into if you want other filter conditions.
Code like your original code starts out well intentioned but becomes a giant spaghetti mess fairly quickly - you only have three conditions and it's already a complex and hard to read and quickly understand function.
I moved each conditional check into a separate, small function:
# Convention: returns true if should be removed, based on criteria
def contains_numbers(s):
return any(char.isdigit() for char in s)Any is a great use for things like this. It will return True if any condition in the list matches.def single_word_is_not_all_caps(s):
if len(s.split(' ')) == 1:
return s.upper() != s
return False
def contains_multiple_words_not_uppercase(s):
words = s.split(' ')
if len(words) > 1:
return words[0][0].upper() != words[0][0]
return FalseFairly straightforward here. Both these test specific conditions that can be optimized a bit better in single functions.
This can streamline your main logic significantly (using
any again):filters = [contains_numbers,
contains_multiple_words_not_uppercase,
single_word_is_not_all_caps]
def filter_strings(strs):
filtered = []
for s in strs:
if not any(foo(s) for foo in filters):
filtered.append(s)
return filteredNotice how easy it is now to add new check conditions, you just need to make/test a small function and add it to your filter list. Then iterate over that list, checking if any of them return true for the current string.
You could make this a list comprehension and even simpler if you want:
def filter_strings_list_comp(strs):
return [s for s in strs if not any(f(s) for f in filters)]Adding new logical checks is suddenly trivial. Changing existing ones, similarly. With one main "if tree to end all if trees" you will over time end up with a complete mess.
This approach also has the added feature/benefit of making testing much more trivial (I used basic assert statements here):
assert(True == contains_multiple_words_not_uppercase('foo bar'))
assert(False == contains_multiple_words_not_uppercase('Foo bar'))
assert(False == contains_multiple_words_not_uppercase('Foo'))
assert(True == contains_numbers('123'))
assert(True == contains_numbers('1asdf23'))
assert(False == contains_numbers('Foo bar'))
assert(False == contains_numbers(''))
assert(False == single_word_is_not_all_caps('FOO'))
assert(False == single_word_is_not_all_caps('foo bar'))
assert(False == single_word_is_not_all_caps('Foo bar'))
assert(True == single_word_is_not_all_caps('Foo'))
assert(['FOO'] == filter_strings(['Foo', 'FOO']))
assert(['BAR', 'Foo bar'] == filter_strings(['Foo', 'foo', 'BAR', 'Foo bar', '123']))
assert(['FOO'] == filter_strings_list_comp(['Foo', 'FOO']))
assert(['BAR', 'Foo bar'] == filter_strings_list_comp(['Foo', 'foo', 'BAR', 'Foo bar', '123']))This is a lot easier to isolate problems, particularly if this ends up with many filters.
I would recommend adding more tests here, there are conditions I'm not sure how you want to test or what they should return.
Code Snippets
# Convention: returns true if should be removed, based on criteria
def contains_numbers(s):
return any(char.isdigit() for char in s)def single_word_is_not_all_caps(s):
if len(s.split(' ')) == 1:
return s.upper() != s
return False
def contains_multiple_words_not_uppercase(s):
words = s.split(' ')
if len(words) > 1:
return words[0][0].upper() != words[0][0]
return Falsefilters = [contains_numbers,
contains_multiple_words_not_uppercase,
single_word_is_not_all_caps]
def filter_strings(strs):
filtered = []
for s in strs:
if not any(foo(s) for foo in filters):
filtered.append(s)
return filtereddef filter_strings_list_comp(strs):
return [s for s in strs if not any(f(s) for f in filters)]assert(True == contains_multiple_words_not_uppercase('foo bar'))
assert(False == contains_multiple_words_not_uppercase('Foo bar'))
assert(False == contains_multiple_words_not_uppercase('Foo'))
assert(True == contains_numbers('123'))
assert(True == contains_numbers('1asdf23'))
assert(False == contains_numbers('Foo bar'))
assert(False == contains_numbers(''))
assert(False == single_word_is_not_all_caps('FOO'))
assert(False == single_word_is_not_all_caps('foo bar'))
assert(False == single_word_is_not_all_caps('Foo bar'))
assert(True == single_word_is_not_all_caps('Foo'))
assert(['FOO'] == filter_strings(['Foo', 'FOO']))
assert(['BAR', 'Foo bar'] == filter_strings(['Foo', 'foo', 'BAR', 'Foo bar', '123']))
assert(['FOO'] == filter_strings_list_comp(['Foo', 'FOO']))
assert(['BAR', 'Foo bar'] == filter_strings_list_comp(['Foo', 'foo', 'BAR', 'Foo bar', '123']))Context
StackExchange Code Review Q#133390, answer score: 12
Revisions (0)
No revisions yet.