Removing doubles from a string
Problem
I've written a function that removes doubles from a string:
def removeDoubles(string):
    output = ' '
    for char in string:
        if output[-1].lower() != char.lower():
            output += char
    return output[1:]

For example:
removeDoubles('bookkeeper') = 'bokeper'
removeDoubles('Aardvark') = 'Ardvark'
removeDoubles('eELGRASS') = 'eLGRAS'
removeDoubles('eeEEEeeel') = 'el'
As you can see, it removes every doubled letter from a string, regardless of whether it is uppercase or lowercase.
I was wondering if this could be more Pythonic. I have to start with a string containing a space so that output[-1] always exists. I was also wondering if it's possible to use list comprehensions for this.
Solution
Your examples are pretty useful (especially 'Aardvark'), and should be included in the documentation of the function, ideally as a doctest. However, the problem is still underspecified: what should happen when a streak of three identical characters is encountered? Should removeDoubles('eeek') return 'eek' (which is how I would interpret "doubles"), or 'ek' (which is what your code actually does)?
As per PEP 8, the official Python style guide, function names should be lower_case_with_underscores unless you have a good reason to deviate. Therefore, I recommend renaming the function to remove_doubles.
Obviously, initializing output to ' ' and then dropping it with output[1:] is cumbersome and inefficient.
Fundamentally, this operation is a fancy string substitution. Typically, such substitutions are best done using regular expressions. In particular, you need the backreferences feature:
Backreferences in a pattern allow you to specify that the contents of an earlier capturing group must also be found at the current location in the string. For example, \1 will succeed if the exact contents of group 1 can be found at the current position, and fails otherwise. Remember that Python’s string literals also use a backslash followed by numbers to allow including arbitrary characters in a string, so be sure to use a raw string when incorporating backreferences in a RE.
For example, the following RE detects doubled words in a string.
>>> import re
>>> p = re.compile(r'(\b\w+)\s+\1')
>>> p.search('Paris in the the spring').group()
'the the'

For my interpretation of "doubles":
import re


def remove_doubles(string):
    """
    For each consecutive pair of the same character (case-insensitive),
    drop the second character.

    >>> remove_doubles('Aardvark')
    'Ardvark'
    >>> remove_doubles('bookkeeper')
    'bokeper'
    >>> remove_doubles('eELGRASS')
    'eLGRAS'
    >>> remove_doubles('eeek')
    'eek'
    """
    return re.sub(r'(.)\1', r'\1', string, flags=re.I)

To preserve your implementation's behaviour:
import re


def deduplicate_consecutive_chars(string):
    """
    For each consecutive streak of the same character (case-insensitive),
    drop all but the first character.

    >>> deduplicate_consecutive_chars('Aardvark')
    'Ardvark'
    >>> deduplicate_consecutive_chars('bookkeeper')
    'bokeper'
    >>> deduplicate_consecutive_chars('eELGRASS')
    'eLGRAS'
    >>> deduplicate_consecutive_chars('eeek')
    'ek'
    """
    return re.sub(r'(.)\1+', r'\1', string, flags=re.I)
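
Since you also asked about comprehensions, here is a rough sketch of a regex-free variant that keeps your implementation's behaviour. The name dedupe_with_groupby is made up for illustration, and the sketch assumes that str.lower is a good enough case-insensitive key (str.casefold would be stricter for non-ASCII text). It uses itertools.groupby to collect each case-insensitive streak, and a generator expression to keep the first character of each streak. The doctest.testmod() call at the bottom shows one way to run the docstring examples as tests.

```python
import doctest
from itertools import groupby


def dedupe_with_groupby(string):
    """
    Hypothetical regex-free variant: for each consecutive streak of the
    same character (case-insensitive), keep only the first character.

    >>> dedupe_with_groupby('Aardvark')
    'Ardvark'
    >>> dedupe_with_groupby('bookkeeper')
    'bokeper'
    >>> dedupe_with_groupby('eELGRASS')
    'eLGRAS'
    >>> dedupe_with_groupby('eeEEEeeel')
    'el'
    >>> dedupe_with_groupby('')
    ''
    """
    # groupby yields (key, group) pairs; each group is one streak of
    # consecutive characters whose lowercased forms are equal.
    # Groups are never empty, so next(group) is always safe here.
    return ''.join(next(group) for _, group in groupby(string, key=str.lower))


if __name__ == '__main__':
    # Run the docstring examples as tests; silent if they all pass.
    doctest.testmod()
```

One behavioural difference to be aware of: in the regex versions, `.` does not match a newline unless re.DOTALL is set, so repeated newlines are left alone there, whereas this sketch collapses them like any other character.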
Context
StackExchange Code Review Q#151715, answer score: 6