String replacement using dictionaries
Problem
I've always been bothered by the fact that there weren't any built-in functions that could replace multiple substrings of a string in Python, so I created this function. Essentially, you supply it with a string, a dictionary of keys (substrings) and values (replacements), and then a few additional options.
def keymap_replace(
    string: str,
    mappings: dict,
    lower_keys=False,
    lower_values=False,
    lower_string=False,
) -> str:
    """Replace parts of a string based on a dictionary.

    This function takes a string and a dictionary of
    replacement mappings. For example, if I supplied
    the string "Hello world.", and the mappings
    {"H": "J", ".": "!"}, it would return "Jello world!".

    Keyword arguments:
    string       -- The string to replace characters in.
    mappings     -- A dictionary of replacement mappings.
    lower_keys   -- Whether or not to lower the keys in mappings.
    lower_values -- Whether or not to lower the values in mappings.
    lower_string -- Whether or not to lower the input string.
    """
    replaced_string = string.lower() if lower_string else string
    for character, replacement in mappings.items():
        replaced_string = replaced_string.replace(
            character.lower() if lower_keys else character,
            replacement.lower() if lower_values else replacement
        )
    return replaced_string

Here's some example usage:
print(keymap_replace(
    "Hello person. How is your day?",
    {
        "hello": "goodbye",
        "how": "what",
        "your": "that",
        "day": ""
    },
    lower_keys=False,
    lower_values=False,
    lower_string=True
))

Solution
Mostly this looks good, but I’ll make a few suggestions:
Boolean arguments in your function signature
These control whether certain strings/mappings are lowercased first. I think this behaviour should be removed from keymap_replace(). It's scope creep: the function has gone from doing string replacements to doing more complicated string manipulations. I’d leave that sort of manipulation to the caller.
(And you could continue to add options like this: what about uppercasing the string, or reversing it, or applying some other pattern? Rather than muddling up your function with this, I’d just take it out.)
That makes the arguments simpler, and simplifies the function definition as well.
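As a rough sketch of what that simplification could look like (same replacement loop as the original, with the three flags dropped), the caller lowercases the input itself before passing it in:

```python
def keymap_replace(string: str, mappings: dict) -> str:
    """Replace each key of `mappings` found in `string` with its value."""
    replaced_string = string
    for substring, replacement in mappings.items():
        replaced_string = replaced_string.replace(substring, replacement)
    return replaced_string


# The caller decides on any preprocessing, such as lowercasing the input.
result = keymap_replace(
    "Hello person. How is your day?".lower(),
    {"hello": "goodbye", "how": "what", "your": "that", "day": ""},
)
print(result)  # goodbye person. what is that ?
```

The keys and values here are already lowercase, so nothing equivalent to lower_keys or lower_values is needed; a caller who wants that can build the dictionary accordingly.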
But if you really want those parameters in there, then I’d suggest making them keyword-only. In Python 3, you can insert an asterisk like so:

def keymap_replace(
    string: str,
    mappings: dict,
    *,
    lower_keys=False,
    lower_values=False,
    lower_string=False,
) -> str:

When the function is called, those boolean arguments then have to be passed as keyword arguments, not positionally. There can never be ambiguity in how the function is being used.
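To illustrate the effect, here is a small sketch that pairs the keyword-only signature with the original function body and then tries both call styles. The sample strings are made up for the demonstration:

```python
def keymap_replace(
    string: str,
    mappings: dict,
    *,
    lower_keys=False,
    lower_values=False,
    lower_string=False,
) -> str:
    replaced_string = string.lower() if lower_string else string
    for character, replacement in mappings.items():
        replaced_string = replaced_string.replace(
            character.lower() if lower_keys else character,
            replacement.lower() if lower_values else replacement,
        )
    return replaced_string


# Keyword form: the intent of the flag is visible at the call site.
ok = keymap_replace("Hello", {"hello": "bye"}, lower_string=True)

# Positional form: rejected because of the bare asterisk in the signature.
try:
    keymap_replace("Hello", {"hello": "bye"}, False, False, True)
except TypeError as error:
    rejected = True
    print("TypeError:", error)
else:
    rejected = False
```

A reader of the second call would have to look up which flag each bare True or False refers to; the asterisk rules that style out entirely.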
Dictionaries are unordered
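There’s a possibility for ambiguity, or for substitutions to be chained, because the result depends on the order in which the mappings are applied (and before Python 3.7, the iteration order of a dict was not guaranteed by the language). For example:

keymap_replace('Hello world', {
    'J': 'K',
    'H': 'J'
})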
Should this call return "Jello world" or "Kello world"?
I think the first is more natural – I don’t expect substitutions to be chained – but when I tried the function, I got the latter. You should try to make this less confusing.
One possibility that springs to mind:
replaced_string = ''.join(mappings.get(char, char) for char in string)

Each character in the original string is subject to at most one substitution. Note that this only works when every key is a single character.
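If the keys can be longer than one character, as in the question's example, the same "at most one substitution" idea could be sketched with a regular expression instead. This is a different technique from the one-liner above, not something taken from the original code. The name replace_once is hypothetical, and the sketch assumes all keys are non-empty strings:

```python
import re


def replace_once(string: str, mappings: dict) -> str:
    """Replace keys of `mappings` in one left-to-right pass over `string`."""
    if not mappings:
        return string
    # Try longer keys first so that "ab" wins over "a" when both could match.
    keys = sorted(mappings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # Each match is looked up in the dictionary; replaced text is never rescanned.
    return pattern.sub(lambda match: mappings[match.group(0)], string)


print(replace_once("Hello world", {"J": "K", "H": "J"}))  # Jello world
print(replace_once("Hello world", {"H": "J", "J": "K"}))  # Jello world
```

Because the string is scanned only once, the order of the dictionary entries no longer affects the result, and the "J" produced by the first substitution is never turned into "K".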
Context
StackExchange Code Review Q#97318, answer score: 9