Turkish word filtering
Problem
I need to filter Turkish words according to vowel alternation rules. For example, -ler is the plural suffix but can also take the form -lar, so I use metaphonemes for these alternating allomorphs, stored as vovel_dict in my algorithm.
There is only one case where the vowel does not alternate: 'ken', which is excluded. I wrote the code below, but it takes too much runtime. Do you have any suggestions?
def char_filter(morph):
    char_dict = {'d': 'D', 't': 'D', 'a': 'A', 'e': 'A', 'ı': 'H', 'i': 'H', 'u': 'H', 'C': 'C', 'g': 'G', 'k': 'G'}
    res = []
    res1 = []
    flag = []
    if 'ken' in morph:
        flag.append([morph.rindex('ken'), ['k','e','n']])
    for i in morph:
        res2 = i
        if i in char_dict:
            res2 = i.upper()
        res1.append(res2)
    if len(flag) > 0:
        for z in flag:
            res1[z[0]:z[0]+3] = z[1]
    res.append(string.join(res1, sep=''))
    return res[0]
Solution
Naming
res, res1, res2? These names convey absolutely no meaning. flag? That sounds like some sort of boolean value to me, but you assign it a list. i, z? One-letter variable names are mostly used for integer indices, not for characters or… what was z again… oh, a list.
Even morph sounds badly chosen. I'm lacking a bit of context to fully understand it, I guess, but I would have used origin_string or something similar.
Control flow
if len(flag) > 0:
    for z in flag:
        # do something
is twice as complicated as it should be. Once because if len(flag) > 0 can be written if flag: empty containers are considered False in a boolean context. And once again because the if is not needed at all: for z in flag will be a no-op if flag is an empty container.
Also, since you're not modifying morph and the only thing you store in flag (if any) is [morph.rindex('ken'), ['k','e','n']], why not get rid of flag and directly use that list in the last part of your computation?
def char_filter(morph):
    char_dict = {'d': 'D', 't': 'D', 'a': 'A', 'e': 'A', 'ı': 'H', 'i': 'H', 'u': 'H', 'C': 'C', 'g': 'G', 'k': 'G'}
    res = []
    res1 = []
    for i in morph:
        res2 = i
        if i in char_dict:
            res2 = i.upper()
        res1.append(res2)
    if 'ken' in morph:
        i = morph.rindex('ken')
        res1[i:i+3] = ['k','e','n']
    res.append(string.join(res1, sep=''))
    return res[0]
Data structures
Most of the data structures you use are not suited to your needs. Why store a single string as the first item of res if you extract it right after having assigned it? Store the string directly in res instead of res[0]. Better: do not use res at all and return the string right after building it, since you're not making any other use of res anyway:
return ''.join(res1)
Also note the preferred syntax for join, which is a method of the separator string.
You're also building a dictionary in char_dict but never using the values stored in it, only checking for the existence of keys. Two possibilities:
- either you have a bug and need to use res2 = char_dict[i] instead of res2 = i.upper();
- or you just need to simplify the data structure and only store the characters that you want to test against. A list, a string or a set is a better fit for this task.
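To make the difference between these two possibilities concrete, here is a minimal sketch. The helper names filter_with_mapping and filter_with_membership, and the constants CHAR_DICT and SPECIAL_KEYS, are made up for this illustration; the mapping itself is copied from the question, and the 'ken' handling is left out to keep the example short.

```python
# Mapping copied from the question's char_dict (hypothetical module-level name).
CHAR_DICT = {'d': 'D', 't': 'D', 'a': 'A', 'e': 'A', 'ı': 'H', 'i': 'H',
             'u': 'H', 'C': 'C', 'g': 'G', 'k': 'G'}
# Only the keys, for the variant that never looks at the values.
SPECIAL_KEYS = set(CHAR_DICT)


def filter_with_mapping(word):
    """First possibility: replace each character by the value stored in the dict."""
    return ''.join(CHAR_DICT.get(char, char) for char in word)


def filter_with_membership(word):
    """Second possibility: only test membership, then upper-case as the original code does."""
    return ''.join(char.upper() if char in SPECIAL_KEYS else char for char in word)


print(filter_with_mapping('evler'))     # AvlAr
print(filter_with_membership('evler'))  # EvlEr
```

The two variants give different results for the same word, so which one is right depends on whether the metaphonemes stored as values are actually wanted in the output.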
I will consider the second option.
The last thing to note is that this collection will not change between calls, so we can safely define it outside of the function to avoid building a new one each time we call char_filter.
A little bit on pythonic constructs
some_list = []
for variable in some_collection:
    value = some_function(variable)
    some_list.append(value)
is better written with a list comprehension, which is more readable and usually faster:
some_list = [some_function(variable) for variable in some_collection]
Proposed improvements
SPECIAL_CHARS = 'dtaeıiuCgk'
def char_filter(origin_string):
    filtered_letters = [char.upper() if char in SPECIAL_CHARS else char
                        for char in origin_string]
    if 'ken' in origin_string:
        i = origin_string.rindex('ken')
        filtered_letters[i:i+3] = ['k','e','n']
    return ''.join(filtered_letters)
Code Snippets
if len(flag) > 0:
    for z in flag:
        # do something

def char_filter(morph):
    char_dict = {'d': 'D', 't': 'D', 'a': 'A', 'e': 'A', 'ı': 'H', 'i': 'H', 'u': 'H', 'C': 'C', 'g': 'G', 'k': 'G'}
    res = []
    res1 = []
    for i in morph:
        res2 = i
        if i in char_dict:
            res2 = i.upper()
        res1.append(res2)
    if 'ken' in morph:
        i = morph.rindex('ken')
        res1[i:i+3] = ['k','e','n']
    res.append(string.join(res1, sep=''))
    return res[0]

return ''.join(res1)

some_list = []
for variable in some_collection:
    value = some_function(variable)
    some_list.append(value)

some_list = [some_function(variable) for variable in some_collection]
Context
StackExchange Code Review Q#110708, answer score: 7