patternpythonMinor
Check if strings from a list in a text and return first longest match
Viewed 0 times
checkreturntextlongestmatchfirstandlistfromstrings
Problem
I'm trying to design an application that binds a phrase or word to some item, for example image, and saves phrase-item pair in the database. Then it receives a text, and if it contains binded substring, corresponding item should be returned.
It should return only one item (first match), and longest substrings should take precedence.
I wrote a function, that returns expected values:
Example:
Is there cleaner ways to complete such task, or faster, perhaps?
It should return only one item (first match), and longest substrings should take precedence.
I wrote a function, that returns expected values:
from operator import itemgetter
def get_item(text, bindings):
text = text.lower()
matches = []
for phrase, item in bindings:
phrase = phrase.lower()
index = text.find(phrase)
if index != -1:
matches.append((phrase, item, index))
if matches:
matches.sort(key=lambda x: len(x[0]), reverse=True)
matches.sort(key=itemgetter(2))
item_id = matches[0][1]
else:
item_id = None
return item_idExample:
bindings = [
('i', 'item1'), ('like', 'item2'), ('i like', 'item3'), ('turtles', 'item4'),
]
text = 'I like turtles!'
print(get_item(text, bindings)) # should return item3Is there cleaner ways to complete such task, or faster, perhaps?
Solution
You can use an early return in your function to get rid of a few lines, because python's default return value is already
Algorithm wise, I'm not sure this code is up to the specification you wrote. You said, phrases should be preferred over single words. However, you only sort by length. Therefore
will print
Also, your code scales with something like \$\mathcal{O}(m)\mathcal{O}(nk)\$, where \$m\$ is the number of defined bindings, \$n\$ is the length of the text and \$k\$ is the average length of the phrases in the bindings. I'm not sure whether there is a better algorithm, though.
None:from operator import itemgetter
def get_item(text, bindings):
text = text.lower()
matches = []
for phrase, item in bindings:
index = text.find(phrase)
if index != -1:
matches.append((phrase, item, index))
if matches:
matches.sort(key=lambda x: len(x[0]), reverse=True)
matches.sort(key=itemgetter(2))
return matches[0][1]Algorithm wise, I'm not sure this code is up to the specification you wrote. You said, phrases should be preferred over single words. However, you only sort by length. Therefore
bindings = [('i am a', 'item1'), ('terminator', 'item2')]
text = 'I am a terminator'
print get_item(text, bindings)will print
'item2' and not 'item1', even though it is a single word and the other binding is a phrase. Maybe also use something like len(x[0].split()) to give you the number of words as the sort key?Also, your code scales with something like \$\mathcal{O}(m)\mathcal{O}(nk)\$, where \$m\$ is the number of defined bindings, \$n\$ is the length of the text and \$k\$ is the average length of the phrases in the bindings. I'm not sure whether there is a better algorithm, though.
Code Snippets
from operator import itemgetter
def get_item(text, bindings):
text = text.lower()
matches = []
for phrase, item in bindings:
index = text.find(phrase)
if index != -1:
matches.append((phrase, item, index))
if matches:
matches.sort(key=lambda x: len(x[0]), reverse=True)
matches.sort(key=itemgetter(2))
return matches[0][1]bindings = [('i am a', 'item1'), ('terminator', 'item2')]
text = 'I am a terminator'
print get_item(text, bindings)Context
StackExchange Code Review Q#140777, answer score: 2
Revisions (0)
No revisions yet.