patternMinor
A reference for pseudocode for Monge-Elkan algorithm?
Viewed 0 times
referencepseudocodeelkanalgorithmformonge
Problem
Does anyone have a good reference to pseudocode for Monge-Elkan string comparison algorithm?
I have access to the two original papers, but they do not show the pseudocode of the actual algorithm. Also, I have seen some implementations in Java (preference), but they are part of the larger package and with a complex inheritance and composition hierarchies.
I was wondering if someone can point me to a pseudocode for the algorithm, so that I could implement it myself.
I have access to the two original papers, but they do not show the pseudocode of the actual algorithm. Also, I have seen some implementations in Java (preference), but they are part of the larger package and with a complex inheritance and composition hierarchies.
I was wondering if someone can point me to a pseudocode for the algorithm, so that I could implement it myself.
Solution
Algorithm description
The input string are broken into tokens. The best matching token are compared to get the monge-elkan score.
Ex:
Input string 1: "paul johnson"
Input string 2 : "johson paule"
Score : 0.94
The algorithm uses similarity function (Example : Jaro-Winkler or Levenshtein score) as inner function.
The inner function is used to compute the scores of the best matching token.
Ex: jaro_winkler("paul","johson") = 0
jaro_winkler("paul","paule") = 0.96
jaro_winkler("johnson","paule") = 0.0
jaro_winkler("johnson","johson") = 0.92
Monge_elkan = final_score = 1/2*(0.92+0.96) = 0.94
You can check the implementation in secondstring and in sopremo . The implementation in sopremo is ported to Apache Flink as well.
The input string are broken into tokens. The best matching token are compared to get the monge-elkan score.
Ex:
Input string 1: "paul johnson"
Input string 2 : "johson paule"
Score : 0.94
The algorithm uses similarity function (Example : Jaro-Winkler or Levenshtein score) as inner function.
The inner function is used to compute the scores of the best matching token.
Ex: jaro_winkler("paul","johson") = 0
jaro_winkler("paul","paule") = 0.96
jaro_winkler("johnson","paule") = 0.0
jaro_winkler("johnson","johson") = 0.92
Monge_elkan = final_score = 1/2*(0.92+0.96) = 0.94
//Python pseudocode
cummax = 0
for ws in s.split(" "):
maxscore=0
for wt in t.split(" "):
maxscore = max(maxscore,j.jaro_winkler(ws,wt))
cummax += maxscore
score = cummax/len(s.split(" "))You can check the implementation in secondstring and in sopremo . The implementation in sopremo is ported to Apache Flink as well.
Code Snippets
//Python pseudocode
cummax = 0
for ws in s.split(" "):
maxscore=0
for wt in t.split(" "):
maxscore = max(maxscore,j.jaro_winkler(ws,wt))
cummax += maxscore
score = cummax/len(s.split(" "))Context
StackExchange Computer Science Q#32530, answer score: 5
Revisions (0)
No revisions yet.