HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

A reference for pseudocode for Monge-Elkan algorithm?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
referencepseudocodeelkanalgorithmformonge

Problem

Does anyone have a good reference to pseudocode for Monge-Elkan string comparison algorithm?

I have access to the two original papers, but they do not show the pseudocode of the actual algorithm. Also, I have seen some implementations in Java (preference), but they are part of the larger package and with a complex inheritance and composition hierarchies.

I was wondering if someone can point me to a pseudocode for the algorithm, so that I could implement it myself.

Solution

Algorithm description

The input string are broken into tokens. The best matching token are compared to get the monge-elkan score.

Ex:

Input string 1: "paul johnson"

Input string 2 : "johson paule"

Score : 0.94

The algorithm uses similarity function (Example : Jaro-Winkler or Levenshtein score) as inner function.
The inner function is used to compute the scores of the best matching token.

Ex: jaro_winkler("paul","johson") = 0

jaro_winkler("paul","paule") = 0.96

jaro_winkler("johnson","paule") = 0.0

jaro_winkler("johnson","johson") = 0.92

Monge_elkan = final_score = 1/2*(0.92+0.96) = 0.94

//Python pseudocode
cummax = 0
for ws in s.split(" "):
  maxscore=0
  for wt in t.split(" "):
    maxscore = max(maxscore,j.jaro_winkler(ws,wt))  
  cummax += maxscore
score = cummax/len(s.split(" "))


You can check the implementation in secondstring and in sopremo . The implementation in sopremo is ported to Apache Flink as well.

Code Snippets

//Python pseudocode
cummax = 0
for ws in s.split(" "):
  maxscore=0
  for wt in t.split(" "):
    maxscore = max(maxscore,j.jaro_winkler(ws,wt))  
  cummax += maxscore
score = cummax/len(s.split(" "))

Context

StackExchange Computer Science Q#32530, answer score: 5

Revisions (0)

No revisions yet.