HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Matching two people. One has 7% in common with the other. The other has 70% in common. What's a fair match score?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
thefairwhatwithmatchscorehasonetwopeople

Problem

I am exploring some ideas around dating site matching, and I can do with some ideas as to which is the better solution

Senario: We have a boy and girl, the boy has completed his profile/bio thoroughly while the girl hasn't or maybe she just doesn't like a lot of things, so this is what the data looks like

Boy likes:                  Girl likes:
---------                   -----------
Painting                    Painting
Movies                      Movies
Long walks                  Long walks  
Harry Potter                Harry Potter
Dogs                        Dogs
Pizza                       Pizza
Computer Science            Computer Science
Mercedes                    Ford
Traveling                   Daisies
Diving                      Psycology
...

Total things
---------------------------------------
100                         10


They share 7 things in common. To the girl the boy is a 70% match which is great
But to the boy the girl is only a 7% match

Maybe the girl would have more in common if she had more in her list of likes so maybe giving their match a 7/100 is a match missed and maybe that's all she likes and 7/100 is the best they'll ever match but even then, it still might be a good match because maybe the boy over filled his list of things he likes while the girl chose carefully what she really likes.

I am looking for ideas for an algorithm that would give the best match score to order them in lists with other people/matches who might have small percentage in matches because their lists of things they like are 1000s of entries long for example

Edit: I will not be matching strings, each of strings in the table above will have a unique ID in an RDBMS database. A table will be used to link users and topics

Solution

It seems you're looking for a symmetric set similarity measure. (Symmetric since, as you point out, $A$ should match $B$ as much as $B$ matches $A$. Set similarity since each person's preference is defined by a set of objects.)

A number of these are used in the CS literature. Probably the most common is Jaccard similarity, defined by $|A\cap B|/|A\cup B|$. In your example, $A$ and $B$ have 7 elements in common, and there are 103 elements between them, leading to a similarity of $7/103$. A similar metric is Braun-Blanquet similarity, $|A\cap B|/\max\{|A|,|B|\}$; in your example this gives $7/100$. With these as starting points you may be able to find further metrics that capture what you want better. For example, here's a list I found while writing this.

What's notable about these metrics is that they don't take each person's dislikes into account---the similarity is defined only by what's on their list, not what's not on it. (So you may think it makes a better match if two people don't like fish.) Incorporating this would require a universe of elements (so any element in the universe not on the list is considered a dislike). In this case you can use the classic Hamming Distance. You have other options as well, like Euclidean distance or even the Sørensen-Dice similarity.

One benefit of using one of the above, well-known similarity measures is that you can use known algorithms to compute nearest neighbors quickly.

Context

StackExchange Computer Science Q#73709, answer score: 4

Revisions (0)

No revisions yet.