patterncsharpMinor
Cosine simularity perfomance c#
Viewed 0 times
simularityperfomancecosine
Problem
I am using 2 nested SortedDictionaries to construct sparse matrix
Here is the custom simularity(sim) function I wrote . Now it has O(n^2) complexity. I looking for suggestions to improve robustness and efficiency.Thanks for any help.
Here is the custom simularity(sim) function I wrote . Now it has O(n^2) complexity. I looking for suggestions to improve robustness and efficiency.Thanks for any help.
double
a = 0, b = 0,
sqrta = 0,
sqrtb = 0,
sim = 0;
foreach (var word_i in dict)
{
foreach (var word_j in dict)
{
if (word_i.Key == word_j.Key) continue;
sim=a=b=sqrta=sqrtb=0;
foreach (var term in word_j.Value.Keys)
{
if (word_i.Value.ContainsKey(term))
{
word_i.Value.TryGetValue(term,out a);
word_j.Value.TryGetValue(term,out b);
sim += a * b;
sqrta += Math.Pow(a,2);
sqrtb += Math.Pow(b,2);
}
}
sim /= Math.Sqrt(sqrta) * Math.Sqrt(sqrtb);
}
}Solution
As you are doing calculations on all values where the result depends on both loops, there isn't much that can be done about the complexity, at least not without knowing what you do with the result (which seems to be just discarded in the code shown).
There are some things that you can do in the innermost loop:
Instead of first using
You know that
Squaring is done a lot faster by just multiplying instead of using
There are some things that you can do in the innermost loop:
Instead of first using
Contains and then get the value, you can use TryGetValue directly.You know that
term exists in the other collection, so you don't need TryGetValue for that one.Squaring is done a lot faster by just multiplying instead of using
Math.Pow.if (word_i.Value.TryGetValue(term,out a)) {
b = word_j.Value[term];
sim += a * b;
sqrta += a * a;
sqrtb += b * b;
}Code Snippets
if (word_i.Value.TryGetValue(term,out a)) {
b = word_j.Value[term];
sim += a * b;
sqrta += a * a;
sqrtb += b * b;
}Context
StackExchange Code Review Q#5357, answer score: 2
Revisions (0)
No revisions yet.