HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Cosine simularity perfomance c#

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
simularityperfomancecosine

Problem

I am using 2 nested SortedDictionaries to construct sparse matrix

Here is the custom simularity(sim) function I wrote . Now it has O(n^2) complexity. I looking for suggestions to improve robustness and efficiency.Thanks for any help.

double 
            a = 0, b = 0,
            sqrta = 0,
            sqrtb = 0,
            sim = 0; 

        foreach (var word_i in dict)
        {
            foreach (var word_j in dict)                   
            {
                if (word_i.Key == word_j.Key) continue;

                sim=a=b=sqrta=sqrtb=0;                                   
                foreach (var term in word_j.Value.Keys)
                {
                    if (word_i.Value.ContainsKey(term))
                    {
                        word_i.Value.TryGetValue(term,out a);
                        word_j.Value.TryGetValue(term,out b);
                        sim += a * b;
                        sqrta += Math.Pow(a,2);                                
                        sqrtb += Math.Pow(b,2);
                    }

                }

                sim /= Math.Sqrt(sqrta) * Math.Sqrt(sqrtb);              

            }
        }

Solution

As you are doing calculations on all values where the result depends on both loops, there isn't much that can be done about the complexity, at least not without knowing what you do with the result (which seems to be just discarded in the code shown).

There are some things that you can do in the innermost loop:

Instead of first using Contains and then get the value, you can use TryGetValue directly.

You know that term exists in the other collection, so you don't need TryGetValue for that one.

Squaring is done a lot faster by just multiplying instead of using Math.Pow.

if (word_i.Value.TryGetValue(term,out a)) {
  b = word_j.Value[term];
  sim += a * b;
  sqrta += a * a;                                
  sqrtb += b * b;
}

Code Snippets

if (word_i.Value.TryGetValue(term,out a)) {
  b = word_j.Value[term];
  sim += a * b;
  sqrta += a * a;                                
  sqrtb += b * b;
}

Context

StackExchange Code Review Q#5357, answer score: 2

Revisions (0)

No revisions yet.