Refactor jaccard similarity the "Scala way"

Submitted by: @import:stackexchange-codereview

Problem

I'm trying to pick Scala up. This is a simple heuristic that checks a similarity value between two sets. I've done this a million times in Java or Python. The function works, but I'm certain I am not doing this "the Scala way".

val a : List[Int] = List(1,1,2,2,3)
val b : List[Int] = List(1,2,3,4,5)

def calculateBags(a: List[Int], b: List[Int]): Double = {
    val a_counts = a.groupBy(x=>x).map(y=> y._1->y._2.size)
    val b_counts = b.groupBy(x=>x).map(y=> y._1->y._2.size)
    var count = 0
    a_counts.foreach{ x=>
        var k = x._1
        var t = x._2
        if (b_counts.contains(k)) {
            count+= Math.min(t,b_counts(k))
        }
    }
    (count.toDouble)/(a.size+b.size)
}

Solution

Idiomatic Scala Style

When defining a val, a var, or a function, use camelCase.

// :)
val aCounts = ...

// :(
val a_counts = ...


When using an infix operator, add a space on both sides of it.

// :)
x => x
i -> j
foo * boo

// :(
x=>x
i->j
foo*boo
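
Applied together, the two conventions turn the first line of your function into something like the sketch below. The wrapper object StyleExample and the method name countsOf are only there so the snippet compiles on its own; they are not part of your code.

```scala
object StyleExample {
    // The question's counting line, with a camelCase name and
    // spaces around the infix operators => and ->.
    def countsOf(a: List[Int]): Map[Int, Int] =
        a.groupBy(x => x).map(y => y._1 -> y._2.size)
}
```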


Code

Here is your code with all of the changes I would make included (explanation follows):

import Math.min

def calculateBags(a: List[Int], b: List[Int]): Double = {
    val countsByElem =
        (xs: List[Int]) =>
            xs.groupBy(elem => elem).map { case (e, xss) => e -> xss.length }

    val aCounts = countsByElem(a)
    val bCounts = countsByElem(b)

    def addMin(count: Int, x: (Int, Int)): Int = {
        val (k, t) = x
        val v = bCounts getOrElse(k, 0)
        count + min(v, t)
    }

    val len = (a.length + b.length).toDouble

    (0 /: aCounts)(addMin) / len
}
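
As a quick sanity check, here is a sketch of the same logic in a form that compiles on its own and runs against the two lists from your question. It is a variant, not a replacement: the object name BagSimilarity and the main method are assumptions added for illustration, countsByElem is written as a plain method rather than a function literal, and the fold is spelled foldLeft with a pattern-matching lambda instead of /: and addMin.

```scala
object BagSimilarity {
    // Count how many times each element occurs in xs.
    def countsByElem(xs: List[Int]): Map[Int, Int] =
        xs.groupBy(elem => elem).map { case (e, group) => e -> group.length }

    def calculateBags(a: List[Int], b: List[Int]): Double = {
        val aCounts = countsByElem(a)
        val bCounts = countsByElem(b)

        // For every element of a, add the smaller of its two counts;
        // elements missing from b contribute 0.
        val shared = aCounts.foldLeft(0) { case (count, (k, t)) =>
            count + math.min(t, bCounts.getOrElse(k, 0))
        }

        shared / (a.length + b.length).toDouble
    }

    def main(args: Array[String]): Unit = {
        val a = List(1, 1, 2, 2, 3)
        val b = List(1, 2, 3, 4, 5)
        // 1, 2 and 3 each contribute 1, so this is 3 / 10 = 0.3.
        println(calculateBags(a, b))
    }
}
```

Note that two empty lists give 0 / 0.0, which is NaN; whether that needs special handling depends on how you use the result.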


- The first thing I did was define a function literal, countsByElem. I mostly did this to show you that such expressions are possible and sometimes very useful. The advantage in this case is that instead of duplicating the same logic in multiple places, we define a function once and call it as necessary. If we then need to change the logic, we can do so in one place. Another advantage is that a future reader of your code can instantly tell that aCounts and bCounts are calculated with the same set of operations (though on different lists). A disadvantage is that defining this function literal adds more lines of code to your function overall. But we will make up for that later :)

- Next, I've defined a function which, for lack of a better name, I've called addMin. The parameters have the same names as you used in your original code: count is the accumulator and x is the current (key, count) pair from aCounts. The notable features of this function are:

  • In the first line we unpack the tuple x into k and t with some syntactic sugar (a pattern-matching val definition).

  • The next line (starting val v = ...) attempts to retrieve a value from the bCounts map based on the key k. If there is no value associated with k in bCounts, the integer value 0 is returned instead.

- Lastly, we use a foldLeft, which in the code is written as /:. (Note that /: is deprecated as of Scala 2.13 in favour of calling foldLeft by name.) It is a bit hard to explain concisely the set of fold methods that Scala offers, but a nice tutorial on them can be found here. Brief explanation: we pass each element of aCounts, along with an accumulator, to addMin, and the value addMin returns becomes the accumulator for the next element. If we were to 'unroll' some arbitrary fold-left it might look something like this:

// Scala-esque pseudocode to explain a fold ...
val foldResult = {
    val res0 = func(0,    xs(0))
    val res1 = func(res0, xs(1))
    val res2 = func(res1, xs(2))
    // ...
    func(penultimateResult, xs.last)
}
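
To make the unrolling concrete, here is a small sketch with a three-element list. The names FoldUnrolled, func, and xs are placeholders chosen for illustration, and func simply adds, standing in for addMin.

```scala
object FoldUnrolled {
    // Placeholder combining function: add the element to the accumulator.
    def func(acc: Int, x: Int): Int = acc + x

    val xs = List(10, 20, 30)

    // The fold written out by hand, one step per element.
    val unrolled: Int = {
        val res0 = func(0, xs(0))
        val res1 = func(res0, xs(1))
        func(res1, xs(2))
    }

    // The same computation expressed as a left fold.
    val folded: Int = xs.foldLeft(0)(func)
}
```

Both unrolled and folded evaluate to 60 here; the fold just threads the accumulator through the list for you.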


Anyway, hope this helps!

Context

StackExchange Code Review Q#75751, answer score: 4
