patternMinor
Refactor jaccard similarity the "Scala way"
Viewed 0 times
jaccardthewayscalasimilarityrefactor
Problem
I'm trying to pick Scala up. This is a simple heuristic that checks a similarity value between two sets. I've done this a million times in Java or Python. The function works, but I'm certain I am not doing this "the Scala way".
val a : List[Int] = List(1,1,2,2,3)
val b : List[Int] = List(1,2,3,4,5)
def calculateBags(a: List[Int], b: List[Int]): Double = {
val a_counts = a.groupBy(x=>x).map(y=> y._1->y._2.size)
val b_counts = b.groupBy(x=>x).map(y=> y._1->y._2.size)
var count = 0
a_counts.foreach{ x=>
var k = x._1
var t = x._2
if (b_counts.contains(k)) {
count+= Math.min(t,b_counts(k))
}
}
(count.toDouble)/(a.size+b.size)
}Solution
Idiomatic Scala Style
When defining a
When using an infix operator add a space on both sides of it.
Code
Here is your code with all of the changes I would make included (explanation follows):
-
The first thing I did was define a function literal,
-
Next, I've defined a function which for lack of a better name I've called
-
Lastly we utilize a
Anyway, hope this helps!
When defining a
val, var, or a function use camelCase. // :)
val aCounts = ...
// :(
val a_counts = ...When using an infix operator add a space on both sides of it.
// :)
x => x
i -> j
foo * boo
// :(
x=>x
i->j
foo*booCode
Here is your code with all of the changes I would make included (explanation follows):
import Math.min
def calculateBags(a: List[Int], b: List[Int]): Double = {
val countsByElem =
(xs: List[Int]) =>
xs.groupBy(elem => elem).map { case (e, xss) => e -> xss.length }
val aCounts = countsByElem(a)
val bCounts = countsByElem(b)
def addMin(count: Int, x: (Int, Int)): Int = {
val (k, t) = x
val v = bCounts getOrElse(k, 0)
count + min(v, t)
}
val len = (a.length + b.length).toDouble
(0 /: aCounts)(addMin) / len
}-
The first thing I did was define a function literal,
countsByElem. I mostly did this to show you that such expressions are possible and sometimes very useful. The advantage in this case is that instead of duplicating the same logic in multiple places, we just define a function and call it as necessary. If we then need to make a change to the logic we can do so in one place. Another advantage is that a future reader of your code can instantly tell aCounts and bCounts are calculated with the same set of operations (though with different lists). A disadvantage is that defining this function literal adds more lines of code to your function overall. But we will make up for that later :)-
Next, I've defined a function which for lack of a better name I've called
addMin. The parameters have the same names as you used in your original code: count is the accumulator and x is the ith element of aCounts. The notable features of this function are:- In the first line we unpack the tuple value
xwith some syntactic sugar
- The next line (starting
val v = ...) attempts to retrieve a value from thebCountsmap based on the keyk. If there is no value associated withkinbCountsthe integer value0is returned instead.
-
Lastly we utilize a
foldLeft which in the code is designated by the /:. It is a bit hard to concisely explain the set of fold methods that Scala offers. But a nice tutorial on them can be found here. Brief explanation: we pass each element in aCounts along with an accumulator to addMin. If we were to 'unroll' some arbitrary fold-left it might look something like this:// Scala-esque psuedocode to explain a fold ...
val foldResult = {
val res0 = func(0, xs(0))
val res1 = func(res0, xs(1))
val res2 = func(res1, xs(2))
// ...
func(penultimateResult, xs.last)
}Anyway, hope this helps!
Code Snippets
// :)
val aCounts = ...
// :(
val a_counts = ...// :)
x => x
i -> j
foo * boo
// :(
x=>x
i->j
foo*booimport Math.min
def calculateBags(a: List[Int], b: List[Int]): Double = {
val countsByElem =
(xs: List[Int]) =>
xs.groupBy(elem => elem).map { case (e, xss) => e -> xss.length }
val aCounts = countsByElem(a)
val bCounts = countsByElem(b)
def addMin(count: Int, x: (Int, Int)): Int = {
val (k, t) = x
val v = bCounts getOrElse(k, 0)
count + min(v, t)
}
val len = (a.length + b.length).toDouble
(0 /: aCounts)(addMin) / len
}// Scala-esque psuedocode to explain a fold ...
val foldResult = {
val res0 = func(0, xs(0))
val res1 = func(res0, xs(1))
val res2 = func(res1, xs(2))
// ...
func(penultimateResult, xs.last)
}Context
StackExchange Code Review Q#75751, answer score: 4
Revisions (0)
No revisions yet.