patternMinor
Pearson Correlation in Haskell
Viewed 0 times
pearsonhaskellcorrelation
Problem
As an exercise to help me learn Haskell, I decided to write a small "program" to find the Pearson Correlation Coefficient for 2 lists of numbers. I'm pretty unhappy with how it turned out, because I feel like the typing is a mess, and it is difficult to read.
I was hoping that someone with experience would help me to improve my code, and tell me how they would have approached the problem.
I was hoping that someone with experience would help me to improve my code, and tell me how they would have approached the problem.
import Data.List
summation :: (Integral a, Fractional b) => a -> a -> (a -> b) -> b
summation i n e = if (i [a] -> b
mean x = (1 / (genericLength x)) *
(summation 1 (length x) (\i -> realToFrac (x !! (i-1))))
covariance :: (Real a, Fractional b) => [a] -> [a] -> b
covariance x y = (1 / (genericLength x)) *
(summation 1 (length x) (\i -> ((realToFrac (x !! (i-1)) - mean x) * (realToFrac (y !! (i-1)) - mean x))))
stddev :: (Real a, Floating b) => [a] -> b
stddev x = ((1 / (genericLength x)) *
(summation 1 (length x) (\i -> (realToFrac (x !! (i-1)) - mean x) ** 2))) ** (1/2)
pearson :: (Real a, Floating b) => [a] -> [a] -> b
pearson x y = covariance x y / (stddev x * stddev y)Solution
Each function on its own
Let us have a look at each of your functions first in solitude. What does
Well, for all numbers from
Note that this only holds if
Next, what does
Uh, that looks complicated. First, let us rewrite it as proper fraction:
Much easier to read. So, what do you actually do in
Note that lists are usually named with a suffix
We use the same approach on
And now we immediately spot an error in
However, let us have a look at the mathematical definition again:
$$
cov(X,Y) = E\left[(X-E[X])*(Y-E[Y])\right]
$$
Allright. Let's write this exactly down as it stands there:
This is obviously not ready yet. What are
Now we're done. Note how the
Everything at once
After we've finished our rewrite, it turns out you don't need
Note that
Please keep in mind that this implementation has some performance related issues. Especially the
Working with lists
When you work with lists, avoid element-wise access. It's really slow. Instead, you want to transform the whole map into a single value (a fold, e.g.
Let us have a look at each of your functions first in solitude. What does
summation do?summation :: (Integral a, Fractional b) => a -> a -> (a -> b) -> b
summation i n e = if (i < n)
then (e i + summation (i+1) n e)
else (e i)Well, for all numbers from
x in i to n, sum the e x. We can write this a lot more clear with a list comprehension:summation :: (Enum n, Num a) => n -> n -> a
summation i n e = sum [e x | x <- [i..n]]Note that this only holds if
i is always lesser than n in the inital call.Next, what does
mean do?mean :: (Real a, Fractional b) => [a] -> b
mean x = (1 / (genericLength x)) *
(summation 1 (length x) (\i -> realToFrac (x !! (i-1))))Uh, that looks complicated. First, let us rewrite it as proper fraction:
mean x = num / denom
where
num = summation 1 (length x) (\i -> realToFrac (x !! (i-1)))
denom = genericLength xMuch easier to read. So, what do you actually do in
num? You calculate the sum of all numbers in x. However, this can be done with sum already. We end up withmean :: Fractional a => [a] -> a
mean xs = sum xs / genericLength xsNote that lists are usually named with a suffix
s. One x, multiple xs.We use the same approach on
covariance. First, we rewrite it in simpler terms:covariance x y = num / denom
where
num = summation 1 (length x) elemwise
denom = genericLength x
elemwise i = (realToFrac (x !! (i-1)) - mean x) * (realToFrac (y !! (i-1)) - mean x))And now we immediately spot an error in
elemwise. You wrote y !! (i-1) - mean x, but you meant y !! (i - 1) - mean y.However, let us have a look at the mathematical definition again:
$$
cov(X,Y) = E\left[(X-E[X])*(Y-E[Y])\right]
$$
Allright. Let's write this exactly down as it stands there:
covariance x y = mean productXY
where
productXY = pairwiseProduct xSubMean ySubMean
xSubMean = subtractMean x
ySubMean = subtractMean yThis is obviously not ready yet. What are
pairwiseProduct and subtractMean?subtractMean :: Fractional a => [a] -> [a]
subtractMean xs = [x - mx | x [a] -> [a] -> [a]
pairwiseProduct xs ys = zipWith (*) xs ysNow we're done. Note how the
covariance almost looks like pseudo-code? It's clearly easy to read by a human. That's the most important point you should take from this review: make your code easy to read for yourself.stddev xs is just sqrt (covariance xs xs), so you should probably use that:-- sqrt needs Floating
stddev :: Floating a => [a] -> a
stddev xs = sqrt (covariance xs xs)Everything at once
After we've finished our rewrite, it turns out you don't need
summation at all. So what do we end up with?mean :: Fractional a => [a] -> a
mean xs = sum xs / genericLength xs
covariance :: Fractional a => [a] -> [a] -> a
covariance xs ys = mean productXY
where
productXY = zipWith (*) [x - mx | x [a] -> a
stddev xs = sqrt (covariance xs xs)
pearson :: (Floating a) => [a] -> [a] -> a
pearson x y = covariance x y / (stddev x * stddev y)Note that
covariance got a lot easier to read due to bindings.Please keep in mind that this implementation has some performance related issues. Especially the
mean implementation is a poster child for a function that leaks memory, but for small lists you should be fine.Working with lists
When you work with lists, avoid element-wise access. It's really slow. Instead, you want to transform the whole map into a single value (a fold, e.g.
length or sum), or into another map (above with list comprehensions, e.g. [x - mx | x <- xs]).Code Snippets
summation :: (Integral a, Fractional b) => a -> a -> (a -> b) -> b
summation i n e = if (i < n)
then (e i + summation (i+1) n e)
else (e i)summation :: (Enum n, Num a) => n -> n -> a
summation i n e = sum [e x | x <- [i..n]]mean :: (Real a, Fractional b) => [a] -> b
mean x = (1 / (genericLength x)) *
(summation 1 (length x) (\i -> realToFrac (x !! (i-1))))mean x = num / denom
where
num = summation 1 (length x) (\i -> realToFrac (x !! (i-1)))
denom = genericLength xmean :: Fractional a => [a] -> a
mean xs = sum xs / genericLength xsContext
StackExchange Code Review Q#155662, answer score: 3
Revisions (0)
No revisions yet.