HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Sieve of Sundaram for Project Euler 7

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
sundaramsieveprojecteulerfor

Problem

This is a sequel to my previous question: Sieve of Sundaram for Project Euler 7: Python implementation slower than C++ and R

I am trying to implement the Sieve of Sundaram in various languages to solve Project Euler Problem #7.

I have written a solution in Haskell (to which I am still very new, but eager to learn) based loosely on the example available on this blog.

The approach is to find the upper bound for the value of the nth prime:
$$\mathrm{Bound}(n) = n \times (\log(n) + \log(\log(n))) $$

Then sieve all of the numbers up to the bound.

I would like some advice on improving my style (conciseness, clarity), and the performance of my code, which is about 30x slower than the C++ solution included in my previous question.

import Data.List

bound :: Int -> Int
bound n = floor (n' * ((log n') + log (log n')))
    where 
        n' = fromIntegral n

marked :: Int -> [Int]
marked n = sort [ i + j + 2*i*j | i  [Int] -> [Int]
removeComposites [] _          = []
removeComposites (s:ss) []     = 2*s + 1 : removeComposites ss []
removeComposites (s:ss) (c:cs)
    | s == c    = removeComposites ss cs
    | s > c     = removeComposites (s:ss) cs
    | otherwise = 2*s + 1 : removeComposites ss (c:cs)

sieveSundaram :: Int -> [Int]
sieveSundaram n = 2:(removeComposites [1..n'] (marked n'))
    where 
        n' = quot (bound n) 2

pe_007 = last (take n (sieveSundaram n))
    where n = 10001

main :: IO ()
main = do
    print pe_007

Solution

Prime sieves are really not well suited for linked lists. While sorting or merging are clever way to handle the lists in a functional context, the easiest way to improve preformance is to simply use mutable arrays, so it's more similar to the C++ version.

Here I just replaced the sorting and removeComposites with an unboxed STArray and indexing. The benifit with using STArray is that you still get a pure function, even though the array is mutable while generating. The times are now comparable with the C++ version.

import Data.Array.ST
import Data.Array.Unboxed

bound :: Int -> Int
bound n = floor (n' * (log n' + log (log n')))
    where 
        n' = fromIntegral n

marked :: Int -> [Int]
marked n =  [ i + j + 2*i*j | i  UArray Int Bool
marked' n = runSTUArray $ do
    arr  writeArray arr i False) (marked n)
    return arr

sieveSundaram :: Int -> [Int]
sieveSundaram n = 2: [2*x+1 | x <- [1..n'], marked' n' ! x]
    where 
        n' = quot (bound n) 2

pe_007 = last . take n $ sieveSundaram n
    where n = 100001

main :: IO ()
main = print pe_007


On my system, with n=200001, the times are:

Haskell, Original version: 5.930s
Haskell, Neil's version: 5.916s
Haskell, Data.IntSet: 2.853s
Haskell, Boxed STArray: 0.209s
Haskell, Unboxed STUArray: 0.055s
C++: 0.019s


It is probably possible to improve the times even more, but this takes care of most of the difference.

Update

Continuing on the theme of using the right data structure for the job, I tried using IntSets for aditional comparison. It is about twice as fast as the list based solutions, but still a lot slower than using mutable arrays (see updated table). So if you want to stay with pure haskell98, this might be an alternative. Here is my (naive) translation. I just create the set of numbers [1,n] and the set of numbers to remove (from marked) and take the difference.

import qualified Data.IntSet as IS

bound :: Int -> Int
bound n = floor $ n' * (log n' + log (log n'))
    where 
        n' = fromIntegral n

marked :: Int -> [Int]
marked n =  [ i + j + 2*i*j | i  [Int]
marked' n = let
    l1 = IS.fromList [1..n]
    l2 = IS.fromList (marked n)
    in IS.toList $ l1 IS.\\ l2

sieveSundaram :: Int -> [Int]
sieveSundaram n = 2: [2*x+1 | x <- marked' n']
    where 
        n' = quot (bound n) 2

pe_007 = last . take n $ sieveSundaram n
    where n = 200001

main :: IO ()
main = print pe_007


IntSets are a really good data structure if you need to repeatedly modify the sets in a pure context, but for this problem nothing can beat an unboxed mutable array.

Code Snippets

import Data.Array.ST
import Data.Array.Unboxed

bound :: Int -> Int
bound n = floor (n' * (log n' + log (log n')))
    where 
        n' = fromIntegral n

marked :: Int -> [Int]
marked n =  [ i + j + 2*i*j | i <- [1..iMax], j <- [i..jMax i]]
    where 
        iMax = floor (sqrt (fromIntegral n))
        jMax i = quot (n - i) (2*i + 1)

marked' :: Int -> UArray Int Bool
marked' n = runSTUArray $ do
    arr <- newArray (1,n) True
    mapM_ (\i -> writeArray arr i False) (marked n)
    return arr

sieveSundaram :: Int -> [Int]
sieveSundaram n = 2: [2*x+1 | x <- [1..n'], marked' n' ! x]
    where 
        n' = quot (bound n) 2

pe_007 = last . take n $ sieveSundaram n
    where n = 100001

main :: IO ()
main = print pe_007
import qualified Data.IntSet as IS

bound :: Int -> Int
bound n = floor $ n' * (log n' + log (log n'))
    where 
        n' = fromIntegral n

marked :: Int -> [Int]
marked n =  [ i + j + 2*i*j | i <- [1..iMax], j <- [i..jMax i]]
    where 
        iMax = floor (sqrt (fromIntegral n))
        jMax i = quot (n - i) (2*i + 1)

marked' :: Int -> [Int]
marked' n = let
    l1 = IS.fromList [1..n]
    l2 = IS.fromList (marked n)
    in IS.toList $ l1 IS.\\ l2

sieveSundaram :: Int -> [Int]
sieveSundaram n = 2: [2*x+1 | x <- marked' n']
    where 
        n' = quot (bound n) 2

pe_007 = last . take n $ sieveSundaram n
    where n = 200001

main :: IO ()
main = print pe_007

Context

StackExchange Code Review Q#80647, answer score: 4

Revisions (0)

No revisions yet.