patternMinor

Efficient Prime Sieve - Haskell

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

primeefficientsievehaskell

Problem

I'm new to Haskell, and am trying to learn the basics with a simple function that will calculate the list of primes up to a given value (using the Sieve of Eratosthenes). Here is my code so far, which has the problem of becoming very slow for large inputs nmax:

primes :: Integer -> [Integer]
primes nmax
  | nmax  [Integer]
    sieve [] = []
    sieve [a] = [a]
    sieve (p:xs) = p : sieve [multiple*p+offset | multiple <- [1..(last xs) `quot` p], offset <-[1..(p-1)], multiple*p+offset `elem` xs]

I am aware of this resource which gives implementations of Eratosthenes' Sieve, however, their methods are beyond me as I'm only just beginning. I'm also aware of this this question, but they have used entirely different methods again. I would like to know what it is about my code that is so inefficient.

Example timings:

primes 100 takes 0.01s and uses 0 bytes of RAM.

primes 1000 takes 0.41s and uses ~38.5MB of RAM.

primes 2000 takes 3.15s and uses ~153MB of RAM.

I suspect the problem lies in checking if multiple*p+offset is an element of the list xs, but I'm not sure how to circumvent this check.

Solution

The underlying complexity

I suspect the problem lies …

Ah, but we don't like to suspect. Just as our compiler tells us that we're missing a semicolon, our profiler will tell us where we lose time. Let's add a main first:

main :: IO ()
main = readLn >>= print . length . primes

Now, it's easier to measure:

$ ghc -prof -auto-all -O2 Primes.hs
$ echo 1000 | ./Primes +RTS -s -p
168
      23,717,912 bytes allocated in the heap
       1,235,856 bytes copied during GC
         111,776 bytes maximum residency (2 sample(s))
          26,880 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        44 colls,     0 par    0.000s   0.002s     0.0000s    0.0002s
  Gen  1         2 colls,     0 par    0.000s   0.001s     0.0005s    0.0008s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    0.375s  (  0.372s elapsed)
  GC      time    0.000s  (  0.003s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.375s  (  0.377s elapsed)

  %GC     time       0.0%  (0.8% elapsed)

  Alloc rate    63,247,765 bytes per MUT second

  Productivity 100.0% of total user, 99.5% of total elapsed

$ echo 2000 | ./Primes +RTS -s -p
303
      84,381,928 bytes allocated in the heap
       9,612,184 bytes copied during GC
         137,048 bytes maximum residency (2 sample(s))
          28,592 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       161 colls,     0 par    0.062s   0.015s     0.0001s    0.0005s
  Gen  1         2 colls,     0 par    0.000s   0.001s     0.0003s    0.0004s

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    2.297s  (  2.399s elapsed)
  GC      time    0.062s  (  0.015s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    2.359s  (  2.416s elapsed)

  %GC     time       2.6%  (0.6% elapsed)

  Alloc rate    36,737,710 bytes per MUT second

  Productivity  97.4% of total user, 95.1% of total elapsed

Before we have a look at the profile, we already see that something is amiss. For twice the maximum, your code takes almost 7 times the time: 0.375s for numbers up to 1000, and 2.297 seconds for 2000. That indicates that your algorithm doesn't have the right complexity to begin with. Let's check 4000:

$ echo 4000 | ./Primes +RTS -s -p
550
     302,844,912 bytes allocated in the heap
      70,455,392 bytes copied during GC
         431,144 bytes maximum residency (37 sample(s))
          36,976 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       546 colls,     0 par    0.016s   0.093s     0.0002s    0.0006s
  Gen  1        37 colls,     0 par    0.031s   0.012s     0.0003s    0.0013s

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time   16.766s  ( 17.134s elapsed)
  GC      time    0.047s  (  0.105s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   16.812s  ( 17.240s elapsed)

  %GC     time       0.3%  (0.6% elapsed)

  Alloc rate    18,063,443 bytes per MUT second

  Productivity  99.7% of total user, 97.2% of total elapsed

For twice the elements, we have 8 times the time. This indicates that you have \$\mathcal O(n^3)\$ complexity.

An unhelpful profile

Let's have a look at the generated profile. In case you don't know GHC's profiling feature yet: -prof enables profiling during compilation, but you still need to tell GHC what to profile (in your code). -auto-all enables profiling of your functions without changing your code. And +RTS -p enables profiling at runtime.

This will create a .prof file. However, it's not very helpful, since most of your logic is in a list comprehension, and that's not profiled:

```
Sun Sep 11 08:24 2016 Time and Allocation Profiling Report (Final)

Primes +RTS -s -p -RTS

total time = 15.19 secs (15193 ticks @ 1000 us, 1 processor)
total alloc = 185,477,264 bytes (excludes profiling overheads)

COST CENTRE MODULE %time %alloc

primes.sieve Main 100.0 99.9

individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc

MAIN MAIN 44 0 0.0 0.0 100.0 100.0
main Main 89 0 0.0 0.0 100.0 100.0
primes Main 91 1 0.0 0

Code Snippets

main :: IO ()
main = readLn >>= print . length . primes

$ ghc -prof -auto-all -O2 Primes.hs
$ echo 1000 | ./Primes +RTS -s -p
168
      23,717,912 bytes allocated in the heap
       1,235,856 bytes copied during GC
         111,776 bytes maximum residency (2 sample(s))
          26,880 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        44 colls,     0 par    0.000s   0.002s     0.0000s    0.0002s
  Gen  1         2 colls,     0 par    0.000s   0.001s     0.0005s    0.0008s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    0.375s  (  0.372s elapsed)
  GC      time    0.000s  (  0.003s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.375s  (  0.377s elapsed)

  %GC     time       0.0%  (0.8% elapsed)

  Alloc rate    63,247,765 bytes per MUT second

  Productivity 100.0% of total user, 99.5% of total elapsed

$ echo 2000 | ./Primes +RTS -s -p
303
      84,381,928 bytes allocated in the heap
       9,612,184 bytes copied during GC
         137,048 bytes maximum residency (2 sample(s))
          28,592 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       161 colls,     0 par    0.062s   0.015s     0.0001s    0.0005s
  Gen  1         2 colls,     0 par    0.000s   0.001s     0.0003s    0.0004s

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    2.297s  (  2.399s elapsed)
  GC      time    0.062s  (  0.015s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    2.359s  (  2.416s elapsed)

  %GC     time       2.6%  (0.6% elapsed)

  Alloc rate    36,737,710 bytes per MUT second

  Productivity  97.4% of total user, 95.1% of total elapsed

$ echo 4000 | ./Primes +RTS -s -p
550
     302,844,912 bytes allocated in the heap
      70,455,392 bytes copied during GC
         431,144 bytes maximum residency (37 sample(s))
          36,976 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       546 colls,     0 par    0.016s   0.093s     0.0002s    0.0006s
  Gen  1        37 colls,     0 par    0.031s   0.012s     0.0003s    0.0013s

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time   16.766s  ( 17.134s elapsed)
  GC      time    0.047s  (  0.105s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   16.812s  ( 17.240s elapsed)

  %GC     time       0.3%  (0.6% elapsed)

  Alloc rate    18,063,443 bytes per MUT second

  Productivity  99.7% of total user, 97.2% of total elapsed

Sun Sep 11 08:24 2016 Time and Allocation Profiling Report  (Final)

       Primes +RTS -s -p -RTS

    total time  =       15.19 secs   (15193 ticks @ 1000 us, 1 processor)
    total alloc = 185,477,264 bytes  (excludes profiling overheads)

COST CENTRE  MODULE  %time %alloc

primes.sieve Main    100.0   99.9


                                                             individual     inherited
COST CENTRE     MODULE                     no.     entries  %time %alloc   %time %alloc

MAIN            MAIN                        44           0    0.0    0.0   100.0  100.0
 main           Main                        89           0    0.0    0.0   100.0  100.0
  primes        Main                        91           1    0.0    0.1   100.0  100.0
   primes.sieve Main                        92         550  100.0   99.9   100.0   99.9
 CAF            GHC.Read                    76           0    0.0    0.0     0.0    0.0
 CAF            GHC.IO.Encoding.CodePage    73           0    0.0    0.0     0.0    0.0
 CAF            GHC.IO.Encoding             68           0    0.0    0.0     0.0    0.0
 CAF            Text.Read.Lex               61           0    0.0    0.0     0.0    0.0
 CAF            GHC.IO.Handle.Text          57           0    0.0    0.0     0.0    0.0
 CAF            GHC.IO.Handle.FD            56           0    0.0    0.0     0.0    0.0
 CAF:main1      Main                        52           0    0.0    0.0     0.0    0.0
  main          Main                        88           1    0.0    0.0     0.0    0.0
 CAF:lvl5_r3TO  Main                        51           0    0.0    0.0     0.0    0.0
  main          Main                        90           0    0.0    0.0     0.0    0.0

sieve (p:xs) = p : sieve [multiple*p+offset 
                             | multiple <- {-# SCC multiple #-} [1..(last xs) `quot` p]
                             , offset <- {-# SCC offset #-} [1..(p-1)]
                             , {-# SCC elem #-} multiple*p+offset `elem` xs]

Context

StackExchange Code Review Q#141020, answer score: 7

Revisions (0)

No revisions yet.