HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Memoized Collatz sequence

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
sequencememoizedcollatz

Problem

Here is one of my programs that utilized memoization and array to improve performance and memory usage. The performance seems satisfactory but the memory usage is ridiculous and I can't figure out what's wrong:

{-# LANGUAGE BangPatterns #-}
import Data.Functor
import Data.Array (Array)
import qualified Data.Array as Arr
import Control.DeepSeq

genColtzArr n = collatzArr
    where collatzArr = Arr.array (1, n) $ take n $ map (\v -> (v, collatz v 0)) [1..] 
          collatz 1 !acc  = 1 + acc
          collatz !m !acc
              | even m    = go (m `div` 2) acc
              | otherwise = go (3 * m + 1) acc
              where go !l !acc
                      | l <= n    = let !v = collatzArr Arr.! l in 1 + acc + v
                      | otherwise = collatz l $ 1 + acc


collatz here means this guy. This function is supposed to receive a number n, and then return an array indexing from 1 to n, and in which each cell contains the length of the link from the index to 1 by applying Collatz formula.

But the memory usage of this method is so high. Here is the profiler result (ghc option -prof -fprof-auto -rtsopts, run time option +RTS -p, n == 500000):

```
total alloc = 730,636,136 bytes (excludes profiling overheads)

COST CENTRE MODULE %time %alloc

genColtzArr.collatz Main 40.4 34.7
genColtzArr.collatz.go Main 25.5 14.4

COST CENTRE MODULE no. entries %time %alloc %time %alloc

genColtzArr Main 105 1 0.0 0.0 74.7 72.1
genColtzArr.collatzArr Main 106 1 8.0 20.8 74.7 72.1
genColtzArr.collatzArr.\ Main 107 500000 0.9 2.2 66.8 51.3
genColtzArr.collatz Main 109 1182582 40.4 34.7 65.9 49.1
genColtzArr.collatz.go Main 110 1182581

Solution

It doesn't use 300 megabytes of heap, it peaks at a little over 20 megabytes. Total allocation is not peak allocation and Haskell has cheap short-lived allocations so total alloc isn't always a good heuristic for GC time or steady-state heap usage. The heap profiling stuff is giving data designed for tuning code rather than for analytics and total alloc is often more helpful when comparing code for overall memory usage.

Here's a screenshot from a memory profile:

I don't think it necessarily bit you here, but in future, add explicit types to functions you are benchmarking. See here for an example of why you want to do that.

Here's the command I wrote to profile your code:

$ rm -f collatz
$ stack ghc -- -prof -fprof-auto -rtsopts -O2 collatz.hs -o collatz 
$ ./collatz +RTS -hc -p
$ hp2ps collatz.hp
$ evince collatz.ps


Last bits are just converting the hp heap profiling data to a postscript file and then I'm opening it in my PDF reader.

Code Snippets

$ rm -f collatz
$ stack ghc -- -prof -fprof-auto -rtsopts -O2 collatz.hs -o collatz 
$ ./collatz +RTS -hc -p
$ hp2ps collatz.hp
$ evince collatz.ps

Context

StackExchange Code Review Q#122239, answer score: 6

Revisions (0)

No revisions yet.