HiveBrain v1.2.0

Project Euler problem 92 (sequences formed by the sum of squares of digits) in Haskell

Submitted by: @import:stackexchange-codereview

Problem

I made the following program in Haskell to solve Project Euler's problem 92. The program works (i.e. the solution it yields is correct), but it's not exactly the fastest in the world.

The problem is:


A number chain is created by continuously adding the square of the digits in a number to form a new number until it has been seen before.

For example,

  • 44 → 32 → 13 → 10 → 1 → 1
  • 85 → 89 → 145 → 42 → 20 → 4 → 16 → 37 → 58 → 89

Therefore any chain that arrives at 1 or 89 will become stuck in an endless loop. What is most amazing is that EVERY starting number will eventually arrive at 1 or 89.


How many starting numbers below ten million will arrive at 89?

My code:

digits :: Int -> [Int]
digits 0 = []
digits x = digits (x `div` 10) ++ [x `mod` 10]

squareDigits :: [Int] -> [Int]
squareDigits x =  map (^2) x

sumSquareDigits :: Int -> Int
sumSquareDigits x = sum (squareDigits (digits x))

chainTo89 :: Int -> Bool
chainTo89 89 = True
chainTo89 1 = False
chainTo89 x = chainTo89 (sumSquareDigits x)

main :: IO()
main = print $ length $ filter (chainTo89) $ [1..9999999]


Any feedback on how this can be improved, ranging from advice on good practices to very concrete algorithmic improvements, preferably without using too complicated Haskell concepts (I've only been programming in Haskell for three days), is more than welcome. If you are going to propose rather complex improvements (e.g. state monads?), then that's fine, so long as it's understandable to a beginner. :)

Note: I've implemented this in Python too, where I mapped known outcomes to a dictionary, which sped up the process considerably (i.e. the moment the chain function encounters a number already checked, it goes straight to the outcome without chaining further). However, I don't know how to do that in Haskell, if that is good practice in Haskell in the first place. This is exactly one of the reasons I asked this question in the first place; it feels as if it would

Solution

Overall, your solution looks like a reasonable Haskell implementation of the question. Well done. I found some tiny bits to comment about anyway.

Separate algorithm from IO

Part of your algorithm is specified in the main function. To make it easier to play with it, it is better to do all computations in some other function, and only handle input/output in main. In your case:

solve :: Int -> Int
solve n = length $ filter chainTo89 $ [1..n - 1]

main = print (solve 1000000)


This allows us, for example, to call solve on various numbers in ghci, in tests, or in benchmarks.
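For instance, with the question's helper functions in scope, the separated-out solve is easy to exercise directly. The expected results for 44 and 85 follow from the example chains in the problem statement (44 ends at 1, 85 ends at 89); the small bound for solve is just for quick experimentation:

```haskell
-- Helper definitions copied from the question.
digits :: Int -> [Int]
digits 0 = []
digits x = digits (x `div` 10) ++ [x `mod` 10]

sumSquareDigits :: Int -> Int
sumSquareDigits x = sum (map (^ 2) (digits x))

chainTo89 :: Int -> Bool
chainTo89 89 = True
chainTo89 1  = False
chainTo89 x  = chainTo89 (sumSquareDigits x)

solve :: Int -> Int
solve n = length (filter chainTo89 [1 .. n - 1])

main :: IO ()
main = do
  print (chainTo89 44)  -- False: 44 -> 32 -> 13 -> 10 -> 1
  print (chainTo89 85)  -- True:  85 -> 89
  print (solve 100)     -- 80 of the numbers below 100 arrive at 89
</imports>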

Benchmarks

Since you asked about algorithmic improvements, benchmarks are actually a good idea. To use the criterion benchmarking library, we add an import and replace the main function:

import Criterion.Main

-- code to benchmark

main = defaultMain
  [ bgroup "chain89"
    [ bench ("solve " ++ show n) $ whnf solve n
    | n <- [100000, 200000 .. 1000000]
    ]
  ]


This benchmarks the solve function on values below 1 million, in 100k steps. The question asks for 10 million, but I decided to run my benchmarks on the smaller inputs to get more rapid feedback. If you compile this to an executable (say, chain89), run it as chain89 --output chain89.html to produce a nice benchmark report in chain89.html. For your code as given, I get these numbers (in ms):

Looks more or less linear to me. Did you expect it to look linear? Maybe think about it a moment before reading on.

Linear?!

I didn't expect it to be linear, because we're duplicating the work in the recursive calls. You mentioned that caching the recursive results helped a lot, so I expected the complexity without caching to be clearly worse than linear, and to regain linear complexity by caching, as you would expect from a dynamic programming solution.

After seeing the benchmark results, I realized that for large numbers n, the result of sumSquareDigits n is always much smaller than n. To approximate the result after one step, consider this: if the input is n, it has about k = log n / log 10 digits. In the worst case, these digits are all 9. Then the result is k · 9² = 81k. For example, sumSquareDigits 999999 is only 486. So even if we multiply an already large input by 10, the result after the first step is not more than 81 bigger, which doesn't translate into too many additional recursive calls. There are additional recursive calls, so this is still slower than linear, but only very little, so we don't see it in the benchmark results.
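To see the 81k bound concretely, here is a small check using the question's digits and sumSquareDigits on all-nines inputs:

```haskell
digits :: Int -> [Int]
digits 0 = []
digits x = digits (x `div` 10) ++ [x `mod` 10]

sumSquareDigits :: Int -> Int
sumSquareDigits x = sum (map (^ 2) (digits x))

main :: IO ()
main = do
  print (sumSquareDigits 999999)   -- 6 * 81 = 486
  print (sumSquareDigits 9999999)  -- 7 * 81 = 567
```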

Memoization for small inputs

We can memoize the result on small values, where small means values that can occur after the first step. In other words, if we want to support inputs up to n, we want to memoize the results for 1 to 81 · log n / log 10, because that's the biggest number we can see after the first step, see above.

Haskell supports an unusual approach to memoization: We can set up a lazy data structure which maps inputs to outputs, and in the initialization of each entry in the data structure, we access other entries. As long as we're never (directly or indirectly) accessing an entry from itself, the entries will be computed in an appropriate order.
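As a standalone illustration of this technique (not part of the answer's code), here is a lazily built Fibonacci table: each entry's initializer reads other entries of the same array, and laziness sorts out the evaluation order:

```haskell
import Data.Array

-- The array's entries are thunks; forcing one entry forces the
-- entries it depends on, so the table fills itself in dependency order.
fibTable :: Array Int Integer
fibTable = listArray (0, 50) (map fib [0 .. 50])
  where
    fib 0 = 0
    fib 1 = 1
    fib n = fibTable ! (n - 1) + fibTable ! (n - 2)

main :: IO ()
main = print (fibTable ! 30)  -- 832040
```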

Which data structure do we want here? We're going to use the consecutive numbers from 1 to (81 * log n / log 10) as inputs, so we probably want to use an array. We need to add an import, change chainTo89 to add the memoization and adapt solve to the new variant of chainTo89.

import Data.Array.IArray

-- other code here

makeChainTo89 :: Int -> (Int -> Bool)
makeChainTo89 n = compute where
  size = 81 * ceiling (log (fromIntegral n) / log 10)

  cache :: Array Int Bool
  cache = array (1, size) [(i, compute i) | i <- [1 .. size]]

  fetch x = cache ! x

  compute 89 = True
  compute 1 = False
  compute x = fetch (sumSquareDigits x)

solve :: Int -> Int
solve n = length $ filter chainTo89 $ [1 .. n - 1] where
  chainTo89 = makeChainTo89 n


I renamed chainTo89 to makeChainTo89 because makeChainTo89 n returns a function that computes the same values as the old chainTo89 when called with values that are less than or equal to n. The variable size holds the size of the memoization table and is computed as explained above. The cache is the memoization table itself. It contains an entry for all i between 1 and size, mapping i to compute i. The fetch function retrieves an entry from the memoization table, and the compute function does the actual work of computing the next element in the chain. Note that compute looks almost like the old chainTo89, except that it calls fetch instead of calling itself recursively. But fetch will force a cache entry which contains a call to compute in its thunk, so the recursion structure is actually the same. It's just that when the same cache entry is forced again, the result is already there.
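As a quick sanity check (a sketch reusing the definitions above; chainTo89Direct is just the question's chainTo89 renamed for comparison), the memoized version can be checked against the direct recursion on small inputs:

```haskell
import Data.Array.IArray

digits :: Int -> [Int]
digits 0 = []
digits x = digits (x `div` 10) ++ [x `mod` 10]

sumSquareDigits :: Int -> Int
sumSquareDigits x = sum (map (^ 2) (digits x))

-- The direct recursion from the question, for comparison.
chainTo89Direct :: Int -> Bool
chainTo89Direct 89 = True
chainTo89Direct 1  = False
chainTo89Direct x  = chainTo89Direct (sumSquareDigits x)

makeChainTo89 :: Int -> (Int -> Bool)
makeChainTo89 n = compute
  where
    size = 81 * ceiling (log (fromIntegral n) / log 10)

    cache :: Array Int Bool
    cache = array (1, size) [(i, compute i) | i <- [1 .. size]]

    fetch x = cache ! x

    compute 89 = True
    compute 1  = False
    compute x  = fetch (sumSquareDigits x)

main :: IO ()
main = do
  let memo = makeChainTo89 10000
  print (all (\x -> memo x == chainTo89Direct x) [1 .. 9999])  -- True
```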

This only works if n is big enough (so we never move out of the table after moving into it) and if there really are no other cycles than the ones involving 1 and 89. Here's how the benchmark results look for this version of the code:

Code Snippets

An alternative digits that builds the list with (:) instead of (++), avoiding the repeated list appends of the original:

digits :: Int -> [Int]
digits 0 = []
digits x = x `mod` 10 : digits (x `div` 10)
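This version yields the digits in reversed order compared to the original, which doesn't matter when we only sum their squares. A quick check, using 145 from the example chain in the problem (1² + 4² + 5² = 42):

```haskell
digitsOld, digitsNew :: Int -> [Int]
-- Original version: appends at the end, digits in normal order.
digitsOld 0 = []
digitsOld x = digitsOld (x `div` 10) ++ [x `mod` 10]
-- Alternative version: conses at the front, digits reversed.
digitsNew 0 = []
digitsNew x = x `mod` 10 : digitsNew (x `div` 10)

main :: IO ()
main = do
  print (digitsOld 145)                    -- [1,4,5]
  print (digitsNew 145)                    -- [5,4,1]
  print (sum (map (^ 2) (digitsNew 145)))  -- 42, same as with digitsOld
```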

Context

StackExchange Code Review Q#97399, answer score: 6
