patternMinor
Project Euler problem 92 (sequences formed by the sum of squares of digits) in Haskell
Viewed 0 times
problemthedigitssquaresprojecteulerhaskellsumsequencesformed
Problem
I made the following program in Haskell to solve Project Euler's problem 92. The program works (i.e. the solution it yields is correct), but it's not exactly the fastest in the world.
The problem is:
A number chain is created by continuously adding the square of the
digits in a number to form a new number until it has been seen before.
For example,
Therefore any chain that arrives at 1 or 89 will become stuck in an
endless loop. What is most amazing is that EVERY starting number will
eventually arrive at 1 or 89.
How many starting numbers below ten million will arrive at 89?
My code:
Any feedback on how this can improved, ranging from advice on good practices to very concrete algorithmic improvements, preferably without using too complicated Haskell concepts (I've only been programming in Haskell for three days) is more than welcome. If you are going to propose rather complex improvements (e.g. state monads?), then that's fine, so long as it's understandable to a beginner. :)
Note: I've implemented this in Python too, where I mapped known outcomes to a dictionary, which sped up the process considerably (i.e. the moment the chain function encounters a number already checked, it goes straight to the outcome without chaining further). However, I don't know how to do that in Haskell, if that is good practice in Haskell in the first place. This is exactly one of the reasons I asked this question in the first place; it feels as if it would
The problem is:
A number chain is created by continuously adding the square of the
digits in a number to form a new number until it has been seen before.
For example,
- 44 → 32 → 13 → 10 → 1 → 1
- 85 → 89 → 145 → 42 → 20 → 4 → 16 → 37 → 58 → 89
Therefore any chain that arrives at 1 or 89 will become stuck in an
endless loop. What is most amazing is that EVERY starting number will
eventually arrive at 1 or 89.
How many starting numbers below ten million will arrive at 89?
My code:
digits :: Int -> [Int]
digits 0 = []
digits x = digits (x `div` 10) ++ [x `mod` 10]
squareDigits :: [Int] -> [Int]
squareDigits x = map (^2) x
sumSquareDigits :: Int -> Int
sumSquareDigits x = sum (squareDigits (digits x))
chainTo89 :: Int -> Bool
chainTo89 89 = True
chainTo89 1 = False
chainTo89 x = chainTo89 (sumSquareDigits x)
main :: IO()
main = print $ length $ filter (chainTo89) $ [1..9999999]Any feedback on how this can improved, ranging from advice on good practices to very concrete algorithmic improvements, preferably without using too complicated Haskell concepts (I've only been programming in Haskell for three days) is more than welcome. If you are going to propose rather complex improvements (e.g. state monads?), then that's fine, so long as it's understandable to a beginner. :)
Note: I've implemented this in Python too, where I mapped known outcomes to a dictionary, which sped up the process considerably (i.e. the moment the chain function encounters a number already checked, it goes straight to the outcome without chaining further). However, I don't know how to do that in Haskell, if that is good practice in Haskell in the first place. This is exactly one of the reasons I asked this question in the first place; it feels as if it would
Solution
Overall, your solution looks like a reasonable Haskell implementation of the question. Well done. I found some tiny bits to comment about anyway.
Separate algorithm from IO
Part of your algorithm is specified in the
This allows us, for example, to call
benchmarks
Since you asked about algorithmic improvements, benchmarks are actually a good idea. To use the criterion benchmarking library, we add an import and replace the main function:
This benchmarks the
Looks more or less linear to me. Did you expect it to look linear? Maybe think about it a moment before reading on.
Linear?!
I didn't expect it to be linear, because we're duplicating the work in the recursive calls. You mentioned that caching the recursive results helped a lot, so I expected the complexity without caching to be clearly worse than linear, and to regain linear complexity by caching, as you would expect from a dynamic programming solution.
After seeing the benchmark results, I realized that for large numbers
Memoization for small inputs
We can memoize the result on small values, where small means that the values can occur after the first step. In other words, if we want to support inputs to n, we want to memoize the results for 1 to 81 * log n / log 10, because that's the biggest number we can see after the first step, see above.
Haskell supports an unusual approach to memoization: We can set up a lazy data structure which maps inputs to outputs, and in the initialization of each entry in the data structure, we access other entries. As long as we're never (directly or indirectly) accessing an entry from itself, the entries will be computed in an appropriate order.
Which data structure do we want here? We're going to use the consecutive numbers from 1 to (81 * log n / log 10) as inputs, so we probably want to use an array. We need to add an import, change
I renamed
This only works if n is big enough (so we never move out of the table after moving into it) and if there are really no other cycles than the cycles involving 1 and 89. Here's how the benchmark results look like for this version of
Separate algorithm from IO
Part of your algorithm is specified in the
main function. To make it easier to play with it, it is better to do all computations in some other function, and only handle input/output in main. In your case:solve :: Int -> Int
solve n = length $ filter chainTo89 $ [1..n - 1]
main = print (solve 1000000)This allows us, for example, to call
solve on various numbers in ghci, in tests, or in benchmarks.benchmarks
Since you asked about algorithmic improvements, benchmarks are actually a good idea. To use the criterion benchmarking library, we add an import and replace the main function:
import Criterion.Main
-- code to benchmark
main = defaultMain
[ bgroup "chain89"
[ bench ("solve " ++ show n) $ whnf solve n
| n <- [100000, 200000 .. 1000000]
]
]This benchmarks the
solve function on values below 1 million, in 100k steps. The question asks for 10 million, but i decided to run my benchmarks on the smaller inputs to get more rapid feedback. If you compile this to an executable (say, chain89), run it as chain89 --output chain89.html to produce a nich benchmark report in chain89.html. For your code as given, I get these numbers (in ms):Looks more or less linear to me. Did you expect it to look linear? Maybe think about it a moment before reading on.
Linear?!
I didn't expect it to be linear, because we're duplicating the work in the recursive calls. You mentioned that caching the recursive results helped a lot, so I expected the complexity without caching to be clearly worse than linear, and to regain linear complexity by caching, as you would expect from a dynamic programming solution.
After seeing the benchmark results, I realized that for large numbers
n, the result of sumSquareDigits n is always much smaller than n. To approximate the result after one step, consider this: if the input is n, it has k = log n / log 10 digits. In the worst case, these digits are all 9. Then the result is k 9 9. For example, sumSquareDigits 999999 is only 486. So even if we multiply an already large input by 10, the result after the first step is not more than 81 bigger, which doesn't translate into too many additional recursive calls. There are additional recursive calls, so this is still slower than linear, but only very little, so we don't see it in the benchmark results.Memoization for small inputs
We can memoize the result on small values, where small means that the values can occur after the first step. In other words, if we want to support inputs to n, we want to memoize the results for 1 to 81 * log n / log 10, because that's the biggest number we can see after the first step, see above.
Haskell supports an unusual approach to memoization: We can set up a lazy data structure which maps inputs to outputs, and in the initialization of each entry in the data structure, we access other entries. As long as we're never (directly or indirectly) accessing an entry from itself, the entries will be computed in an appropriate order.
Which data structure do we want here? We're going to use the consecutive numbers from 1 to (81 * log n / log 10) as inputs, so we probably want to use an array. We need to add an import, change
chainTo89 to add the memoization and adapt solve to the new variant of chainTo89.import Data.Array.IArray
-- other code here
makeChainTo89 :: Int -> (Int -> Bool)
makeChainTo89 n = compute where
size = 81 * ceiling (log (fromIntegral n) / log 10)
cache :: Array Int Bool
cache = array (1, size) [(i, compute i) | i Int
solve n = length $ filter chainTo89 $ [1 .. n - 1] where
chainTo89 = makeChainTo89 nI renamed
chainTo89 to makeChainTo89 because makeChainTo89 n returns a function that computes the same values as the old chainTo89 when called with values that are less than or equal to n. The variable size holds the size of the memoization table and is computed as explained above. The cache is the memoization table itself. It contains an entry for all i between 1 and size, mapping i to compute i. The fetch function retrieves an entry from the memoization table, and the compute function does the actual work of computing the next element in the chain. Note that compute looks almost like the old chainTo89 except that it calls fetch instead of the calling itself recursively. But fetch will force a cache entry which contains a call to compute in its thunk, so that the recursion structure is actually the same. Just when the same cache entry is forced again, the result is already there.This only works if n is big enough (so we never move out of the table after moving into it) and if there are really no other cycles than the cycles involving 1 and 89. Here's how the benchmark results look like for this version of
Code Snippets
solve :: Int -> Int
solve n = length $ filter chainTo89 $ [1..n - 1]
main = print (solve 1000000)import Criterion.Main
-- code to benchmark
main = defaultMain
[ bgroup "chain89"
[ bench ("solve " ++ show n) $ whnf solve n
| n <- [100000, 200000 .. 1000000]
]
]import Data.Array.IArray
-- other code here
makeChainTo89 :: Int -> (Int -> Bool)
makeChainTo89 n = compute where
size = 81 * ceiling (log (fromIntegral n) / log 10)
cache :: Array Int Bool
cache = array (1, size) [(i, compute i) | i <- [1 .. size]]
fetch x = cache ! x
compute 89 = True
compute 1 = False
compute x = fetch (sumSquareDigits x)
solve :: Int -> Int
solve n = length $ filter chainTo89 $ [1 .. n - 1] where
chainTo89 = makeChainTo89 ndigits :: Int -> [Int]
digits 0 = []
digits x = x `mod` 10 : digits (x `div` 10)Context
StackExchange Code Review Q#97399, answer score: 6
Revisions (0)
No revisions yet.