patternModerate
99 Haskell problems, problem 9: Pack function
Viewed 0 times
problempackfunctionhaskellproblems
Problem
Today I started to tackle the 99 Haskell problems (http://www.haskell.org/haskellwiki/99_questions). I had to struggle with problem 9 which requires to Pack consecutive duplicates of list elements into sublists. If a list contains repeated elements they should be placed in separate sublists..
For example the input:
I managed to create a
overcomplicated:
A test of my
For example the input:
pack ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e']
["aaaa","b","cc","aa","d","eeee"]I managed to create a
pack function. However, my solution feels kind of overcomplicated:
pack :: [[Char]] -> [[Char]]
pack xs = foldl (\acc x -> if not(null acc) && x `Data.List.isPrefixOf` (last acc)
then (init acc) ++ [(last acc) ++ x]
else acc ++ [x]) [] xsA test of my
pack function:pack ["a", "a", "b", "b", "b", "c", "a", "a", "c"]
["aa","bbb","c","aa","c"]- Can you give me a code review?
- How can I simplify it? What could I improve to make this function more idiomatic?
Solution
Let's improve the function step by step, focusin on readability as well as on the algorithm itself.
-
I believe the type of the function is not what it should be. We get a list of characters, and we should produce a list of lists:
-
The function doesn't require that we compare
-
The major problem is that the function has O(n^2) time complexity. This is because calling
Most likely we could also omit
-
In Haskell it's customary to avoid
we don't need to use any potentially dangerous functions such as
Here we also introduced another small improvement called η-conversion (eta). The idea is that
-
We'd like to avoid calling
The problem is that we're using
Let's start with implementing
The idea is to express how to process a single element, knowing how the rest of the list is processed.
Let's apply the same principle to
This isn't very different from the previous example. The main difference is that we (seemingly!) consume elements from the right, so we have no problem with reversing the order.
-
And still ... we can ask for more. The problem is that the function isn't lazy enough (even though we're using
produced "a", instead of looping forever - the required information is available. The reason for this is that we inspect the accumulator (by matching it with the pattern
Instead, we can create the result structure first to hold the current element without forcing the results of processing the rest of input list, so that this structure will be filled with more values only when/if it is so demanded:
See also the source code for group, which is also implem
-
I believe the type of the function is not what it should be. We get a list of characters, and we should produce a list of lists:
pack1 :: [Char] -> [[Char]]
pack1 xs = foldl (\acc x -> if not(null acc) && [x] `isPrefixOf` (last acc)
then (init acc) ++ [(last acc) ++ [x]]
else acc ++ [[x]]) [] xs-
The function doesn't require that we compare
Chars. In such a case, it's better to make its type as generic as possible. Not only it makes it usage more generic, it also helps us ensure that we restrict to the specific problem of packing, not introducing anything particular about Chars. So the generalized type would be (Eq a) => [a] -> [[a]].-
The major problem is that the function has O(n^2) time complexity. This is because calling
last on a list, or appending someting to its end using ++ is O(n), and doing it n-times gets us to O(n^2). This is definitely something we'd like to avoid. The solution is easy: Instead of appending to the list, let's just prepend, and reverse it at the end. Since reverse is O(n), the result will be O(n) as well. Let's apply this idea to both the "big" list as well as its sublists:pack2 :: (Eq a) => [a] -> [[a]]
pack2 xs = reverse . map reverse $
foldl (\acc x -> if not (null acc) && [x] `isPrefixOf` (head acc)
then (x : (head acc)) : (tail acc)
else [x] : acc) [] xsMost likely we could also omit
map reverse, as the sublists always contain equal elements. This assumes that if two elements are equal, they're indistinguishable, which is usually a reasonable assumption. But I kept it there for completeness, and it doesn't affect the asymptotic complexity.-
In Haskell it's customary to avoid
if/then/else in favor of pattern matching. Not only it's often shorter, but it provides stronger guarantees and/or less required reasoning about the code. In the if condition, the reader needs to infer that using last or head on acc is allowed because we checked it's non-empty. If we instead factor out the inner function and use patternmatching,pack3 :: (Eq a) => [a] -> [[a]]
pack3 = reverse . map reverse . foldl f []
where f acc@(as@(a:_):ass) x | a == x = (x : as) : ass
f acc x = [x] : accwe don't need to use any potentially dangerous functions such as
head/tail and do any checking.Here we also introduced another small improvement called η-conversion (eta). The idea is that
\x -> g x is equivalent to g.-
We'd like to avoid calling
reverse.The problem is that we're using
foldl to produce the final list. By its nature, if we use foldl for converting lists, it reverses the order of the elements (or causes O(n^2) time complexity). Although we want to consume the list from left to right, it's more appropriate to use foldr here. (I'd say that the only reason to use foldl is if the result needs the whole list - for example when summing a list.)Let's start with implementing
map using foldr. (I'll hide the example, if you want to try it yourself. If you don't, just hover the mouse over it.)map' f = foldr (\x r -> f x : r) []The idea is to express how to process a single element, knowing how the rest of the list is processed.
Let's apply the same principle to
pack:pack4 :: (Eq a) => [a] -> [[a]]
pack4 = foldr f []
where f x acc@(as@(a:_):ass) | a == x = (x : as) : ass
f x acc = [x] : accThis isn't very different from the previous example. The main difference is that we (seemingly!) consume elements from the right, so we have no problem with reversing the order.
-
And still ... we can ask for more. The problem is that the function isn't lazy enough (even though we're using
foldr). We'd like for example that head $ pack4 ('a' : 'c' : repeat 'b')produced "a", instead of looping forever - the required information is available. The reason for this is that we inspect the accumulator (by matching it with the pattern
acc@(as@(a:_):ass) which is not irrefutable) at each iteration, which means that in order to produce a single element we first force the processing of the rest of the list. Instead, we can create the result structure first to hold the current element without forcing the results of processing the rest of input list, so that this structure will be filled with more values only when/if it is so demanded:
pack5 :: (Eq a) => [a] -> [[a]]
pack5 = foldr f []
where
f x r = (x:u):w
where
(u,w) = case r of
[] -> ([],[])
(g:gs) | head g==x -> (g ,gs)
gs -> ([],gs)See also the source code for group, which is also implem
Code Snippets
pack1 :: [Char] -> [[Char]]
pack1 xs = foldl (\acc x -> if not(null acc) && [x] `isPrefixOf` (last acc)
then (init acc) ++ [(last acc) ++ [x]]
else acc ++ [[x]]) [] xspack2 :: (Eq a) => [a] -> [[a]]
pack2 xs = reverse . map reverse $
foldl (\acc x -> if not (null acc) && [x] `isPrefixOf` (head acc)
then (x : (head acc)) : (tail acc)
else [x] : acc) [] xspack3 :: (Eq a) => [a] -> [[a]]
pack3 = reverse . map reverse . foldl f []
where f acc@(as@(a:_):ass) x | a == x = (x : as) : ass
f acc x = [x] : accpack4 :: (Eq a) => [a] -> [[a]]
pack4 = foldr f []
where f x acc@(as@(a:_):ass) | a == x = (x : as) : ass
f x acc = [x] : acchead $ pack4 ('a' : 'c' : repeat 'b')Context
StackExchange Code Review Q#40188, answer score: 10
Revisions (0)
No revisions yet.