HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

99 Haskell problems, problem 9: Pack function

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
problempackfunctionhaskellproblems

Problem

Today I started to tackle the 99 Haskell problems (http://www.haskell.org/haskellwiki/99_questions). I had to struggle with problem 9 which requires to Pack consecutive duplicates of list elements into sublists. If a list contains repeated elements they should be placed in separate sublists..

For example the input:

pack ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e']
["aaaa","b","cc","aa","d","eeee"]


I managed to create a pack function. However, my solution feels kind of
overcomplicated:

pack :: [[Char]] -> [[Char]]
pack xs = foldl (\acc x -> if not(null acc) && x `Data.List.isPrefixOf` (last acc) 
                           then (init acc) ++ [(last acc) ++ x] 
                           else acc ++ [x]) [] xs


A test of my pack function:

pack ["a", "a", "b", "b", "b", "c", "a", "a", "c"]
["aa","bbb","c","aa","c"]


  • Can you give me a code review?



  • How can I simplify it? What could I improve to make this function more idiomatic?

Solution

Let's improve the function step by step, focusin on readability as well as on the algorithm itself.

-
I believe the type of the function is not what it should be. We get a list of characters, and we should produce a list of lists:

pack1 :: [Char] -> [[Char]]
pack1 xs = foldl (\acc x -> if not(null acc) && [x] `isPrefixOf` (last acc) 
                            then (init acc) ++ [(last acc) ++ [x]] 
                            else acc ++ [[x]]) [] xs


-
The function doesn't require that we compare Chars. In such a case, it's better to make its type as generic as possible. Not only it makes it usage more generic, it also helps us ensure that we restrict to the specific problem of packing, not introducing anything particular about Chars. So the generalized type would be (Eq a) => [a] -> [[a]].

-
The major problem is that the function has O(n^2) time complexity. This is because calling last on a list, or appending someting to its end using ++ is O(n), and doing it n-times gets us to O(n^2). This is definitely something we'd like to avoid. The solution is easy: Instead of appending to the list, let's just prepend, and reverse it at the end. Since reverse is O(n), the result will be O(n) as well. Let's apply this idea to both the "big" list as well as its sublists:

pack2 :: (Eq a) => [a] -> [[a]]
pack2 xs = reverse . map reverse $
           foldl (\acc x -> if not (null acc) && [x] `isPrefixOf` (head acc)
                            then (x : (head acc)) : (tail acc)
                            else [x] : acc) [] xs


Most likely we could also omit map reverse, as the sublists always contain equal elements. This assumes that if two elements are equal, they're indistinguishable, which is usually a reasonable assumption. But I kept it there for completeness, and it doesn't affect the asymptotic complexity.

-
In Haskell it's customary to avoid if/then/else in favor of pattern matching. Not only it's often shorter, but it provides stronger guarantees and/or less required reasoning about the code. In the if condition, the reader needs to infer that using last or head on acc is allowed because we checked it's non-empty. If we instead factor out the inner function and use patternmatching,

pack3 :: (Eq a) => [a] -> [[a]]
pack3  = reverse . map reverse . foldl f []
  where f acc@(as@(a:_):ass) x | a == x     = (x : as) : ass
        f acc                x              = [x] : acc


we don't need to use any potentially dangerous functions such as head/tail and do any checking.

Here we also introduced another small improvement called η-conversion (eta). The idea is that \x -> g x is equivalent to g.

-
We'd like to avoid calling reverse.

The problem is that we're using foldl to produce the final list. By its nature, if we use foldl for converting lists, it reverses the order of the elements (or causes O(n^2) time complexity). Although we want to consume the list from left to right, it's more appropriate to use foldr here. (I'd say that the only reason to use foldl is if the result needs the whole list - for example when summing a list.)

Let's start with implementing map using foldr. (I'll hide the example, if you want to try it yourself. If you don't, just hover the mouse over it.)


map' f = foldr (\x r -> f x : r) []

The idea is to express how to process a single element, knowing how the rest of the list is processed.

Let's apply the same principle to pack:

pack4 :: (Eq a) => [a] -> [[a]]
pack4  = foldr f []
  where f x acc@(as@(a:_):ass) | a == x     = (x : as) : ass
        f x acc                             = [x] : acc


This isn't very different from the previous example. The main difference is that we (seemingly!) consume elements from the right, so we have no problem with reversing the order.

-
And still ... we can ask for more. The problem is that the function isn't lazy enough (even though we're using foldr). We'd like for example that

head $ pack4 ('a' : 'c' : repeat 'b')


produced "a", instead of looping forever - the required information is available. The reason for this is that we inspect the accumulator (by matching it with the pattern acc@(as@(a:_):ass) which is not irrefutable) at each iteration, which means that in order to produce a single element we first force the processing of the rest of the list.

Instead, we can create the result structure first to hold the current element without forcing the results of processing the rest of input list, so that this structure will be filled with more values only when/if it is so demanded:

pack5 :: (Eq a) => [a] -> [[a]]
pack5  = foldr f []
  where 
    f x r = (x:u):w 
            where 
              (u,w) = case r of 
                        []                 -> ([],[]) 
                        (g:gs) | head g==x -> (g ,gs) 
                        gs                 -> ([],gs)


See also the source code for group, which is also implem

Code Snippets

pack1 :: [Char] -> [[Char]]
pack1 xs = foldl (\acc x -> if not(null acc) && [x] `isPrefixOf` (last acc) 
                            then (init acc) ++ [(last acc) ++ [x]] 
                            else acc ++ [[x]]) [] xs
pack2 :: (Eq a) => [a] -> [[a]]
pack2 xs = reverse . map reverse $
           foldl (\acc x -> if not (null acc) && [x] `isPrefixOf` (head acc)
                            then (x : (head acc)) : (tail acc)
                            else [x] : acc) [] xs
pack3 :: (Eq a) => [a] -> [[a]]
pack3  = reverse . map reverse . foldl f []
  where f acc@(as@(a:_):ass) x | a == x     = (x : as) : ass
        f acc                x              = [x] : acc
pack4 :: (Eq a) => [a] -> [[a]]
pack4  = foldr f []
  where f x acc@(as@(a:_):ass) | a == x     = (x : as) : ass
        f x acc                             = [x] : acc
head $ pack4 ('a' : 'c' : repeat 'b')

Context

StackExchange Code Review Q#40188, answer score: 10

Revisions (0)

No revisions yet.