HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

String splitting function in Haskell

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
functionstringsplittinghaskell

Problem

I need to write a function which selects a run of repeated characters from the start of a string, with the run comprising at most nine characters.

chomp :: String -> String
chomp str =  takeWhile (== head str) str

munch :: String -> String
munch = take 9 . chomp

runs :: String -> [String]
runs string = extract string [] where
    extract str xs
        | length str == 0 = xs
        | otherwise = extract (drop (length (munch str)) str) (xs ++ [(munch str)])


I can't define any new functions (except helper ones), however, I can use all of the haskell built-in ones. Is there a nicer way to write the runs function?

Examples

runs "aaaaabbbbcc"
["aaaaa","bbbb","cc"]

runs "aaaaaaaaaaaaaaabbbbcc"
["aaaaaaaaa","aaaaaa","bbbb","cc"]

runs "dddddddddddd"
["ddddddddd","ddd"]

runs "aaabbbaaa"
["aaa","bbb","aaa"]

Solution

Another variant of runs

Let us start with the finished product first, and then I will show you how to get there:

runs :: String -> [String]
runs "" = []
runs xs = munched : runs (drop (length munched) xs)
  where munched = munch xs


Use pattern matching

In your extract function, you've checked str's length. However, that is not necessary and is very expensive. If you want to check whether a list is empty either use null or pattern matching. With pattern matching, we end up with

extract []  xs = xs
extract str xs = extract (drop (length (munch str)) str) (xs ++ [(munch str)])


Use bindings to keep repetition low

However, we repeat ourselves here and use munch str twice. Let's get rid of that:

extract []  xs = xs
extract str xs = extract (drop (length munched) str) (xs ++ [munched])
  where
    munched = munch str


Our line got a little bit shorter. Great. Now let us have a look at your accumulator.

Avoid repeated ++

Your results looks like this:

munch str ++ munch str' ++ munch str'' ++ munch str''' …


However, with parentheses, we actually have

(…(((munch str ++ munch str') ++ munch str'') ++ munch str''') …)


Since ++ is linear in its first argument, that is going to be slow. However, we don't need to use ++ at all, since we're constructing a list element-wise from the first to the last. So instead of an accumulator, let us have extract return the list as soon as possible:

extract [] = []
extract str = munched : extract (drop (length munched) str)
  where munched = munch str


At that point extract has exactly the same type as runs, so we can get rid of it. We end up with

runs :: String -> [String]
runs []  = []
runs str = munched : runs (drop (length munched) str)
  where munched = munch str


Replace str with xs and the first [] with "", and you have my first variant. We're done.

Other functions

You could rewrite chomp in a way that returns already groups of characters, e.g.

chomp "aaaaabbbbbccccc" = ["aaaaa","bbbbb","ccccc"]


Then, you could rewrite munch so that it splits a string after 9 characters:

munch "aaaaaaaaaa" = ["aaaaaaaa","aa"]


If you do that, runs gets a lot simpler:

runs = concatMap munch . chomp


However, the types of chomp and munch would differ in that case. By the way, this variant of chomp is in the standard library, and munch is also easy to implement.

Code Snippets

runs :: String -> [String]
runs "" = []
runs xs = munched : runs (drop (length munched) xs)
  where munched = munch xs
extract []  xs = xs
extract str xs = extract (drop (length (munch str)) str) (xs ++ [(munch str)])
extract []  xs = xs
extract str xs = extract (drop (length munched) str) (xs ++ [munched])
  where
    munched = munch str
munch str ++ munch str' ++ munch str'' ++ munch str''' …
(…(((munch str ++ munch str') ++ munch str'') ++ munch str''') …)

Context

StackExchange Code Review Q#158183, answer score: 2

Revisions (0)

No revisions yet.