patternMinor
Split a string on spaces, accounting for backslash as escape character, in Haskell
Viewed 0 times
backslashcharacterspacessplitforhaskellstringaccountingescape
Problem
This code works, but since I'm just learning Haskell I'm wondering if it could have been done in a more idiomatic way.
The main function is called
Here is the code:
The main function is called
tokenize, and it uses helper functions called tokenizeOne and tokenizeSeveral.Here is the code:
tokenizeOne "" (' ':rest) = tokenizeOne "" rest
tokenizeOne str ('\\':ch:rest) = tokenizeOne (str ++ [ch]) rest
tokenizeOne x (' ':rest) = (x, rest)
tokenizeOne x "" = (x, "")
tokenizeOne x (ch:rest) = tokenizeOne (x ++ [ch]) rest
tokenizeSeveral x "" = x
tokenizeSeveral x str = let (token, rest) = tokenizeOne "" str in
tokenizeSeveral (x ++ [token]) rest
tokenize str = tokenizeSeveral [] strSolution
Apart from general code layout I don't see much to improve - the only issue worth talking about is that you should never ever actually use something like
So let's talk about that a bit. Your options generally are:
-
Find a library function that frees you from actually constructing the list yourself. But to be fair, for this examples I can't actually think of anything helpful myself.
-
Use a data structure that copes better with it, such as
-
First construct the list using just
-
Rebuild your function so it constructs the list in the right order right away.
Here is what things look like if we go for the last option:
The key here is that we add the character
A notable advantage is that this version has better laziness properties:
We know that the first character is
Finally, a matching
This is again basically your version using direct list construction. Only difference is that we now deal with whitespace once per token, which I feel is slightly more elegant.
++ [ch]. Your function is needlessly O(n²) for that reason alone. That might not sound like much here, but it is a good idea to shake this habit early.So let's talk about that a bit. Your options generally are:
-
Find a library function that frees you from actually constructing the list yourself. But to be fair, for this examples I can't actually think of anything helpful myself.
-
Use a data structure that copes better with it, such as
Seq from Data.Sequence.-
First construct the list using just
(:), then reverse it at the end. That might seem inefficient, but given that reverse is only O(n), it will never actually increase algorithmic complexity.-
Rebuild your function so it constructs the list in the right order right away.
Here is what things look like if we go for the last option:
import Control.Arrow ( first )
token :: String -> (String, String)
token "" = ("", "")
token (' ' : cs) = ("", cs)
token ('\\':c:cs) = first (c:) $ token cs
token (c : cs) = first (c:) $ token csThe key here is that we add the character
c using first (c:) after the recursive call returns, and therefore to the front of the string.A notable advantage is that this version has better laziness properties:
*Main> head $ fst $ token ('t':undefined)
't'We know that the first character is
't' even without looking at the rest of the string, whereas your function always needs to finish a complete token before it can return. And as we are programming in Haskell, we often like our functions as lazy as possible.Finally, a matching
tokens implementation:tokens :: String -> [String]
tokens str
| null str' = []
| otherwise = tok : tokens rest
where str' = dropWhile (==' ') str
(tok, rest) = token str'This is again basically your version using direct list construction. Only difference is that we now deal with whitespace once per token, which I feel is slightly more elegant.
Code Snippets
import Control.Arrow ( first )
token :: String -> (String, String)
token "" = ("", "")
token (' ' : cs) = ("", cs)
token ('\\':c:cs) = first (c:) $ token cs
token (c : cs) = first (c:) $ token cs*Main> head $ fst $ token ('t':undefined)
't'tokens :: String -> [String]
tokens str
| null str' = []
| otherwise = tok : tokens rest
where str' = dropWhile (==' ') str
(tok, rest) = token str'Context
StackExchange Code Review Q#40968, answer score: 2
Revisions (0)
No revisions yet.