HiveBrain v1.2.0
Get Started
← Back to all entries
snippetModerate

How to shorten this terrible HTTP header parser?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thisterribleheaderparsershortenhttphow

Problem

I am trying to read a line ending in \r\n from a Handle. This is an HTTP header.

I’m fairly new to functional programming, so I tend to use a lot of case expressions and stuff. This makes the code very long and ugly.

handleRequest :: HostName -> Handle -> IO ()
handleRequest host handle = do
    requestLine  do
                            nextChr  return s
                                _    -> readHeaderLine' $ s ++ [chr, nextChr]
                        _    -> readHeaderLine' $ s ++ [chr]


How can I reduce the amount of case expressions in this code? I thought of using Parsec, but that seemed overkill to me for something this trivial, and I don’t know how well it works with Handles.

Solution

Here I assume you just want to see how your current code can be improved without changing the algoritm.

First of all, give HLint tool a chance to suggest you obvious improvements. In your case the only improvement was that do in do readHeaderLine' "" was redundant, so not much.

Second, in my opinion many small top-level definitions are better than few large ones. You can still control namespace pollution by not exporting definitions local to the module:

import System.IO

type HostName = String

handleRequest :: HostName -> Handle -> IO ()
handleRequest host handle = do
    requestLine  do
                nextChr  return s
                    _    -> readHeaderLine' $ s ++ [chr, nextChr]
            _    -> readHeaderLine' $ s ++ [chr]


Next, your nextChr

  • separate monadic code from non-monadic code



The initial
redHeaderLine' call can be implemented using recurse with an extra parameter:

readHeaderLine handle = recurse [] [] where
    recurse s x = readHeaderLine' $ s ++ x 
    readHeaderLine' s = foo '\r' haveCR noCR where
        haveCR = foo '\n' (return s) haveCRnoLF
        noCR chr = recurse s [chr]
        haveCRnoLF nextChr = recurse s ['\r', nextChr]
        foo quux bar baz = do
            chr <- hGetChar handle
            if chr == quux then bar else baz chr


Now we can inline
readHeaderLine' as it is only applied once:

readHeaderLine handle = recurse [] [] where
    recurse s1 x = foo '\r' haveCR noCR where
        s = s1 ++ x
        haveCR = foo '\n' (return s) haveCRnoLF
        noCR chr = recurse s [chr]
        haveCRnoLF nextChr = recurse s ['\r', nextChr]
        foo quux bar baz = do
            chr <- hGetChar handle
            if chr == quux then bar else baz chr


And we can remove duplication of
recurse s:

readHeaderLine handle = recurse [] [] where
    recurse s1 x = foo '\r' haveCR noCR where
        s = s1 ++ x
        rf = recurse s
        haveCR = foo '\n' (return s) haveCRnoLF
        noCR chr = rf [chr]
        haveCRnoLF nextChr = rf ['\r', nextChr]
        foo quux bar baz = do
            chr <- hGetChar handle
            if chr == quux then bar else baz chr


Now let's put return value of
recurse into a local declaration g:

readHeaderLine handle = recurse [] [] where
    recurse s1 x = g where
        s = s1 ++ x
        rf = recurse s

        g = foo '\r' haveCR noCR
        haveCR = foo '\n' (return s) haveCRnoLF
        noCR chr = rf [chr]
        haveCRnoLF nextChr = rf ['\r', nextChr]
        foo quux bar baz = do
            chr <- hGetChar handle
            if chr == quux then bar else baz chr


Our goal is to divorse
g from recurse. You can do it by adding parameters to both recurse and g and localizing identifiers used only in recurse and used only in g:

readHeaderLine handle = recurse g [] [] where
recurse g s1 x = g rf s where
s = s1 ++ x
rf = recurse g s

g rf s = foo '\r' haveCR noCR where
    haveCR = foo '\n' (return s) haveCRnoLF
    noCR chr = rf [chr]
    haveCRnoLF nextChr = rf ['\r', nextChr]
    foo quux bar baz = do
        chr <- hGetChar handle
        if chr == quux then bar else baz chr


Now
recurse is completely self-contained:

readHeaderLine handle = recurse g [] [] where
g rf s = foo '\r' haveCR noCR where
haveCR = foo '\n' (return s) haveCRnoLF
noCR chr = rf [chr]
haveCRnoLF nextChr = rf ['\r', nextChr]
foo quux bar baz = do
chr <- hGetChar handle
if chr == quux then bar else baz chr

recurse g s1 x = g rf s where
s = s1 ++ x
rf = recurse g s

But
g is still recursive: it has a nasty rf parameter which is an indirect recursive application. We need to move rf into recurse too. So here comes a trick: convert a function call into a constructor.

g can have only 3 return values: return s, rf [chr] and rf ['\r', nextChr]. We can represent them with a data type and return it instead of calling return or rf:

data Outcomes a b c = RF1 a | RF2 b | Return c

readHeaderLine handle = recurse g [] [] where
    g rf s = foo '\r' haveCR noCR where
        haveCR = foo '\n' (return $ Return s) haveCRnoLF
        noCR chr = return $ RF1 [chr]
        haveCRnoLF nextChr = return $ RF2 ['\r', nextChr]
        foo quux bar baz = do
            chr  rf a
            RF2 a -> rf a
            Return a -> return a


Now
rf parameter is unused, so we can clean the definitions of g and recurse:

data Outcomes a b c = RF1 a | RF2 b | Return c

readHeaderLine handle = recurse g [] [] where
    g s = foo '\r' haveCR noCR where
        haveCR = foo '\n' (return $ Return s) haveCRnoLF
        noCR chr = return $ RF1 [chr]
        haveCRnoLF nextChr = return $ RF2 ['\r', nextChr]
        foo quux bar baz = do
            chr  rf a
            RF2 a -> rf a
            Return a -> return a


Now two more improvements: a)
RF1 and RF2` outcomes can be join

Code Snippets

import System.IO

type HostName = String

handleRequest :: HostName -> Handle -> IO ()
handleRequest host handle = do
    requestLine <- readHeaderLine handle
    putStrLn $ requestLine ++ "\n-------------------"

-- FIXME: This code is bad, and its author should feel bad.
readHeaderLine handle = readHeaderLine' "" where
    readHeaderLine' s = do
        chr <- hGetChar handle
        case chr of
            '\r' -> do
                nextChr <- hGetChar handle
                case nextChr of
                    '\n' -> return s
                    _    -> readHeaderLine' $ s ++ [chr, nextChr]
            _    -> readHeaderLine' $ s ++ [chr]
foo handle quux bar baz = do
    chr <- hGetChar handle
    if chr == quux then bar else baz
readHeaderLine handle = readHeaderLine' "" where
    readHeaderLine' s = do
        chr <- hGetChar handle
        case chr of
            '\r' -> do
                foo handle '\n' (return s) (readHeaderLine' $ s ++ [chr, nextChr])
            _    -> readHeaderLine' $ s ++ [chr]
readHeaderLine handle = readHeaderLine' "" where
    readHeaderLine' s = do
        chr <- hGetChar handle
        case chr of
            '\r' -> do
                foo handle '\n' (return s) (\nextChr -> readHeaderLine' $ s ++ [chr, nextChr])
            _    -> readHeaderLine' $ s ++ [chr] 

foo handle quux bar baz = do
    chr <- hGetChar handle
    if chr == quux then bar else baz chr
readHeaderLine handle = readHeaderLine' "" where
    readHeaderLine' s = do
    foo handle '\r' (foo handle '\n' (return s) (\nextChr -> readHeaderLine' $ s ++ [chr, nextChr])) (\chr -> readHeaderLine' $ s ++ [chr])

Context

StackExchange Code Review Q#15722, answer score: 15

Revisions (0)

No revisions yet.