
PPM/PGM Image processing library in Haskell

Tags: image, pgm, ppm, library, haskell, processing

Problem

I've written a small library for reading and writing PGM/PPM images. The format is described here. I attach the library itself and a small utility to convert binary-encoded images to ASCII encoding. Any kind of comment will be appreciated. However, these points are most important to me:

- Abuse of the type system: I fear that I forced my OOP design onto the Haskell type system. I wanted to use typeclasses to prevent as much code duplication as possible between PGM and PPM images. However, the final result was that I had to duplicate code in several places. The worst part is that I was forced to add a type signature in the bin2asc utility, so it doesn't support PGM. How can I design this library better to support both types?

- Performance: Running my bin2asc utility on this image takes 0.633 seconds. I think that's a bit slow for an image of this size. Since I don't know how to profile Haskell code, I can't tell which function takes most of the time. Is there any bad practice here that hurts performance?

- Branching in mixed IO/Maybe functions: When using do notation in a function that returns a Maybe, I avoid checking for Nothing by chaining monads. However, in bin2asc I have to check explicitly for Nothing, creating an extra branch. Is there anything I can do to prevent this?

- Parsing multiple values in a function: In parseHeader I had to use multiple r variables to hold the remainders. Is there a less error-prone implementation?

PNM.hs

```
module Data.PNM
( Coord
, Image(..)
, ImageType(..)
, Encoding(..)
, PPMImage(..)
, PGMImage(..)
, parseHeader
, Header(..)
) where

import Data.Array
import Data.Char
import Data.List
import qualified Data.ByteString.Lazy as S
import qualified Data.ByteString.Lazy.Char8 as SC
import Data.Word
import Control.Monad
import System.IO
import Control.Applicative
import Debug.Trace

type Coord = (Int,Int)
data Gray
```
Solution

A few improvements I noticed right off the bat:

newtype vs. data
Replace single-field data declarations with newtype. A newtype has no runtime overhead, because the wrapper is erased at compile time.

```
data GrayscalePixel = GrayscalePixel Int
-- becomes
newtype GrayscalePixel = GrayscalePixel Int
```
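To see that nothing else changes, here is a minimal sketch (assuming, purely for illustration, that GrayscalePixel wraps a 0-255 value): you pattern-match on the newtype constructor exactly as you would on a data constructor, but the wrapping and unwrapping compile away.

```haskell
newtype GrayscalePixel = GrayscalePixel Int

-- Clamp-and-add on the wrapped value; matching on the newtype
-- constructor is free at runtime, so this costs the same as
-- working on a bare Int.
brighten :: Int -> GrayscalePixel -> GrayscalePixel
brighten n (GrayscalePixel v) = GrayscalePixel (min 255 (v + n))
```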


Use StateT

A common pattern in your code (e.g. parseHeader) is:

```
foo x = do
    (a, s0) <- doSomething0 x
    (b, s1) <- doSomething1 s0
    ...
    (n, sn) <- doSomethingN sn_1
    return $ (f a b ... n, sn)
```


This is much clearer if you hide the state passing in the State monad, or in this case the StateT monad transformer. parseHeader becomes:

```
parseHeader :: StateT S.ByteString Maybe Header
parseHeader = do
    (enc, imgType) <- StateT parseMagic
    width  <- readInt
    height <- readInt
    depth  <- readInt
    return $ Header imgType enc (width, height) depth
    where readInt = StateT (SC.readInt . nextWord)
```


Then you can use runStateT :: StateT s m a -> s -> m (a, s) to evaluate it.
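To make the plumbing concrete, here is a minimal, self-contained sketch of the same pattern over Maybe (readInt and parseDims are illustrative stand-ins, not the question's actual parseMagic/nextWord helpers):

```haskell
import Control.Monad.Trans.State (StateT (..), runStateT)
import qualified Data.ByteString.Lazy.Char8 as SC

-- Consume one integer, skipping leading spaces; a Nothing from
-- SC.readInt aborts the whole parse, with no explicit branching.
readInt :: StateT SC.ByteString Maybe Int
readInt = StateT (SC.readInt . SC.dropWhile (== ' '))

-- Parse width, height and depth without threading remainders by hand.
parseDims :: StateT SC.ByteString Maybe (Int, Int, Int)
parseDims = do
  w <- readInt
  h <- readInt
  d <- readInt
  return (w, h, d)
```

For example, runStateT parseDims (SC.pack "640 480 255") yields Just ((640,480,255), "") because each readInt hands its remainder to the next one behind the scenes.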

Profiling

As far as profiling is concerned, Haskell makes it easy to get a good idea of where your program spends its time. Compile with ghc -O2 Main.hs -prof -auto-all (on newer GHCs, -fprof-auto), then run ./Main src dest +RTS -p and it will produce Main.prof containing the profiling information.

Here's the most important bit of the profiling output:

```
COST CENTRE         MODULE  %time %alloc
dump                PNM      36.6   44.6
encodePixel.toAscii PNM      27.2   24.3
encodePixels        PNM       7.2    8.8
readPixel           PNM       4.3    6.2
...
```


As you can see, dump and encodePixel.toAscii take the majority of the time.
In particular, encodePixel.toAscii is very inefficient: it shows an Int and then packs the resulting String into a ByteString with SC.pack. Optimizing these two functions should give your program a nice speedup.
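The show-then-pack round trip can usually be avoided with Data.ByteString.Builder, which renders Ints straight into an output buffer. A minimal sketch of the idea (encodeRow is an illustrative name; the question's actual toAscii isn't shown here):

```haskell
import Data.ByteString.Builder (char8, intDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as SC

-- Render pixel values as space-separated ASCII decimals, accumulating
-- everything in one Builder instead of packing many small Strings.
encodeRow :: [Int] -> SC.ByteString
encodeRow = toLazyByteString . go
  where
    go []       = mempty
    go [x]      = intDec x
    go (x : xs) = intDec x <> char8 ' ' <> go xs
```

Builders like this amortize buffer allocation across the whole row, which is exactly where show + SC.pack loses time.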

Next Steps

A further improvement would be to use Data.Binary to parse the binary headers. It's designed for exactly this sort of task and is very fast, although I wouldn't recommend pursuing it unless you fix the two functions mentioned above. Currently the time taken to parse the headers is negligible compared to the cost of encoding to ASCII and dumping.
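For a taste of what that would look like, here is a rough sketch using binary's Get monad to read just the two-byte magic number (an assumption about shape for illustration, not the library's actual parser):

```haskell
import Data.Binary.Get (Get, getWord8, runGetOrFail)
import qualified Data.ByteString.Lazy.Char8 as SC
import Data.Word (Word8)

-- Read the two-byte PNM magic ("P1".."P6") and return the format digit.
-- A mismatch fails inside Get, which runGetOrFail surfaces as a Left.
getMagic :: Get Word8
getMagic = do
  p <- getWord8
  n <- getWord8
  if p == 80 && n >= 49 && n <= 54   -- 'P' followed by '1'..'6'
    then return (n - 48)
    else fail "not a PNM magic number"
```

Running runGetOrFail getMagic on a ByteString beginning with "P6" gives back the digit 6 along with the unconsumed input, so the rest of the header parser can pick up where it left off.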


Context

StackExchange Code Review Q#40269, answer score: 10
