patternModerate
PPM/PGM Image processing library in Haskell
Viewed 0 times
imagepgmppmlibraryhaskellprocessing
Problem
I've written a small library for reading and writing PGM/PPM images. The format is described here. I attach the library itself and a small utility to convert binary encoded images to ASCII encoding. Any type of comment will be appriciated. However, these points are most important to me:
-
Abusing of the type system - I fear that I forced my OOP design on the Haskell type system. I wanted to use typeclasses in order to prevent code duplication as much as possible between PGM and PPM images. However, the final result was that I had to duplicate code in several places. The worse thing is that I was forced to add a type signature in the bin2asc utility, so it doesn't support PGM. How can I design this library better to support both types?
-
Performance - Running my bin2asc utility on this image takes 0.633 seconds. I think it's a bit slow for an image in such size. Since I don't know how to profile code in Haskell I can't tell which function takes most of the time. Is there any bad practice that I have done that hurts performance?
-
Branching in mixed IO/Maybe functions - When using the
-
Parsing multiple values in a function - For
PNM.hs
```
module Data.PNM
( Coord
, Image(..)
, ImageType(..)
, Encoding(..)
, PPMImage(..)
, PGMImage(..)
, parseHeader
, Header(..)
) where
import Data.Array
import Data.Char
import Data.List
import qualified Data.ByteString.Lazy as S
import qualified Data.ByteString.Lazy.Char8 as SC
import Data.Word
import Control.Monad
import System.IO
import Control.Applicative
import Debug.Trace
type Coord = (Int,Int)
data Gray
-
Abusing of the type system - I fear that I forced my OOP design on the Haskell type system. I wanted to use typeclasses in order to prevent code duplication as much as possible between PGM and PPM images. However, the final result was that I had to duplicate code in several places. The worse thing is that I was forced to add a type signature in the bin2asc utility, so it doesn't support PGM. How can I design this library better to support both types?
-
Performance - Running my bin2asc utility on this image takes 0.633 seconds. I think it's a bit slow for an image in such size. Since I don't know how to profile code in Haskell I can't tell which function takes most of the time. Is there any bad practice that I have done that hurts performance?
-
Branching in mixed IO/Maybe functions - When using the
do notiaion in a function that returns a Maybe I prevent checking for Nothings by chaining monads. However, in bin2asc I have to check explicitly for Nothing, creating an extra branch. Is there anything I can do to prevent this?-
Parsing multiple values in a function - For
parseHeader - I had to use multple r variables to hold the remainders. Is there any less error-prone implementation?PNM.hs
```
module Data.PNM
( Coord
, Image(..)
, ImageType(..)
, Encoding(..)
, PPMImage(..)
, PGMImage(..)
, parseHeader
, Header(..)
) where
import Data.Array
import Data.Char
import Data.List
import qualified Data.ByteString.Lazy as S
import qualified Data.ByteString.Lazy.Char8 as SC
import Data.Word
import Control.Monad
import System.IO
import Control.Applicative
import Debug.Trace
type Coord = (Int,Int)
data Gray
Solution
A few improvements I notice right off the bat
newtype vs. data
replace single field
Use StateT
A common pattern in your code (ex.
this is much clearer if you hide the state passing in the
then you can use
Profiling
As far as profiling is concerned, Haskell makes it really easy to get a good idea of where your program is slow. Just compile like
Here's the most important bit of the profiling output:
as you can see,
In particular,
Next Steps
A further improvement would be to use
newtype vs. data
replace single field
data declarations with newtype. newtype has no runtime overhead.data GrayscalePixel = GrayscalePixel Int
-- becomes
newtype GrayscalePixel = GrayscalePixel IntUse StateT
A common pattern in your code (ex.
parseHeader) isfoo x = do
(a, s0) <- doSomething0 x
(b, s1) <- doSomething1 s0
...
(n, sn) <- doSomethingN sn_1
return $ (f a b ... n, sn)this is much clearer if you hide the state passing in the
State monad, or in this case, the StateT monad transformer. parseHeader becomes parseHeader :: StateT S.ByteString Maybe Header
parseHeader = do
(enc, imgType) <- StateT parseMagic
width <- readInt
height <- readInt
depth <- readInt
return $ Header imgType enc (width, height) depth
where readInt = StateT (SC.readInt . nextWord)then you can use
runStateT :: StateT s m a -> s -> m (a, s) to evaluate.Profiling
As far as profiling is concerned, Haskell makes it really easy to get a good idea of where your program is slow. Just compile like
ghc -O2 Main.hs -prof -auto-all, then run as ./Main src dest +RTS -p and it will produce Main.prof containing profiling information.Here's the most important bit of the profiling output:
COST CENTRE MODULE %time %alloc
dump PNM 36.6 44.6
encodePixel.toAscii PNM 27.2 24.3
encodePixels PNM 7.2 8.8
readPixel PNM 4.3 6.2
...as you can see,
dump and encodePixel.toAscii are taking the majority of time.In particular,
encodePixel.toAscii is being very inefficient by showing an Int, then packing it into a ByteString using SC.pack. Optimizing these two functions should provide a nice speedup to your program.Next Steps
A further improvement would be to use
Data.Binary to parse the binary headers. It's designed for exactly this sort of task and is very fast, although I wouldn't recommend pursuing it unless you fix the two functions mentioned above. Currently the time taken to parse the headers is negligible compared to the cost of encoding to ASCII and dumping.Code Snippets
data GrayscalePixel = GrayscalePixel Int
-- becomes
newtype GrayscalePixel = GrayscalePixel Intfoo x = do
(a, s0) <- doSomething0 x
(b, s1) <- doSomething1 s0
...
(n, sn) <- doSomethingN sn_1
return $ (f a b ... n, sn)parseHeader :: StateT S.ByteString Maybe Header
parseHeader = do
(enc, imgType) <- StateT parseMagic
width <- readInt
height <- readInt
depth <- readInt
return $ Header imgType enc (width, height) depth
where readInt = StateT (SC.readInt . nextWord)COST CENTRE MODULE %time %alloc
dump PNM 36.6 44.6
encodePixel.toAscii PNM 27.2 24.3
encodePixels PNM 7.2 8.8
readPixel PNM 4.3 6.2
...Context
StackExchange Code Review Q#40269, answer score: 10
Revisions (0)
No revisions yet.