HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Haskell Parsec parser of Verilog-style number literals

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
literalsnumberparserverilogstylehaskellparsec

Problem

I've set myself the task to write a function that parses Verilog-style number literals. In Verilog, numbers are written like this:
8'b10101100, 16'hffff, '789, 32'd0, 12'o7, etc.

The syntax is:

'value


where ` denotes an optional field, and

  • size is the binary representation length of the number (in bits). The default is 32.



  • radix is 'b' or 'B' for binary, 'o' or 'O' for octal, 'd' or 'D' for decimal, 'h' or 'H' for hexadecimal. The default is 'd'.



  • value is the appropriate number representation for the given radix.



I'm new to Haskell, so I really would appreciate a critical appraisal of the following implementation. I think it is correct. Is it also 'good' Haskell/Parsec? Can it be done better, i.e. more concisely, more clearly, using more appropriate idioms?

``
import System.IO
import System.Environment
import Data.Either (either)
import Data.Either.Extra (fromRight)
import Data.Char (digitToInt, toLower)
import Numeric (readInt, readOct, readHex)
import Text.Parsec.Combinator
import Text.ParserCombinators.Parsec
import Text.Parsec.Char (hexDigit, octDigit)

integerLog :: Integer -> Integer
integerLog = floor . logBase 2.0 . fromIntegral

-- |numberSize calculates the length (in bits) of the binary representation of Integer
numberSize :: Integer -> Integer
numberSize = (1+) . integerLog

-- |The type Number has a val (Integer value) and a len (binary representation length).
data Number = Number {val::Integer, len:: Integer} deriving Show

-- |makeNumber is a safe constructor of an Either String Number
makeNumber :: Integer -> Integer -> (Either String Number)
makeNumber v l
| v l = Left $ "makeNumber: Value " ++ show v ++ " is "
++ show size ++ " bits long, should be at most " ++ show l ++ " bits"
| size Number
hexToInteger x = fst $ readHex x !! 0 :: Integer
octToInteger x = fst $ readOct x !! 0 :: Integer
binToInteger x = fst $ readBin x !! 0 :: Integer

-- |binDigit is not in Text.Parsec.Char (hexDigit, octDig

Solution

Let's start with your imports. You should get rid of those you don't need, e.g. System.IO, import Data.Either.Extra (fromRight). Next, you shouldn't mix integral and floating point calculations, unless you're fine with imprecise results. For example, integerLog will give the wrong answer around 1024:

ghci> integerLog (2^1023)
1023
ghci> integerLog (2^1024)
179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216


Next, record field names are functions:

data Number = Number {val::Integer, len:: BitSize} deriving Show


This gives you two functions,

val :: Number -> Integer
len :: Number -> Integer


Unless you're fine with those functions at global namespace, it's usually a better idea to either a) use a complete name, e.g. value, bitLength, or b) prefix/suffix them with the type's name, e.g. numberValue or lenNumber. The story slightly changes if you want to use lenses and Template Haskell, but that's for another post.

What follows afterwards are several top-level functions without type signature, which is considered bad style:

readBin = readInt 2 (`elem` "01") digitToInt

-- conversions String -> Number
hexToInteger x = fst $ readHex x !! 0 :: Integer
octToInteger x = fst $ readOct x !! 0 :: Integer
binToInteger x = fst $ readBin x !! 0 :: Integer


All those functions are partial, e.g.

ghci> hexToInteger "Hello"
*** Exception: Prelude.!!: index too large


Your use prevents this kind of error, since you're using ithexToInteger only on valid combinations, but you shouldn't export them either. That's not really an issue here since you've wrote a Main (main) module, but you should keep that in mind.

However, those functions would be a lot nicer with the appropriate type:

convertWith :: ReadS Integer -> String -> Maybe Integer
convertWith fn xs = case fn xs of
                      [(n,"")] -> Just n
                      _        -> Nothing

hexToInteger, octToInteger, binToInteger :: String -> Maybe Integer
hexToInteger = convertWith readHex
octToInteger = convertWith readOct
binToInteger = convertWith (readInt 2 (`elem` "01") digitToInt)


In this case we've dumped the plumbing into convertWith, added type signatures and made the functions non-partial. Now, let's step back to your smart-constructor:

-- |makeNumber is a safe constructor of an Either String Number
makeNumber :: Integer -> Integer -> (Either String Number)
makeNumber v l
  | v  l  = Left $ "makeNumber: Value " ++ show v ++ " is " 
    ++ show size ++ " bits long, should be at most " ++ show l ++ " bits"
  | size <= 0 = Left $ "makeNumber: Size can't be negative or zero"
  | otherwise = Right $ Number v l
  where size = numberSize v


While that's better than just a Maybe Number, it's still not optimal if you have an application that's creating those numbers. Since sum types aren't expensive to write, let's use one:

data NumberError = ENegative
                 | ENegativeSize
                 | EBitOverflow Integer Integer
 deriving (Show, Eq)


That way you can separate the formatting from the number handling:

makeNumber :: Integer -> BitSize -> (Either NumberError Number)
makeNumber v l
  | v  l  = Left $ EBitOverflow v l
  | size <= 0 = Left $ ENegativeSize
  | otherwise = Right $ Number v l
  where size = numberSize v


You could also provide some functions, e.g.

eNegative    :: Either NumberError a
eBitOverflow :: Integer -> BitSize -> Either NumberError a


to ease the use of those, but that's an overkill here.

Speaking of overkills, did you notice the BitSize above? Many of your functions use Integer -> Integer -> …. But what do those Integer's mean? Which one is the value? Which one is the bitsize? And is Integer as bitsize really a good idea?

Therefore, try to add some documentation at the type level, even if it isn't a new type:

type BitSize = Int


Now, let's have a look at the parser:

-- |parse Verilog-style numbers 'value
parseNumber :: Parser Number
parseNumber = do
  size  oneOf "dD"  oneOf "oO"  oneOf "bB"
  let (thisDigit, convert_digits) =
        case toLower radix of
          'h' -> (hexDigit, hexToInteger)
          'o' -> (octDigit, octToInteger)
          'd' -> (digit   , read)
          'b' -> (binDigit, binToInteger)
          _  ->  error "parseNumber: Internal error"
  val <- many1 thisDigit
  let size' = read size :: Integer
      val' = convert_digits val :: Integer
      num = makeNumber val' size'
    in
    either fail return num


This is well done, except for two minor quirks: you can collect all oneOf parsers into a single one:

oneOf "hHdDoObB"


and your use of error, which makes this parser partial again. Instead

Code Snippets

ghci> integerLog (2^1023)
1023
ghci> integerLog (2^1024)
179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216
data Number = Number {val::Integer, len:: BitSize} deriving Show
val :: Number -> Integer
len :: Number -> Integer
readBin = readInt 2 (`elem` "01") digitToInt

-- conversions String -> Number
hexToInteger x = fst $ readHex x !! 0 :: Integer
octToInteger x = fst $ readOct x !! 0 :: Integer
binToInteger x = fst $ readBin x !! 0 :: Integer
ghci> hexToInteger "Hello"
*** Exception: Prelude.!!: index too large

Context

StackExchange Code Review Q#135568, answer score: 2

Revisions (0)

No revisions yet.