HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Simple Prefix Matching Library in Haskell

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
simplelibraryhaskellprefixmatching

Problem

For some context, I'm trying to write a streaming tokenizer in Haskell and noticed that I was writing a lot of functions with the type signature String -> Maybe Int that attempted to consume "the longest possible token matching a particular pattern" from the input stream. The pattern might be something like a quoted string literal in JSON.

I tried using Parsec for a while to get a "prefix matcher", but I kept getting tripped up over what the appropriate userstate is supposed, and what the m argument in ParsecT means. Parsec is probably a strict superset of the functionality that this little library exposes.

A "Matcher" is anything with the type String -> Maybe Int. Matchers are supposed to be greedy and produce the longest prefix that they possibly can. There's also an unenforced property that, in cases where the match is a proper prefix of the input string, adding more characters to the end of a string doesn't extend the prefix. The library also encourages / permits users to write sloppy grammars (for instance using symOrExn) that fail in ugly ways.

I could use some helping figuring out how to generalize String -> Maybe Int. I don't know what class I should insist on for the input Eq a => [a] is the first thing that comes to mind, but it might be possible to make something more general than that. As for the output type maybe a type that's constrained to be a monoid or a group? I really just need a notion of zero and the ability to increment for the index. ... although it might also be possible to generalize the notion of a "prefix" to tree-like structures.

I'm hoping to structure this library eventually as a collection of really concrete implementations for a handful of commonly-used text-like types, as well as a generic implementation that can be used in other circumstances. Advice on how to generalize the library is much appreciated.

```
module Text.PrefixMatcher.String where
import Data.List

-- A simple non-generic prefix matching library
--

Solution

Library functions, particularly Maybe's Applicative/Alternative instances, can make your code more consise:

makeMatcher spec str = length spec  Nothing
  [l] -> Just l
  -- if you reach this case, the grammar was
  -- constructed badly
  [_, _] -> error "ambiguous match"

leftOr = liftA2 ()

andThenList = foldr andThen identityMatcher

leftOrList = foldr leftOr identityMatcher


A more potent approach is to return not the length at which to split, but the results of the split. In fact, StateT String Maybe String makes this work out of the box:

type Matcher = StateT String Maybe String

makeMatcher :: String -> Matcher
makeMatcher spec = StateT $ fmap (spec,) . stripPrefix spec

runMatcher :: Matcher -> String -> Maybe (String, String)
runMatcher = runStateT

andThen = liftA2 (++)

symOrExn m1 m2 = m1  error "ambiguous match"  pure ())  m2
  -- I'll admit, this one looks clunky

leftOr = ()

andThenList = fmap concat . sequenceA

leftOrList = asum

Code Snippets

makeMatcher spec str = length spec <$ guard (isPrefixOf spec str)

andThen m1 m2 str = do
  l1 <- m1 str
  l2 <- m2 $ drop l1 str
  Just $ l1 + l2

symOrExn m1 m2 str = case catMaybes [m1 str, m2 str] of
  [] -> Nothing
  [l] -> Just l
  -- if you reach this case, the grammar was
  -- constructed badly
  [_, _] -> error "ambiguous match"

leftOr = liftA2 (<|>)

andThenList = foldr andThen identityMatcher

leftOrList = foldr leftOr identityMatcher
type Matcher = StateT String Maybe String

makeMatcher :: String -> Matcher
makeMatcher spec = StateT $ fmap (spec,) . stripPrefix spec

runMatcher :: Matcher -> String -> Maybe (String, String)
runMatcher = runStateT

andThen = liftA2 (++)

symOrExn m1 m2 = m1 <* (m2 *> error "ambiguous match" <|> pure ()) <|> m2
  -- I'll admit, this one looks clunky

leftOr = (<|>)

andThenList = fmap concat . sequenceA

leftOrList = asum

Context

StackExchange Code Review Q#157006, answer score: 2

Revisions (0)

No revisions yet.