HiveBrain v1.2.0

How to simplify Regex with Data.Text?

Submitted by: @import:stackexchange-codereview

Problem

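This function checks a piece of content against regexes that you like and regexes that you don't like. It returns Right with the liked matches when at least one liked regex matches and no disliked one does, and Left otherwise (with the disliked matches, or an empty list if nothing liked matched).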

This messy code requires Data.Text.Lazy but internally shuffles between the lazy and strict versions. The list-monad code also looks fishy to me.

How can it be made simpler, faster and more general (in particular, so that it works with any kind of Text as well as String)?

{-# LANGUAGE OverloadedStrings #-}

module Search (analyze) where

import qualified Data.Text.Lazy as T
import Data.Maybe()
import Text.Regex.TDFA
import Text.Regex.Base.Context()
import Text.Regex.Base.RegexLike()
import Text.Regex.TDFA.Text
import Data.Function (on)
import Data.Maybe (maybeToList)

matches :: [T.Text] -> T.Text -> [T.Text]
matches patterns content = map (T.fromStrict) $ concat $ do
  pattern <- patterns
  return $ maybeToList $ (on (=~~) T.toStrict) content pattern

analyze :: [T.Text] -> [T.Text] -> T.Text -> Either [T.Text] [T.Text]
analyze likes dislikes text =
    case matches likes text of
      liked@(_:_) -> 
        case matches dislikes text of
             [] -> Right liked
             disliked@(_:_) -> Left disliked
      _ -> Left []

Solution

To fix the lazy/strict shuffling, import the Text.Regex.TDFA.Text.Lazy module. It provides the TDFA instances for Data.Text.Lazy, so regexes can be matched against lazy Text directly.
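A minimal sketch of what that import buys, assuming the regex-tdfa and text packages are installed and that your regex-tdfa version ships Text.Regex.TDFA.Text.Lazy (older setups got it from the separate regex-tdfa-text package). The names sample, hasLazy and firstWord are illustrative only. With the instances in scope, (=~) and (=~~) take lazy Text on both sides and hand back lazy Text, with no toStrict/fromStrict anywhere:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.Lazy as TL
import Text.Regex.TDFA
import Text.Regex.TDFA.Text.Lazy ()  -- only the instances for lazy Text

-- Illustrative input, already a lazy Text.
sample :: TL.Text
sample = "lazy text all the way down"

-- Bool result: does the pattern occur anywhere in sample?
hasLazy :: Bool
hasLazy = sample =~ ("la[a-z]+" :: TL.Text)

-- Maybe result: the first match, returned as lazy Text (Nothing on no match).
firstWord :: Maybe TL.Text
firstWord = sample =~~ ("[a-z]+" :: TL.Text)
```

The pattern literals need a type annotation because, with OverloadedStrings, nothing else fixes the pattern's source type.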

Some other things to consider.

You can drop maybeToList, because (=~~) returns m target for whatever monad m the caller needs (any MonadFail instance in current versions of regex-base).

- return $ maybeToList $ (on (=~~) T.toStrict) content pattern
+ return $ (on (=~~) T.toStrict) content pattern


This works without any further changes: the surrounding concat forces (=~~) to be used at the list monad, and a failed match is represented by the empty list.
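The same mechanism can be seen without any regex code. In this sketch, assuming GHC 8.8 or later (where MonadFail is exported from the Prelude), firstDigit is a hypothetical stand-in for (=~~): it is polymorphic in a MonadFail result, so at Maybe a miss is Nothing and at the list type a miss is []. Going through maybeToList therefore gives the same list as asking for the list directly:

```haskell
import Data.Char (isDigit)
import Data.Maybe (maybeToList)

-- Hypothetical stand-in for (=~~): succeeds with the first digit,
-- otherwise signals failure through MonadFail.
firstDigit :: MonadFail m => String -> m Char
firstDigit s = case filter isDigit s of
  (c:_) -> return c
  []    -> fail "no digit"

-- Old style: run in Maybe, then convert to a list.
viaMaybe :: String -> [Char]
viaMaybe = maybeToList . firstDigit

-- New style: the result type selects the list monad directly.
viaList :: String -> [Char]
viaList = firstDigit
```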

The do-notation in the list monad can be replaced with a single map, and concat . map f can in turn be replaced with concatMap f:

- matches patterns content = map (T.fromStrict) $ concat $ do
-   pattern <- patterns
-   return $ maybeToList $ (on (=~~) T.toStrict) content pattern
+ matches patterns content = map T.fromStrict
+   $ concatMap ((on (=~~) T.toStrict) content) patterns
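The rewrite rests on the fact that, in the list monad, do { x <- xs; return (f x) } is just map f xs, and concatMap f is concat . map f. A small check with a hypothetical evensOnly in place of the regex match (any function returning a list would do):

```haskell
-- Hypothetical per-element function: zero or one result,
-- like (=~~) used at the list type.
evensOnly :: Int -> [Int]
evensOnly n = [n | even n]

-- Shape of the original code: do-block in the list monad, then concat.
viaDo :: [Int] -> [Int]
viaDo xs = concat $ do
  x <- xs
  return (evensOnly x)

-- The do-block replaced by map.
viaMap :: [Int] -> [Int]
viaMap xs = concat (map evensOnly xs)

-- concat . map fused into concatMap.
viaConcatMap :: [Int] -> [Int]
viaConcatMap = concatMap evensOnly
```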


Now add import Text.Regex.TDFA.Text.Lazy and remove the text conversions:

- matches patterns content = map T.fromStrict
-   $ concatMap ((on (=~~) T.toStrict) content) patterns
+ matches patterns content = concatMap (content=~~) patterns


You can also swap the arguments and drop the patterns variable.
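Putting content first lets the trailing patterns argument be eta-reduced away. A sketch with a hypothetical substring matcher, hit, standing in for (=~~) at the list type (same argument order: content, then pattern):

```haskell
import Data.List (isInfixOf)

-- Hypothetical stand-in for content =~~ pattern at the list type:
-- the pattern itself if it occurs in the content, otherwise nothing.
hit :: String -> String -> [String]
hit content pat = [pat | pat `isInfixOf` content]

-- Patterns first: the patterns argument has to be named.
matchesOld :: [String] -> String -> [String]
matchesOld patterns content = concatMap (hit content) patterns

-- Content first: the patterns argument disappears (eta-reduction).
matchesNew :: String -> [String] -> [String]
matchesNew content = concatMap (hit content)
```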

As for the analyze function, it's a matter of taste, but I find it more readable to enumerate all the cases in a single case expression:

analyze likes dislikes text
  = case (matches text likes, matches text dislikes) of
    ([], _)       -> Left []
    (liked, [])   -> Right liked
    (_, disliked) -> Left disliked
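To see what the single case expression returns in each situation, and that it agrees with the question's nested version, here is a sketch with a hypothetical substring-based matches standing in for the regex one (content first, as above):

```haskell
import Data.List (isInfixOf)

-- Hypothetical stand-in for the regex-based matches:
-- keeps the patterns that occur literally in the content.
matches :: String -> [String] -> [String]
matches content = filter (`isInfixOf` content)

-- Single case expression, as suggested in the answer.
analyze :: [String] -> [String] -> String -> Either [String] [String]
analyze likes dislikes text
  = case (matches text likes, matches text dislikes) of
    ([], _)       -> Left []        -- no liked pattern matched
    (liked, [])   -> Right liked    -- liked matches and no disliked ones
    (_, disliked) -> Left disliked  -- a disliked pattern matched as well

-- The question's nested version, adapted to the same matches.
analyzeNested :: [String] -> [String] -> String -> Either [String] [String]
analyzeNested likes dislikes text =
  case matches text likes of
    liked@(_:_) ->
      case matches text dislikes of
        []             -> Right liked
        disliked@(_:_) -> Left disliked
    _ -> Left []
```

Note that when nothing liked matches, the result is Left [] regardless of the dislikes, which is why the tuple patterns can ignore the second component in the first arm.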


As a last step, you can import the Text type unqualified to make the type signatures look cleaner.

Here is the result:

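module Search (analyze) where

import Data.Text.Lazy (Text)
import Text.Regex.TDFA
import Text.Regex.TDFA.Text.Lazy   -- TDFA instances for lazy Text

-- Each pattern that matches contributes its first match; failures contribute nothing.
matches :: Text -> [Text] -> [Text]
matches content = concatMap ((=~~) content)

analyze :: [Text] -> [Text] -> Text -> Either [Text] [Text]
analyze likes dislikes text
  = case map (matches text) [likes, dislikes] of
    -- the list always has exactly two elements, but GHC cannot see that,
    -- so -Wall reports these patterns as non-exhaustive
    [[], _]       -> Left []
    [liked, []]   -> Right liked
    [_, disliked] -> Left disliked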
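The question also asked for something that works with String as well as Text. One hedged sketch that goes a step past the answer: instead of fixing the type to lazy Text, constrain it by regex-base's RegexMaker and RegexContext classes (re-exported by Text.Regex.TDFA). This assumes FlexibleContexts and a regex-tdfa version that ships the Text instance modules; the ex* values are hypothetical example data:

```haskell
{-# LANGUAGE FlexibleContexts #-}
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import Text.Regex.TDFA
import Text.Regex.TDFA.Text ()       -- instances for strict Text
import Text.Regex.TDFA.Text.Lazy ()  -- instances for lazy Text

-- Any source type with TDFA instances: String, strict Text, lazy Text, ...
matches :: (RegexMaker Regex CompOption ExecOption a, RegexContext Regex a a)
        => a -> [a] -> [a]
matches content = concatMap (content =~~)

analyze :: (RegexMaker Regex CompOption ExecOption a, RegexContext Regex a a)
        => [a] -> [a] -> a -> Either [a] [a]
analyze likes dislikes text
  = case (matches text likes, matches text dislikes) of
    ([], _)       -> Left []
    (liked, [])   -> Right liked
    (_, disliked) -> Left disliked

-- The same functions used at three different source types.
exString :: Either [String] [String]
exString = analyze ["c[a-z]t"] ["dog"] "a cat here"

exStrict :: Either [T.Text] [T.Text]
exStrict = analyze [T.pack "c.t"] [T.pack "dog"] (T.pack "a cat and a dog")

exLazy :: Either [TL.Text] [TL.Text]
exLazy = analyze [TL.pack "x+"] [] (TL.pack "abc")
```

Fixing the type to one Text, as the answer does, gives simpler signatures and error messages; the constrained version trades that for reuse across string types.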

Context

StackExchange Code Review Q#47679, answer score: 2
