HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Counting items in categories

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
categoriesitemscounting

Problem

I'm an experienced programmer that also has a little experience with functional programming, although mostly theoretical (as in hobby-level reading and very minor projects).

I recently decided to work through the new book "Beginning Haskell" to get up to speed with Haskell and try to crank out something worthy of a github repo. Early in the book you are asked to implement a Client ADT including some supportive data types for Person and Gender. I did it like this and I'm not liking it at all:

data Gender = Male | Female | Unknown | None
              deriving (Show, Eq)

data Person = Person { fName :: String, lName :: String, gender :: Gender }
              deriving (Show, Eq) 

data Client = GovOrg { code :: String,     name :: String }
            | Individual { code :: String, person :: Person }
            | Company { code :: String,    name :: String, contact :: Person }
              deriving (Show, Eq)


After this you are asked to write a function that takes [Client] and returns the count for the different genders, e.g. "out of the 20 clients listed, we have 5 male, 5 female, 10 unknown". Let's call it countGenders. Below is the code I've ended up with, and I feel that it's a complete mess and lacks any feeling of clarity or elegance. I think my OOP-brain is forcing me into bad designs. I'll walk you through the lines below. Once again I am not at all happy with this.

First we have a result data type. Could be just an (Int, Int, Int), but I feel like the location in the tuple isnt a natural fit with what the data should represent (as opposed to Point Int Int):

data GenderCount = GenderCount { male :: Int, female :: Int, unknown :: Int }

Then we have the top level function that is supposed to be invoked. Maps genderCount across a [Client], which results in [(Int, Int, Int)], then unzips those and sums each list.

```
countGenders :: [Client] -> GenderCount
countGenders cs = GenderCount (sum m) (sum f) (sum u)
where (m, f, u)

Solution

This isn't that much of a mess at all. Let's clean it up, then get a little fancy.

Right off the bat it seems a little funky to have a None Gender, as it appears you're not actually trying to account for agender individuals but instead providing for the failure to produce a Gender for a particular Client. Operations that might fail in Haskell usually signal so by returning a Maybe value, so let's drop None from the definition and see what needs changing.

data Gender = Male | Female | Unknown
              deriving (Show, Eq)


genderCount and cGender depend on None, let's start with the latter. c doesn't mean much as a prefix to me, so I'm going to call this function clientGender, but that's a stylistic preference. Changing this function to use Maybe is straightforward.

clientGender :: Client -> Maybe Gender
clientGender GovOrg {}       = Nothing
clientGender Individual {..} = Just (gender person)
clientGender Company {..}    = Just (gender contact)


Now let's turn to genderCount. The first thing to notice is that the function is a tiny wrapped up case statement. This is usually a code smell to me that hints at a missed opportunity to break functions into smaller, more independent pieces. In this case genderCount has nothing to do with Clients, so let's rip that part out!

genderCount :: Gender -> (Int, Int, Int) -- Counting Genders, not Clients!
genderCount Male    = (1, 0, 0)
genderCount Female  = (0, 1, 0)
genderCount Unknown = (0, 0, 1)


This is still unsatisfactory, right? There's that convention-typed tuple, and we've got a perfectly good GenderCount datatype lying around, so let's use it.

genderCount :: Gender -> GenderCount
genderCount Male    = GenderCount 1 0 0
genderCount Female  = GenderCount 0 1 0
genderCount Unknown = GenderCount 0 0 1


It's a small change, but users of GenderCount can rely on the field names rather than a tuple ordering.

And now countGenders, the piece that glues it all together. Our type is still correct, which is awesome! No changes there. The implementation though is doing a few different things we'll need to adjust. In an informational sense it's determining the genders of all of the clients, then accumulating a count of each Gender. What it looks like though is some weird tuple math to produce a GenderCount value from nowhere! We can rewrite it given our new implementations to be a little prettier, but first we're going to need a way to add two GenderCounts together.

addGenderCounts :: GenderCount -> GenderCount -> GenderCount
addGenderCounts (GenderCount m f u) (GenderCount n g v) = GenderCount (m + n) (f + g) (u + v)


A lot of repeated instances of GenderCount in there, but that's not so bad given we can use this as a combinator. Now we can put our countGenders function together.

countGenders :: [Client] -> GenderCount
countGenders = foldr (addGenderCounts . genderCount) (GenderCount 0 0 0) . mapMaybe clientGender


This works! I've imported mapMaybe from Data.Maybe here to account for our clientGender function sometimes returning Nothing (mapMaybe drops all the Nothings and returns a list of the Just values). We use a right fold to accumulate our GenderCount values, and a starting value of GenderCount 0 0 0 for our accumulator.

There are a few ways to go from here to clean things up further. You could get rid of the GenderCount 0 0 0 value by using foldr1, at the cost of adding another composition with map into the mix. If you have sharp eyes and a working knowledge of the Typeclassopedia you'll note a striking similarity between the way we use GenderCount with a right fold, and a Monoid. If you don't have a working knowledge of the Typeclassopedia, our motivation is that Monoids allow us to specify an identity element and an associative reduction operation, revealing some higher level abstractions (and functions) we can use to wire our code together. Let's make a Monoid.

instance Monoid GenderCount where
    mempty = GenderCount 0 0 0
    mappend = addGenderCounts

-- Laws:
-- mempty <> x = x
-- x <> mempty = x
-- x <> (y <> z) = (x <> y) <> z


I won't prove the Monoid laws, but you should be able to see that they are trivial given the properties of addition and our definition of GenderCount.

Let's try one more pass at countGenders now.

countGenders :: [Client] -> GenderCount
countGenders = foldMap genderCount . mapMaybe clientGender


Nice!

Code Snippets

data Gender = Male | Female | Unknown
              deriving (Show, Eq)
clientGender :: Client -> Maybe Gender
clientGender GovOrg {}       = Nothing
clientGender Individual {..} = Just (gender person)
clientGender Company {..}    = Just (gender contact)
genderCount :: Gender -> (Int, Int, Int) -- Counting Genders, not Clients!
genderCount Male    = (1, 0, 0)
genderCount Female  = (0, 1, 0)
genderCount Unknown = (0, 0, 1)
genderCount :: Gender -> GenderCount
genderCount Male    = GenderCount 1 0 0
genderCount Female  = GenderCount 0 1 0
genderCount Unknown = GenderCount 0 0 1
addGenderCounts :: GenderCount -> GenderCount -> GenderCount
addGenderCounts (GenderCount m f u) (GenderCount n g v) = GenderCount (m + n) (f + g) (u + v)

Context

StackExchange Code Review Q#48935, answer score: 3

Revisions (0)

No revisions yet.