Filling gaps in time series data

Submitted by: @import:stackexchange-codereview

Problem

My Haskell set/map/list folds often become terse and difficult to read. I'm looking for tips on how to make my functional code easier to follow.

I'm working around a bug/feature in a plotting library. I have a map from tags to lists of (time, count) pairs, which I plot as a stacked graph:

import           Data.Time (UTCTime)
import qualified Data.Map as M
import qualified Data.Set as S

type CountMap = M.Map String [(UTCTime, Int)]

-- Data used for plotting data series 'foo' and 'bar'
test :: CountMap
test = 
  M.fromList [ ("foo", [(read "2012-09-28 12:00:00", 3), (read "2012-09-29 12:00:00", 4)])
             , ("bar", [(read "2012-09-28 12:00:00", 3)])
             ]


You will note that the above "bar" series is missing a sample for 2012-09-29. I need to fill these gaps with zeros before I pass the data to the plotting library.

The same data with the gaps closed becomes:

fromList [ ("bar",[(2012-09-28 12:00:00 UTC,3),(2012-09-29 12:00:00 UTC,0)])
         , ("foo",[(2012-09-28 12:00:00 UTC,3),(2012-09-29 12:00:00 UTC,4)])]


The code I use for filling the gaps with zero samples is as follows:

import           Data.Time (UTCTime)
import qualified Data.Map as M
import qualified Data.Set as S

-- Insert zero counts into date buckets that are missing a sample.
substZeroCount :: CountMap -> CountMap
substZeroCount m =
  M.map zeros m
  where
    allDates = M.foldr (flip (foldr (\(date,_) -> S.insert date))) S.empty m

    zeros cs = M.toList $ S.foldr insertMissing (M.fromList cs) allDates

    insertMissing date acc = if M.member date acc then acc else M.insert date 0 acc


It works, and it's not even very long, but I'm not happy with its readability: it looks too terse. Maybe it's just the code layout, or perhaps there's a nicer way to compose these functions. Or maybe it's just because a lot happens in three lines of code. ;)

Any suggestions on how to make substZeroCount easier on the eyes?

Solution

I think you should keep CountMap as Map String (Map UTCTime Int), which is a better way of storing counts. If you do that, you can use the mysc function below directly, with no conversion to and from lists. I find mysc much more readable and clear.

import           Data.Time (UTCTime)
import qualified Data.Map as M

type CountMap = M.Map String [(UTCTime,Int)]

-- Data used for plotting data series 'foo' and 'bar'
test :: CountMap
test =
  M.fromList [ ("foo", [(read "2012-09-28 12:00:00", 3), (read "2012-09-29 12:00:00", 4)])
             , ("bar", [(read "2012-09-28 12:00:00", 3)])
             ]

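-- Converts to nested maps, fills the gaps, then converts back to lists.
susetCount :: CountMap -> CountMap
susetCount = M.map M.toList . mysc . M.map M.fromList

-- Gives every series a zero count for each timestamp it is missing.
mysc :: M.Map String (M.Map UTCTime Int) -> M.Map String (M.Map UTCTime Int)
mysc m = M.map (flip M.union allelems) m
  where
    -- Every timestamp that occurs in any series, mapped to a zero count.
    -- M.union is left-biased, so counts already present in a series win.
    allelems = M.map (const 0) $ M.foldr' M.union M.empty m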
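
To illustrate the suggestion of storing the counts as nested maps from the start, here is a minimal sketch of what the gap filling might look like with no list conversion. The names Counts and fillGaps are hypothetical and do not come from the original post. The timestamp type is left polymorphic so the example runs with plain Int keys; UTCTime should fit equally well, since it has an Ord instance.

```haskell
import qualified Data.Map as M

-- Hypothetical nested-map representation: series name -> (timestamp -> count).
type Counts t = M.Map String (M.Map t Int)

-- Give every series a zero count at each timestamp that appears in any series.
fillGaps :: Ord t => Counts t -> Counts t
fillGaps m = M.map (`M.union` zeros) m
  where
    -- Union of all timestamps across all series, each mapped to 0.
    -- M.union is left-biased, so existing counts are never overwritten.
    zeros = M.map (const 0) (M.unions (M.elems m))

main :: IO ()
main = print (fillGaps example)
  where
    -- Int keys stand in for timestamps to keep the example small.
    example :: Counts Int
    example = M.fromList
      [ ("foo", M.fromList [(1, 3), (2, 4)])
      , ("bar", M.fromList [(1, 3)])
      ]
-- prints: fromList [("bar",fromList [(1,3),(2,0)]),("foo",fromList [(1,3),(2,4)])]
```

With this representation, the conversion to lists would only be needed at the boundary where the data is handed to the plotting library, for example with M.map M.toList.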

Context

StackExchange Code Review Q#16026, answer score: 2
