HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Walking a directory in Haskell

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
walkinghaskelldirectory

Problem

I wrote a function that recursively walks a given directory.

module WalkDir (walkDir) where

import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath (())

walkDir :: FilePath -> IO [FilePath]
walkDir r = contents >>= fmap concat . traverse helper
    where contents = fmap (r ) . filter ((&&) . (/=) "."  (/=) "..")  getDirectoryContents r
          helper x = do e <- doesDirectoryExist x
                        if e then walkDir x else return [x]


However, I have several concerns with this function. For starters, it is slow and it does not print out results until they have all been collected (not as lazy as it should be). My best guess is that this is because of the constant concatenations.

Additionally, the use of do-notation in helper seems clunky. This is where I'd love it if if were just a function because I could just use >>= with no do required. Alternatively if there were a GHC extension equivalent to LambdaCase for if statements that would also work.

Solution

As you noticed, this function is slow because it collects all the results before it starts printing them. To circumvent that problem, you need to interleave the collection of information and its printing.

A good way of doing that whilst keeping a compositional approach to solving the problem is to introduce a datatype reifying the structure of walkDir's call graph. Instead of sequencing all IO actions and getting a list of FilePaths back, you'd build a tree describing the computation (RTree for Rose Tree and T for Transformer as it takes an m):

data RTreeT m a = Node a [m (RTreeT m a)]


You can now write walkDir' describing your strategy to explore the directories on your filesystem: return the files present in the current directory immediately and then explore the subdirectories one after the other.

walkDir' :: FilePath -> IO (RTreeT IO [FilePath])
walkDir' r = do
  contents      ) . exceptLocal  getDirectoryContents r
  (files, dirs) <- filesAndDirs contents
  return $ Node files $ fmap walkDir' dirs


where filesAndDirs partitions a list of FilePath depending on whether they are files or directories (using tagDirectories to perform that test).

tagDirectories :: [FilePath] -> IO [(FilePath, Bool)]
  tagDirectories = mapM (\ x -> (x,)  doesDirectoryExist x)

  filesAndDirs :: [FilePath] -> IO ([FilePath], [FilePath])
  filesAndDirs c = bimap (fmap fst) . partition (not . snd)  tagDirectories c
    where bimap f (a, b) = (f a, f b)


and exceptLocal is the filter you had in your original code snippet:

exceptLocal :: [FilePath] -> [FilePath]
  exceptLocal = filter ((&&) . (/=) "."  (/=) "..")


You now have an RTreeT IO [FilePath] and you can described a strategy to print it which will interleave printing some of the content and running some of the remaining IO actions:

printRTreeT :: Show a => RTreeT IO a -> IO ()
printRTreeT (Node a mts) = print a >> mapM_ (printRTreeT =<<) mts


Of course, this is a rather crude printing function (e.g. you will notice quite a few empty lists if you have empty subdirectories) but it gives you an idea of how to proceed from there on.

If this is still slow, you may want to play the same sort of trick on filesAndDirs: rather than sequencing all tests in one go, you could want to have a structure allowing you to only deal with one FilePath at a time.

Code Snippets

data RTreeT m a = Node a [m (RTreeT m a)]
walkDir' :: FilePath -> IO (RTreeT IO [FilePath])
walkDir' r = do
  contents      <- fmap (r </>) . exceptLocal <$> getDirectoryContents r
  (files, dirs) <- filesAndDirs contents
  return $ Node files $ fmap walkDir' dirs
tagDirectories :: [FilePath] -> IO [(FilePath, Bool)]
  tagDirectories = mapM (\ x -> (x,) <$> doesDirectoryExist x)

  filesAndDirs :: [FilePath] -> IO ([FilePath], [FilePath])
  filesAndDirs c = bimap (fmap fst) . partition (not . snd) <$> tagDirectories c
    where bimap f (a, b) = (f a, f b)
exceptLocal :: [FilePath] -> [FilePath]
  exceptLocal = filter ((&&) . (/=) "." <*> (/=) "..")
printRTreeT :: Show a => RTreeT IO a -> IO ()
printRTreeT (Node a mts) = print a >> mapM_ (printRTreeT =<<) mts

Context

StackExchange Code Review Q#119926, answer score: 3

Revisions (0)

No revisions yet.