patternMinor
Walking a directory in Haskell
Viewed 0 times
walkinghaskelldirectory
Problem
I wrote a function that recursively walks a given directory.
However, I have several concerns with this function. For starters, it is slow and it does not print out results until they have all been collected (not as lazy as it should be). My best guess is that this is because of the constant concatenations.
Additionally, the use of
module WalkDir (walkDir) where
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath (())
walkDir :: FilePath -> IO [FilePath]
walkDir r = contents >>= fmap concat . traverse helper
where contents = fmap (r ) . filter ((&&) . (/=) "." (/=) "..") getDirectoryContents r
helper x = do e <- doesDirectoryExist x
if e then walkDir x else return [x]However, I have several concerns with this function. For starters, it is slow and it does not print out results until they have all been collected (not as lazy as it should be). My best guess is that this is because of the constant concatenations.
Additionally, the use of
do-notation in helper seems clunky. This is where I'd love it if if were just a function because I could just use >>= with no do required. Alternatively if there were a GHC extension equivalent to LambdaCase for if statements that would also work.Solution
As you noticed, this function is slow because it collects all the results before it starts printing them. To circumvent that problem, you need to interleave the collection of information and its printing.
A good way of doing that whilst keeping a compositional approach to solving the problem is to introduce a datatype reifying the structure of
You can now write
where
and
You now have an
Of course, this is a rather crude printing function (e.g. you will notice quite a few empty lists if you have empty subdirectories) but it gives you an idea of how to proceed from there on.
If this is still slow, you may want to play the same sort of trick on
A good way of doing that whilst keeping a compositional approach to solving the problem is to introduce a datatype reifying the structure of
walkDir's call graph. Instead of sequencing all IO actions and getting a list of FilePaths back, you'd build a tree describing the computation (RTree for Rose Tree and T for Transformer as it takes an m):data RTreeT m a = Node a [m (RTreeT m a)]You can now write
walkDir' describing your strategy to explore the directories on your filesystem: return the files present in the current directory immediately and then explore the subdirectories one after the other.walkDir' :: FilePath -> IO (RTreeT IO [FilePath])
walkDir' r = do
contents ) . exceptLocal getDirectoryContents r
(files, dirs) <- filesAndDirs contents
return $ Node files $ fmap walkDir' dirswhere
filesAndDirs partitions a list of FilePath depending on whether they are files or directories (using tagDirectories to perform that test).tagDirectories :: [FilePath] -> IO [(FilePath, Bool)]
tagDirectories = mapM (\ x -> (x,) doesDirectoryExist x)
filesAndDirs :: [FilePath] -> IO ([FilePath], [FilePath])
filesAndDirs c = bimap (fmap fst) . partition (not . snd) tagDirectories c
where bimap f (a, b) = (f a, f b)and
exceptLocal is the filter you had in your original code snippet:exceptLocal :: [FilePath] -> [FilePath]
exceptLocal = filter ((&&) . (/=) "." (/=) "..")You now have an
RTreeT IO [FilePath] and you can described a strategy to print it which will interleave printing some of the content and running some of the remaining IO actions:printRTreeT :: Show a => RTreeT IO a -> IO ()
printRTreeT (Node a mts) = print a >> mapM_ (printRTreeT =<<) mtsOf course, this is a rather crude printing function (e.g. you will notice quite a few empty lists if you have empty subdirectories) but it gives you an idea of how to proceed from there on.
If this is still slow, you may want to play the same sort of trick on
filesAndDirs: rather than sequencing all tests in one go, you could want to have a structure allowing you to only deal with one FilePath at a time.Code Snippets
data RTreeT m a = Node a [m (RTreeT m a)]walkDir' :: FilePath -> IO (RTreeT IO [FilePath])
walkDir' r = do
contents <- fmap (r </>) . exceptLocal <$> getDirectoryContents r
(files, dirs) <- filesAndDirs contents
return $ Node files $ fmap walkDir' dirstagDirectories :: [FilePath] -> IO [(FilePath, Bool)]
tagDirectories = mapM (\ x -> (x,) <$> doesDirectoryExist x)
filesAndDirs :: [FilePath] -> IO ([FilePath], [FilePath])
filesAndDirs c = bimap (fmap fst) . partition (not . snd) <$> tagDirectories c
where bimap f (a, b) = (f a, f b)exceptLocal :: [FilePath] -> [FilePath]
exceptLocal = filter ((&&) . (/=) "." <*> (/=) "..")printRTreeT :: Show a => RTreeT IO a -> IO ()
printRTreeT (Node a mts) = print a >> mapM_ (printRTreeT =<<) mtsContext
StackExchange Code Review Q#119926, answer score: 3
Revisions (0)
No revisions yet.