patternMinor
Copy first n lines of each file in a directory
Viewed 0 times
directoryfileeachfirstlinescopy
Problem
In an attempt to learn Haskell, I've built this - it accepts a number n, a source directory, and a sink directory (ending with
/) copies n lines from each file in the source directory to the sink directory. It is not recursive. Thoughts?module Main where
import System.Directory
import Control.Monad
import System.IO
import System.IO.Error
import Data.List
import Control.Applicative
import Control.Exception.Base
import System.Environment
import Data.List
--requires / or \ at end.
listFilesInDir :: FilePath -> IO [FilePath]
listFilesInDir dir= (getDirectoryContents dir) >>= filterM (\x ->doesFileExist $ dir ++ x)
readAndWrite:: Handle -> Handle -> IO ()
readAndWrite inHandle outHandle = do
myLn if isEOFError e
then return ()
else ioError e
Right myLn -> do hPutStrLn outHandle myLn
return ()
writeFirstNLines :: Int-> FilePath->FilePath -> IO ()
writeFirstNLines n source sink =do
inHandle <- openFile source ReadMode
outHandle <- openFile sink WriteMode
replicateM_ n (readAndWrite inHandle outHandle)
main :: IO ()
main = do
args <- getArgs
let n = read $ args!!0
let source = args!!1
let sink = args!!2
let fileList=listFilesInDir source
fileSources <-fmap (map (source ++)) $ fileList
fileSinks <-fmap (map (sink ++)) $ fileList
--putStrLn "Sources"
--mapM_ putStrLn fileSources
--putStrLn "Sinks"
--mapM_ putStrLn fileSinks
q<-zipWithM (writeFirstNLines n) fileSources fileSinks
return()Solution
requires / or \ at end.
Your code requires that the user supplied source and destination
directories end in a path name separator. A better approach is
to simply add it. See the comments below on the
partial functions
When processing the command line arguments you
are using functions which could throw errors, e.g.
When they fail you'll get a generic error message like
Hackage has libraries you can use to parse and validate command line arguments
like cmdargs or any of the optparse- packages. However, it's not too difficult to roll your own like this:
also that the first one consists entirely of digits.
(++)
Currently the
together path name segments is fine. However, in System.FilePath
Haskell defines the operator
way to do this - i.e. on Windows it will use
will use a
working with a path name string.
The module
path names such as extracting the base / directory name from a path, extracting and replacing the suffix, etc. which are good to know about and may come
in handy.
A good place to use
Now
add the appropriate separator.
fileList
Note that
you are repeating the IO-action twice. That is, you are performing
want to perform
only once and use its result to compute
Note I'm using
the results. It's more efficient because
the zip operation immediately, whereas
list thus consuming memory for data you'll never look at.
hClose
In Haskell you need to explicitly close your file handles when you are done
with them. Every process has limit on the number of handles which can be open at any one time, so if you never close any handles then eventually
A basic way to make sure your file handles always get closed is to use
the
and passes the handle to an IO-action. It will ensure that the handle is
closed when the IO-action completes - whether it returns normally
or if an exception is thrown.
Using
Now both
completes.
replicateM
Note that
exactly
to write a loop which could be executed a variable number of times.
As an example of how inefficient using
suppose that
Then:
will execute
997 calls won't copy any lines.
Here is a recipe for a loop which is executed at most
with the possibility of breaking out early:
You can use this to copy the first n lines by substituting:
Your code requires that the user supplied source and destination
directories end in a path name separator. A better approach is
to simply add it. See the comments below on the
() operator.partial functions
When processing the command line arguments you
are using functions which could throw errors, e.g.
read, head, !!.When they fail you'll get a generic error message like
Prelude.read: no parse or Prelude.!!: index too large.Hackage has libraries you can use to parse and validate command line arguments
like cmdargs or any of the optparse- packages. However, it's not too difficult to roll your own like this:
import Data.Char
parseArgs (nstr : source : dest : _) =
if all isDigit nstr
then ( read nstr, source, dest )
else error "first argument is not a number"
parseArgs _ = error "not enough arguments"
main = do
args <- getArgs
let (n, source, dest) = parseArgs args
...parseArgs checks that there are at least three command line arguments andalso that the first one consists entirely of digits.
(++)
Currently the
FilePath type is a String, so using (++) to jointogether path name segments is fine. However, in System.FilePath
Haskell defines the operator
() which is meant to be a cross-platformway to do this - i.e. on Windows it will use
\\ and on Posix systems itwill use a
/. Moreover using () signals to the reader that you areworking with a path name string.
The module
System.FilePath also has utility functions for dealing withpath names such as extracting the base / directory name from a path, extracting and replacing the suffix, etc. which are good to know about and may come
in handy.
A good place to use
() is in your listFilesInDir function, e.g.:listFilesInDir dir =
getDirectoryContents >>= filterM (\x -> doesFileExist (dir x))Now
dir does not have to end in a / or \\ because using () willadd the appropriate separator.
fileList
Note that
fileList is an IO-action, so in these lines:fileSources <-fmap (map (source ++)) $ fileList
fileSinks <-fmap (map (sink ++)) $ fileListyou are repeating the IO-action twice. That is, you are performing
getDirectoryContents twice on the same directory. Instead youwant to perform
fileListonly once and use its result to compute
fileSources and fileSinks:leafNames ) leafNames
fileSinks = map (sink ) leafNames
zipWithM_ (writeFirstNLines n) fileSources fileSinksNote I'm using
zipWithM_ here - it works just like zipWithM but discardsthe results. It's more efficient because
zipWithM_ discards the result ofthe zip operation immediately, whereas
zipWithM saves the results in alist thus consuming memory for data you'll never look at.
hClose
In Haskell you need to explicitly close your file handles when you are done
with them. Every process has limit on the number of handles which can be open at any one time, so if you never close any handles then eventually
openFile will fail when you hit this limit.A basic way to make sure your file handles always get closed is to use
the
withFile function:withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO rwithFile creates a file handle by opening a path with the specified IOModeand passes the handle to an IO-action. It will ensure that the handle is
closed when the IO-action completes - whether it returns normally
or if an exception is thrown.
Using
withFile your writeFirstNLines would look like:writeFirstNLines n source sink = do
withFile source ReadMode (\hin -> do
withFile sink WriteMode (\hout -> do
... copy from hin to hout ...
)
)Now both
hin and hout will automatically get closed when the copy actioncompletes.
replicateM
Note that
replicateM_ n action will always execute actionexactly
n times - no more and no less. So this is not a good wayto write a loop which could be executed a variable number of times.
As an example of how inefficient using
replicateM_ here could be,suppose that
n is 1000 and you are operating on a 3 line file.Then:
replicateM_ n readAndWritewill execute
readAndWrite 1000 times even though the last997 calls won't copy any lines.
Here is a recipe for a loop which is executed at most
n timeswith the possibility of breaking out early:
loop n | n <= 0 = return ()
loop n = do quit <- condition -- check condition
if quit
then return () -- break out early
else ... do stuff...
loop (n-1) -- perform next iterationYou can use this to copy the first n lines by substituting:
condition = hIsEOF hin
...do stuff... = hGetLine hin >>= hPutStrLn houtCode Snippets
import Data.Char
parseArgs (nstr : source : dest : _) =
if all isDigit nstr
then ( read nstr, source, dest )
else error "first argument is not a number"
parseArgs _ = error "not enough arguments"
main = do
args <- getArgs
let (n, source, dest) = parseArgs args
...listFilesInDir dir =
getDirectoryContents >>= filterM (\x -> doesFileExist (dir </> x))fileSources <-fmap (map (source ++)) $ fileList
fileSinks <-fmap (map (sink ++)) $ fileListleafNames <- listFilesInDir source
let fileSources = map (source </>) leafNames
fileSinks = map (sink </>) leafNames
zipWithM_ (writeFirstNLines n) fileSources fileSinkswithFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO rContext
StackExchange Code Review Q#108943, answer score: 2
Revisions (0)
No revisions yet.