snippetMinor
Convert data back-and-forth between CSV files and pretty-print
Viewed 0 times
prettyconvertcsvbackbetweenfilesprintforthanddata
Problem
I wrote a script to convert CSV files in human readable format and vice versa.
Human readable format is achieved like so:
And conversion back to CSV is done with:
Code:
I'm relatively new to haskell, so I'm wondering if this can be done more elegantly.
Human readable format is achieved like so:
$ ./transform table.csv inflate
cola colb colc
1 a g
2 b g
3 c g
4 d g
5 eeeeeeeee gAnd conversion back to CSV is done with:
$ ./transformtable table.t
cola;colb;colc;
1;a;g;
2;b;g;
3;c;g;
4;d;g;
5;eeeeeeeee;g;Code:
import Data.List
import Data.List.Split
import System.Environment
main = do
args length (a++b)) names spaces
where names = wordsBy (==' ') line
spaces= wordsBy (/=' ') (line++" ") --add " " to ensure space after last col-name
compress lines = add_del $ transpose $ map (shrinkcol) cols
where delim = ';'
head_lens = dissect (head lines)
cols = transpose $ map (splitPlaces head_lens) lines
shrinkcol c = map (\el -> reverse $ dropWhile (==' ') $ reverse $ dropWhile (==' ') el) c
add_del = map (map (++[delim]))
inflate lines = transpose $ map padcol $ table
where table = transpose $ map (splitOn ";") lines --list of cols
padcol c = map (rpad m) c --pad whole col
where m = (1+) $ maximum $ (map length) c --determine longest string in col, pad one more for prettinessI'm relatively new to haskell, so I'm wondering if this can be done more elegantly.
Solution
Names
Follow the Haskell convention of names
The Haskell convention is camelCase. That is, start names with a
lowercase letter, and join words together by capitalizing them. So
Use names with meanings, not just labels
Come to think of it, an even better name would be
as that is not just what the value is, but what it means (and
what it is used for).
either. For
Don't be afraid of longer names
It is also useful and nice for other readers not to use
abbreviations, unless the scope is very small. So the
in
short names).
as
abbreviation.
Types
Use the type system
Yes, Haskell can find out what types your values and functions
have. And it will be correct in many, many cases. But if you
are wrong, Haskell cannot tell you, unless you tell the compiler
what you intend to do. So always, always write the type signatures
of at least the top-level bindings. That way, if your function is
not yet finished, or you make a mistake, the compiler will be able
to tell you almost exactly where.
So:
Adding those type signatures alone, makes the function more
readable, because now I can expect what it is going to do.
Use hlint on the code and heed warnings from ghc
Many warnings and suggestions are worthwile. They can point out
possible mistakes or partial functions, but also show what an
idiomatic Haskell solution could be. For instance:
Use
command line arguments are in an array, which you'd need to index.
Using the 0th index though, is normally done by using the
function.
is shorter in English already :)
And now we are warning-free!
Types
Use the type system
What, again? Yes. But this time, use it
Follow the Haskell convention of names
The Haskell convention is camelCase. That is, start names with a
lowercase letter, and join words together by capitalizing them. So
head_lens would be headLens, or better yet, headerLengths.Use names with meanings, not just labels
Come to think of it, an even better name would be
columnWidths,as that is not just what the value is, but what it means (and
what it is used for).
lp and erg don't tell me a lot,either. For
preprocess I would suggest a name like rightPadLines. That way, I won't be so surprised to find out what it does :)Don't be afraid of longer names
It is also useful and nice for other readers not to use
abbreviations, unless the scope is very small. So the
a and bin
dissect line are good and clear (very short scope -> veryshort names).
add_del, on the other hand, would be more readableas
addDelimiters. delim is a somewhat more conventionalabbreviation.
Types
Use the type system
Yes, Haskell can find out what types your values and functions
have. And it will be correct in many, many cases. But if you
are wrong, Haskell cannot tell you, unless you tell the compiler
what you intend to do. So always, always write the type signatures
of at least the top-level bindings. That way, if your function is
not yet finished, or you make a mistake, the compiler will be able
to tell you almost exactly where.
So:
main :: IO ()
main = do -- ..
rpad :: Int -> String -> String
rpad len string = take -- ..
preprocess :: [String] -> [String]
preprocess lines = map -- ..
dissect :: String -> [Int]
compress :: [String] -> [[String]]
inflate :: [String] -> [[String]]Adding those type signatures alone, makes the function more
readable, because now I can expect what it is going to do.
Use hlint on the code and heed warnings from ghc
Many warnings and suggestions are worthwile. They can point out
possible mistakes or partial functions, but also show what an
idiomatic Haskell solution could be. For instance:
Use
head instead of (!!0)readfile (args !! 0) is reminiscent of other languages, where thecommand line arguments are in an array, which you'd need to index.
Using the 0th index though, is normally done by using the
headfunction.
readFile $ head args is the suggested alternative.elem is more readable as `elem
Functions that take more than one argument, can be inserted between
the first and second argument (infix) if you write backquotes
around them. For many functions, such as elem, mod and div,
this makes much sense. So if elem "inflate" args can be
rewritten as if "inflate" elem args.
Redundant parentheses
In that same line, (inflate) and (compress) have parentheses
around them, but they are really not necessary. Removing them
makes the line more readable (even for LISP programmers).
The same goes for map (length) lines in preprocess, map (shrinkcol) cols in compress and (map length) c in inflate.
And even string++(repeat ' ') can do without the parentheses.
Shadowed bindings
Three functions use the name lines for a parameter, which happens
to be the name of a function, too. It is best to heed this
warning, for two reasons:
- If you happen to want to split some string later on in those
functions, you'll have to change the name anyway.
- People who see
lines somewhere, will probably think of the
function first, be confused, then remember that it is a
redefined value.
Using ls is not that bad, by the way, especially if you change
the type to be more documentational (see below).
Eta reduction
This mathematical term comes down to "remove the variable name", so
that the focus lies more on what the function does, and less of
what it functions on. The function shrinkcol has a c on both
ends, and they can both be removed, resulting in:
shrinkcol = map (\el -> reverse $ dropWhile (==' ') $ reverse $ dropWhile (==' ') el)
Which expresses exactly the same, just in less "words".
Avoid lambda
This warning, on the same function as above, suggests about the
same thing, but now for the \el variable. However, because we
use the $ for application of values, this line should be
rewritten using . for composition. As f . g means: do f
after g, this comes mostly down to just replacing the $ with
.:
shrinkcol = map (reverse . dropWhile (==' ') . reverse . dropWhile (==' '))
Another composition opportunity
The last warning that ghc and hlint give us, is to use composition
in the mapM_ call. Instead of saying: "map putStrLn over the
result of mapping concat over the list erg, and combine that
into one IO action", you can say "map (putStrLn after concat)
over the list erg, and combine that into one IO` action". Whichis shorter in English already :)
mapM_ (putStrLn . concat) ergAnd now we are warning-free!
Types
Use the type system
What, again? Yes. But this time, use it
Code Snippets
main :: IO ()
main = do -- ..
rpad :: Int -> String -> String
rpad len string = take -- ..
preprocess :: [String] -> [String]
preprocess lines = map -- ..
dissect :: String -> [Int]
compress :: [String] -> [[String]]
inflate :: [String] -> [[String]]shrinkcol = map (\el -> reverse $ dropWhile (==' ') $ reverse $ dropWhile (==' ') el)shrinkcol = map (reverse . dropWhile (==' ') . reverse . dropWhile (==' '))mapM_ (putStrLn . concat) ergtype Line = String
type Cell = String
preprocess :: [Line] -> [Line]
compress :: [Line] -> [[Cell]]
inflate :: [Line] -> [[Cell]]Context
StackExchange Code Review Q#115327, answer score: 4
Revisions (0)
No revisions yet.