HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

Convert data back-and-forth between CSV files and pretty-print

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
prettyconvertcsvbackbetweenfilesprintforthanddata

Problem

I wrote a script to convert CSV files in human readable format and vice versa.

Human readable format is achieved like so:

$ ./transform table.csv inflate
cola colb      colc
1    a         g
2    b         g
3    c         g
4    d         g
5    eeeeeeeee g


And conversion back to CSV is done with:

$ ./transformtable table.t
cola;colb;colc;
1;a;g;
2;b;g;
3;c;g;
4;d;g;
5;eeeeeeeee;g;


Code:

import Data.List
import Data.List.Split
import System.Environment
main = do
  args length (a++b)) names spaces
          where names = wordsBy (==' ') line
                spaces= wordsBy (/=' ') (line++" ") --add " " to ensure space after last col-name

compress lines = add_del $ transpose $ map (shrinkcol) cols
            where delim       = ';'
                  head_lens   = dissect (head lines)
                  cols        = transpose $ map (splitPlaces head_lens) lines
                  shrinkcol c = map (\el -> reverse $ dropWhile (==' ') $ reverse $ dropWhile (==' ') el) c
                  add_del     = map (map (++[delim]))

inflate lines = transpose $ map padcol $ table
          where table = transpose $ map (splitOn ";") lines --list of cols
                padcol c = map (rpad m) c --pad whole col
                   where m = (1+) $ maximum $ (map length) c --determine longest string in col, pad one more for prettiness


I'm relatively new to haskell, so I'm wondering if this can be done more elegantly.

Solution

Names

Follow the Haskell convention of names

The Haskell convention is camelCase. That is, start names with a
lowercase letter, and join words together by capitalizing them. So
head_lens would be headLens, or better yet, headerLengths.

Use names with meanings, not just labels

Come to think of it, an even better name would be columnWidths,
as that is not just what the value is, but what it means (and
what it is used for). lp and erg don't tell me a lot,
either. For preprocess I would suggest a name like rightPadLines. That way, I won't be so surprised to find out what it does :)

Don't be afraid of longer names

It is also useful and nice for other readers not to use
abbreviations, unless the scope is very small. So the a and b
in dissect line are good and clear (very short scope -> very
short names). add_del, on the other hand, would be more readable
as addDelimiters. delim is a somewhat more conventional
abbreviation.

Types

Use the type system

Yes, Haskell can find out what types your values and functions
have. And it will be correct in many, many cases. But if you
are wrong, Haskell cannot tell you, unless you tell the compiler
what you intend to do. So always, always write the type signatures
of at least the top-level bindings. That way, if your function is
not yet finished, or you make a mistake, the compiler will be able
to tell you almost exactly where.

So:

main :: IO ()
main = do -- ..

rpad :: Int -> String -> String
rpad len string = take -- ..

preprocess :: [String] -> [String]
preprocess lines = map -- ..

dissect :: String -> [Int]

compress :: [String] -> [[String]]

inflate :: [String] -> [[String]]


Adding those type signatures alone, makes the function more
readable, because now I can expect what it is going to do.

Use hlint on the code and heed warnings from ghc

Many warnings and suggestions are worthwile. They can point out
possible mistakes or partial functions, but also show what an
idiomatic Haskell solution could be. For instance:

Use head instead of (!!0)

readfile (args !! 0) is reminiscent of other languages, where the
command line arguments are in an array, which you'd need to index.
Using the 0th index though, is normally done by using the head
function. readFile $ head args is the suggested alternative.

elem is more readable as `elem

Functions that take more than one argument, can be inserted between
the first and second argument (infix) if you write backquotes
around them. For many functions, such as
elem, mod and div,
this makes much sense. So
if elem "inflate" args can be
rewritten as
if "inflate" elem args.

Redundant parentheses

In that same line,
(inflate) and (compress) have parentheses
around them, but they are really not necessary. Removing them
makes the line more readable (even for LISP programmers).

The same goes for
map (length) lines in preprocess, map (shrinkcol) cols in compress and (map length) c in inflate.
And even
string++(repeat ' ') can do without the parentheses.

Shadowed bindings

Three functions use the name
lines for a parameter, which happens
to be the name of a function, too. It is best to heed this
warning, for two reasons:

  • If you happen to want to split some string later on in those


functions, you'll have to change the name anyway.

  • People who see lines somewhere, will probably think of the


function first, be confused, then remember that it is a
redefined value.

Using
ls is not that bad, by the way, especially if you change
the type to be more documentational (see below).

Eta reduction

This mathematical term comes down to "remove the variable name", so
that the focus lies more on what the function does, and less of
what it functions on. The function
shrinkcol has a c on both
ends, and they can both be removed, resulting in:

shrinkcol = map (\el -> reverse $ dropWhile (==' ') $ reverse $ dropWhile (==' ') el)


Which expresses exactly the same, just in less "words".

Avoid lambda

This warning, on the same function as above, suggests about the
same thing, but now for the
\el variable. However, because we
use the
$ for application of values, this line should be
rewritten using
. for composition. As f . g means: do f
after
g, this comes mostly down to just replacing the $ with
.:

shrinkcol = map (reverse . dropWhile (==' ') . reverse . dropWhile (==' '))


Another composition opportunity

The last warning that ghc and hlint give us, is to use composition
in the
mapM_ call. Instead of saying: "map putStrLn over the
result of mapping
concat over the list erg, and combine that
into one
IO action", you can say "map (putStrLn after concat)
over the list
erg, and combine that into one IO` action". Which
is shorter in English already :)

mapM_ (putStrLn . concat) erg


And now we are warning-free!

Types

Use the type system

What, again? Yes. But this time, use it

Code Snippets

main :: IO ()
main = do -- ..

rpad :: Int -> String -> String
rpad len string = take -- ..

preprocess :: [String] -> [String]
preprocess lines = map -- ..

dissect :: String -> [Int]

compress :: [String] -> [[String]]

inflate :: [String] -> [[String]]
shrinkcol = map (\el -> reverse $ dropWhile (==' ') $ reverse $ dropWhile (==' ') el)
shrinkcol = map (reverse . dropWhile (==' ') . reverse . dropWhile (==' '))
mapM_ (putStrLn . concat) erg
type Line = String
type Cell = String

preprocess :: [Line] -> [Line]

compress :: [Line] -> [[Cell]]

inflate :: [Line] -> [[Cell]]

Context

StackExchange Code Review Q#115327, answer score: 4

Revisions (0)

No revisions yet.