snippetjavascriptMinor
Parse a CSV file and return an object or array
Viewed 0 times
filereturnarraycsvparseandobject
Problem
I recently published a JavaScript module for NPM which is also available on Github by the name of rawiki-parse-csv. The code takes raw CSV data and returns an array. The method has one option to include the first line as a header in which case an object is returned within the array. If the option is not set each line will be returned as an array.
The project includes a Chai test which inputs some sample CSV data and checks the returned output with both options specified. The tests pass without issue and use in an actual application appears to be faultless.
Could someone comment on any issues with this code, potentials for failure, incorrect structure or areas for improvement?
```
/*
The MIT License (MIT)
Copyright (c) 2016 Rawikitua Isherwood
Parse module creates an object or set of arrays from a csv file by splitting the text at each new
line and splitting each line at comma marks.*/
var parse = function (input, option){
try
{
// output object
var data = {},
// output no columns array
container = [],
// output array
records = [],
// splits csv data at each new line
lines =input.split(/\r\n|\r|\n/),
// creates columns by splitting first line of csv
columns = lines[0].split(',');
// creates objects from csv data if column option included
if (option === true){
// loop through each line of csv file
for (var l = 1; l <= lines.length-1; l++)
{
// splits each line of csv by comma
var words = lines[l].split(',');
// builds object based on column headers
for (var cell in columns)
{
data[columns[cell]] = words[cell];
}
// pushes object to output array
records.push(data);
// resets object in order to add another
data = {};
}
The project includes a Chai test which inputs some sample CSV data and checks the returned output with both options specified. The tests pass without issue and use in an actual application appears to be faultless.
Could someone comment on any issues with this code, potentials for failure, incorrect structure or areas for improvement?
```
/*
The MIT License (MIT)
Copyright (c) 2016 Rawikitua Isherwood
Parse module creates an object or set of arrays from a csv file by splitting the text at each new
line and splitting each line at comma marks.*/
var parse = function (input, option){
try
{
// output object
var data = {},
// output no columns array
container = [],
// output array
records = [],
// splits csv data at each new line
lines =input.split(/\r\n|\r|\n/),
// creates columns by splitting first line of csv
columns = lines[0].split(',');
// creates objects from csv data if column option included
if (option === true){
// loop through each line of csv file
for (var l = 1; l <= lines.length-1; l++)
{
// splits each line of csv by comma
var words = lines[l].split(',');
// builds object based on column headers
for (var cell in columns)
{
data[columns[cell]] = words[cell];
}
// pushes object to output array
records.push(data);
// resets object in order to add another
data = {};
}
Solution
Concept
CSV is a somewhat loosely defined concept. The closest thing there is to a specification is RFC 4180.
This parser is rather naïve: it doesn't handle quoting, and as a result it cannot return data that contain strings with literal commas. Maybe that is good enough for your own use, but in my opinion it's not good enough for a publicly published library, at least not without a giant disclaimer.
For an example of a good CSV library that handles multiple dialects, take a look at Python's built-in library.
Implementation
The purpose of the
The code crashes on empty input, instead of returning an empty data structure as I would expect.
Why do you catch an exception and convert it to a return value? That defeats the purpose of the exception mechanism, and forces the caller to handle the possibility of very weird "data" resulting from the parsing.
I'm not a fan of
Why do you "reset" your temporary variables at the end of the loop for reuse in the next iteration? Why not just create it at the top of the loop?
Commenting every line of code is annoying. Most of your comments can just be eliminated.
Use semicolons consistently. You missed one at
CSV is a somewhat loosely defined concept. The closest thing there is to a specification is RFC 4180.
This parser is rather naïve: it doesn't handle quoting, and as a result it cannot return data that contain strings with literal commas. Maybe that is good enough for your own use, but in my opinion it's not good enough for a publicly published library, at least not without a giant disclaimer.
For an example of a good CSV library that handles multiple dialects, take a look at Python's built-in library.
Implementation
The purpose of the
option is not self-evident. A better name might be headerRow. Also, the way you have nearly duplicated the two cases, you might as well write it as two functions instead.The code crashes on empty input, instead of returning an empty data structure as I would expect.
Why do you catch an exception and convert it to a return value? That defeats the purpose of the exception mechanism, and forces the caller to handle the possibility of very weird "data" resulting from the parsing.
I'm not a fan of
var declarations that span multiple lines, since it is easy to unintentionally write something different if you screw up the punctuation. ("use strict" would be a good practice for helping to detect such errors.) Your use is particularly bad because the intervening comment lines and the lack of additional indentation on the subsequent lines make it hard to read correctly.l <= lines.length - 1 is not idiomatic: the standard way to write a counting for-loop in JavaScript is:for (var i = 0; i < array.length; i++)Why do you "reset" your temporary variables at the end of the loop for reuse in the next iteration? Why not just create it at the top of the loop?
Commenting every line of code is annoying. Most of your comments can just be eliminated.
Use semicolons consistently. You missed one at
return records.Code Snippets
for (var i = 0; i < array.length; i++)Context
StackExchange Code Review Q#131154, answer score: 5
Revisions (0)
No revisions yet.