Correctly import CSV data, even when possibly malformed
Problem
I've created a CSV parser that tries to build a string table out of a CSV file. The goal is to handle CSV files as well as Excel does, including malformed ones.
Input CSV file:
```
First field of first row,"This field is multiline
but that's OK because it's enclosed in double qoutes, and this
is an escaped "" double qoute" but this one "" is not
"This is second field of second row, but it is not multiline
because it doesn't start
with an immediate double quote"
```
What Excel shows: [screenshot not preserved]
My code:
```cpp
#include <cwchar>
#include <string>
#include <vector>

// Returns a pointer to the start of the next field, or zero if this is the
// last field in the CSV. p is the start position of the field, sep is the
// separator used (e.g. comma or semicolon), and newline says whether the
// field ends with a newline or with a separator. For a quoted field,
// escapedEnd receives the position of the closing double quote.
const wchar_t *nextCsvField(const wchar_t *p, wchar_t sep, bool *newline, const wchar_t **escapedEnd)
{
    *escapedEnd = 0;
    *newline = false;
    // Parse quoted sequences
    if ('"' == p[0]) {
        p++;
        while (1) {
            // Find next double-quote
            p = wcschr(p, L'"');
            // If we don't find it, the quoted field is never closed,
            // so this is the last field (checked before p is dereferenced)
            if (!p)
                return 0;
            // Check for "", it is an escaped double-quote
            if (p[1] != '"') {
                *escapedEnd = p;
                break;
            }
            // Skip the escaped double-quote
            p += 2;
        }
    }
    // Find next newline or separator. The space is a placeholder
    // that gets overwritten with sep.
    wchar_t newline_or_sep[4] = L"\n\r ";
    newline_or_sep[2] = sep;
    p = wcspbrk(p, newline_or_sep);
    // If no newline or separator, this is the last field.
    if (!p)
        return 0;
    // Check if we had newline.
    *newline = (p[0] == '\r' || p[0] == '\n');
    // Handle "\r\n", otherwise just increment
    if (p[0] == '\r' && p[1] == '\n')
        p += 2;
    else
        p++;
    return p;
}

typedef std::vector<std::vector<std::wstring> > StringTable;

// Parses the CSV data and constructs a StringTable
```
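The table-building function itself isn't reproduced above. As a hedged illustration of how the `escapedEnd` output might be consumed, here is a sketch of a hypothetical helper, `unquoteCsvField` (not part of the original code), that turns the text between a field's opening quote and the closing quote reported in `escapedEnd` into the cell value by collapsing each `""` pair into a single `"`. It assumes the caller passes a pointer to the opening quote; line breaks inside the quotes are kept as-is.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the original code.
// fieldStart points at the opening '"' of a quoted field, and quoteEnd at
// its closing '"' (the pointer nextCsvField stores in *escapedEnd).
// Returns the text between the quotes with every "" pair collapsed to ".
std::wstring unquoteCsvField(const wchar_t *fieldStart, const wchar_t *quoteEnd)
{
    // Precondition: a quoted field whose closing quote comes after the opening one.
    assert(fieldStart[0] == L'"' && quoteEnd > fieldStart);

    std::wstring value;
    for (const wchar_t *q = fieldStart + 1; q < quoteEnd; ++q) {
        value += *q;
        // Inside quotes, "" stands for one literal quote: keep the first
        // character and skip its partner.
        if (*q == L'"' && q + 1 < quoteEnd && q[1] == L'"') {
            ++q;
        }
    }
    return value;
}
```

Text between the closing quote and the next separator or line break (such as ` but this one "" is not` in the sample input) is not handled by this sketch; whether to append it, drop it, or report an error is a separate design decision.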
Solution
- I'm not sure it's best to return `0` from `nextCsvField()`. Since the function returns a pointer, consider returning `NULL` (or `nullptr` if you have C++11). The function also shouldn't return a `const` pointer if the characters it points to will be modified through it.

- You're using a "Yoda condition" here: `if ('"' == p[0])`. This style isn't very common, and it only guards against an accidental `=` when one operand is a literal, so it can still let errors through. Either way, turn your compiler warnings up high so that any accidental assignment in a condition is reported.

- You should still use curly braces for single-statement bodies (such as the unbraced `if`/`else` branches and the `return 0;` lines), as it makes the code easier and safer to maintain.
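Taken together, the points above might lead to something like the following revised `nextCsvField`. This is a sketch, not the reviewer's code: it returns `nullptr` instead of `0`, uses the conventional `p[0] == L'"'` comparison order, and braces every body. It also tests the result of `wcschr` for null before reading `p[1]`, so an unterminated quoted field simply ends parsing. The `const` return type is kept because the function only reads through the pointer.

```cpp
#include <cassert>
#include <cwchar>

// Revised sketch of nextCsvField following the review's suggestions.
// Returns a pointer to the start of the next field, or nullptr if the field
// starting at p is the last one. sep is the separator (e.g. L',' or L';').
// *newline is set to true if the field ends with a line break.
// *escapedEnd is set to the closing quote of a quoted field, else nullptr.
const wchar_t *nextCsvField(const wchar_t *p, wchar_t sep, bool *newline,
                            const wchar_t **escapedEnd)
{
    assert(p != nullptr && newline != nullptr && escapedEnd != nullptr);
    *escapedEnd = nullptr;
    *newline = false;

    // Quoted field: skip to the closing quote, stepping over "" pairs.
    if (p[0] == L'"') {
        ++p;
        while (true) {
            p = std::wcschr(p, L'"');
            // No closing quote: the field is unterminated, so treat it as
            // the last one. This is checked before p is dereferenced.
            if (p == nullptr) {
                return nullptr;
            }
            if (p[1] != L'"') {
                *escapedEnd = p;  // closing quote found
                break;
            }
            p += 2;  // skip the escaped "" pair
        }
    }

    // Find the next line break or separator.
    const wchar_t delimiters[] = { L'\n', L'\r', sep, L'\0' };
    p = std::wcspbrk(p, delimiters);
    if (p == nullptr) {
        return nullptr;
    }

    *newline = (p[0] == L'\r' || p[0] == L'\n');
    // Treat "\r\n" as a single line break.
    if (p[0] == L'\r' && p[1] == L'\n') {
        p += 2;
    } else {
        ++p;
    }
    return p;
}
```

With warnings enabled (for example `-Wall` on GCC), an accidental assignment such as `if (c = L'"')` is reported, which is the mistake the Yoda style was meant to prevent.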
Context
StackExchange Code Review Q#24196, answer score: 3