XML Schema for an XML representation of CSV

Submitted by: @import:stackexchange-codereview

Problem

This is intended to be part of a generalised solution to the problem of converting any CSV content (with some minor restrictions) into XML. The restrictions on the CSV and the purpose of the schema should be apparent from the annotations.

The main review criteria I request are:

  • Is it suitable for non-destructive round-trip transformations from .csv to .xml and back again to .csv?
  • Is the schema clear and readable enough?
  • Is there a simpler way to do the same thing?
  • Are there any obvious errors?

This schema, as well as the associated XSLT stylesheets, will, when polished, be put to good use in the public domain with a Creative Commons license.

Here is the schema to be reviewed:

```





This schema describes an XML representation of a subset of csv content.
The format described by this schema, here-after referred to as "xcsv"
is part of a generalised solution to the problem of converting
general csv files into suitable XML, and the reverse transform.

The restrictions on the csv content are:
* The csv file is encoded either in UTF-8 or UTF-16. If UTF-16, a BOM
is required.
* The cell values of the csv may not contain the CR or LF characters.
Essentially, we are restricted to single-line values.

The xcsv format was developed by Sean B. Durkin…
www.seanbdurkin.id.au








A row element represents a "row" or "line" in the csv file. Rows contain values.



apple,"banana","red, white and blue","quote this("")"

apple
banana
red, white and blue
quote this(")






Empty rows are not possible in csv. We must have at least one value or one error.



A value element represents a decoded (model) csv "value" or "cell".
If the encoded value in the lexical csv was of a quoted form, then
the
```

Solution

(This is more of a comment than an answer, but there are several longer points I'd like to address, which is easier to do in an answer.)

Could you show some use cases for this? Since CSV and XML are both formats for general data storage, I don't see the point of converting a CSV file into a "non-specific" XML format instead of directly into the "specific" XML format of the application in use.

Also, the problem with CSV is that it's not really standardized. Despite the name, CSV files don't need to use commas as value separators; semicolons and tabs are common variants. Some variants also require quoting all values, allow single quotes, use backslashes to escape quotes in values, or allow line breaks in values (which, curiously, is the one variant you disallow). If you really need "non-destructive round-trip transformations", you should consider all these variants and store the "features" of the CSV implementation in your XML.
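As a rough illustration of what storing the "features" could look like, here is a minimal Python sketch using only the standard library. The element and attribute names (`csv`, `row`, `value`, `delimiter`, `quotechar`, `quote-all`) are hypothetical and are not taken from the schema under review; the point is only that the dialect travels with the data so the reverse transform can reproduce it.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(text, delimiter, quotechar, quote_all):
    """Parse CSV text and record the dialect as attributes on the root.

    All element and attribute names here are illustrative assumptions.
    """
    root = ET.Element("csv", {
        "delimiter": delimiter,
        "quotechar": quotechar,
        "quote-all": "true" if quote_all else "false",
    })
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    for row in reader:
        row_el = ET.SubElement(root, "row")
        for value in row:
            ET.SubElement(row_el, "value").text = value
    return root


def xml_to_csv(root):
    """Write the rows back out using the dialect stored on the root."""
    quoting = csv.QUOTE_ALL if root.get("quote-all") == "true" else csv.QUOTE_MINIMAL
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter=root.get("delimiter"),
        quotechar=root.get("quotechar"),
        quoting=quoting,
        lineterminator="\n",
    )
    for row_el in root.findall("row"):
        # An empty element has text None; treat it as an empty cell.
        writer.writerow([v.text or "" for v in row_el.findall("value")])
    return out.getvalue()


# A semicolon-separated, single-quoted variant where every value is quoted.
source = "'apple';'red; white and blue'\n"
doc = csv_to_xml(source, delimiter=";", quotechar="'", quote_all=True)
round_tripped = xml_to_csv(doc)
```

With the dialect stored once at the document level, `round_tripped` equals `source` here without any per-value quoting flag. A real solution would need more features than these three (escape character, line terminator, encoding, and so on).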

On the other hand, you store whether each value was quoted, but this isn't really part of the "relevant information". Take, for example, a similar conversion: XML -> DOM -> XML. There, too, it is not recorded whether or how a value was quoted. An XML document such as

<example><![CDATA[ <&> ]]></example>


could (and often would), after being read into a DOM structure and re-serialized, come out as:

<example> &lt;&amp;&gt; </example>


because both encodings are equivalent.
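The equivalence can be checked with a short Python sketch. It uses `xml.etree.ElementTree` as a stand-in for a DOM implementation (an assumption made for brevity; other parsers may choose to preserve CDATA sections on output).

```python
import xml.etree.ElementTree as ET

cdata_form = "<example><![CDATA[ <&> ]]></example>"
escaped_form = "<example> &lt;&amp;&gt; </example>"

cdata_tree = ET.fromstring(cdata_form)
escaped_tree = ET.fromstring(escaped_form)

# Both lexical forms parse to the same text content; the tree does not
# remember which form the source used.
same_text = cdata_tree.text == escaped_tree.text

# Re-serializing the CDATA document produces the entity-escaped form.
reserialized = ET.tostring(cdata_tree, encoding="unicode")
```

Here `same_text` is true and `reserialized` is the entity-escaped document, so the choice of lexical encoding is lost in the round trip without any loss of information.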

Similarly, in your case it shouldn't matter whether a value was originally quoted. So whether a row such as

"apple","banana","red, white and blue","quote this("")"

comes out as

apple,banana,"red, white and blue","quote this("")"

should be irrelevant, unless the specific CSV application requires quoting. It is therefore more important to store that requirement in the XML than whether any single value was quoted.
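The same point can be demonstrated with Python's `csv` module, as a sketch rather than a statement about any particular CSV application. The helper name `rewrite` is made up for this example. Both rows decode to identical values, and the quoting style on output is a single writer-level setting.

```python
import csv
import io


def rewrite(line, quoting):
    """Decode one CSV line and re-encode it with the given quoting policy."""
    row = next(csv.reader([line]))
    out = io.StringIO()
    csv.writer(out, quoting=quoting, lineterminator="\n").writerow(row)
    return out.getvalue().rstrip("\n")


all_quoted = '"apple","banana","red, white and blue","quote this("")"'
minimal = 'apple,banana,"red, white and blue","quote this("")"'

# The decoded values are the same regardless of how the source was quoted.
values_from_all_quoted = next(csv.reader([all_quoted]))
values_from_minimal = next(csv.reader([minimal]))

# The quoting policy alone determines which lexical form is produced.
as_minimal = rewrite(all_quoted, csv.QUOTE_MINIMAL)
as_all_quoted = rewrite(minimal, csv.QUOTE_ALL)
```

Since `as_minimal` equals `minimal` and `as_all_quoted` equals `all_quoted`, one document-level "quote everything" flag is enough to reproduce either form.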

Code Snippets

<example><![CDATA[ <&> ]]></example>
<example> &lt;&amp;&gt; </example>
"apple","banana","red, white and blue","quote this("")"
apple,banana,"red, white and blue","quote this("")"

Context

StackExchange Code Review Q#10180, answer score: 4
