Serializing objects to delimited files

Submitted by: @import:stackexchange-codereview
delimited, files, objects, serializing

Problem

For a new project I need to be able to serialize arbitrary types to TSV or CSV files, so I wrote a class that can serialize any object to a TSV, CSV, or any other _SV file you can think of. (You could literally serialize objects to files with the letter "B" or the word "Rawr" as the column or row delimiter.)

It's pretty simple. It starts with a DelimitedColumnAttribute:

/// <summary>
/// Represents a column which can be used in a <see cref="DelimitedSerializer"/>.
/// </summary>
[AttributeUsage(AttributeTargets.Property)]
public class DelimitedColumnAttribute : Attribute
{
    /// <summary>
    /// The name of the column.
    /// </summary>
    public string Name { get; set; }

    /// <summary>
    /// The order the column should appear in.
    /// </summary>
    public int Order { get; set; }
}
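
To make the attribute's role concrete, here is a minimal sketch of marking properties and reading the marker back through reflection. The Invoice class and the AttributeLookup helper are hypothetical names used only for illustration, and the fallback to the property name when Name is not set is an assumption taken from how the answer below resolves column names.

```csharp
using System;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class DelimitedColumnAttribute : Attribute
{
    public string Name { get; set; }
    public int Order { get; set; }
}

// Hypothetical type, used only to show how the attribute is applied.
public class Invoice
{
    [DelimitedColumn(Name = "Invoice No", Order = 1)]
    public int Number { get; set; }

    [DelimitedColumn(Order = 2)]
    public decimal Total { get; set; }

    // No attribute: a serializer that filters on the attribute would skip this.
    public string InternalNote { get; set; }
}

public static class AttributeLookup
{
    // Returns the column name for a property, or null when it is not marked.
    public static string ColumnNameOf(Type type, string propertyName)
    {
        PropertyInfo property = type.GetProperty(propertyName);
        var attribute = property.GetCustomAttribute<DelimitedColumnAttribute>();
        if (attribute == null)
        {
            return null;
        }

        // Assumption: fall back to the property name when Name is not set.
        return attribute.Name ?? property.Name;
    }
}
```

With these definitions, ColumnNameOf(typeof(Invoice), "Number") yields "Invoice No", "Total" falls back to its property name, and "InternalNote" yields null.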


Then there's a serializer:

```
/// <summary>
/// Represents a serializer that will serialize arbitrary objects to files with specific row and column separators.
/// </summary>
public class DelimitedSerializer
{
    /// <summary>
    /// The string to be used to separate columns.
    /// </summary>
    public string ColumnDelimiter { get; set; }

    /// <summary>
    /// The string to be used to separate rows.
    /// </summary>
    public string RowDelimiter { get; set; }

    /// <summary>
    /// Serializes an object to a delimited file. Throws an exception if any of the property names, column names, or values contain either the <see cref="ColumnDelimiter"/> or the <see cref="RowDelimiter"/>.
    /// </summary>
    /// <typeparam name="T">The type of the object to serialize.</typeparam>
    /// <param name="items">A list of the items to serialize.</param>
    /// <returns>The serialized string.</returns>
    public string Serialize<T>(List<T> items)
    {
        if (string.IsNullOrEmpty(ColumnDelimiter))
        {
            throw new ArgumentException($"The property '{nameof(ColumnDelimiter)}' cannot be null or an empty string.");
        }

        if (string.IsNullOrEmpty(RowDelimiter))
        {
            throw new ArgumentException($"The property '{nameof(RowDelimiter)}' cannot be null or an empty string.");
        }

        var result = new ExtendedStringBuilder();

        var properties = typeof(T).GetProperties()
```

Solution

- readonly does not make the members of your static serializers read-only. You cannot assign a different serializer to the field, but the properties of the existing instance can still be modified. Since you have access to C# 6, you can use a get-only property that returns a new instance:

public static DelimitedSerializer TsvSerializer => new DelimitedSerializer { ColumnDelimiter = "\t", RowDelimiter = Environment.NewLine };
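
The difference can be seen in a small sketch. The DelimitedSerializer below is a reduced stand-in with only the two delimiter properties, and Serializers, SharedTsv, FreshTsv and ReadonlyDemo are hypothetical names introduced for this illustration.

```csharp
using System;

// Reduced stand-in for the serializer: only the delimiter properties matter here.
public class DelimitedSerializer
{
    public string ColumnDelimiter { get; set; }
    public string RowDelimiter { get; set; }
}

public static class Serializers
{
    // readonly protects the reference only: the field cannot be reassigned,
    // but any caller can still change the instance's settable properties.
    public static readonly DelimitedSerializer SharedTsv =
        new DelimitedSerializer { ColumnDelimiter = "\t", RowDelimiter = Environment.NewLine };

    // Expression-bodied get-only property: every access builds a new instance,
    // so a change made by one caller never reaches another caller.
    public static DelimitedSerializer FreshTsv =>
        new DelimitedSerializer { ColumnDelimiter = "\t", RowDelimiter = Environment.NewLine };
}

public static class ReadonlyDemo
{
    public static string MutateShared()
    {
        Serializers.SharedTsv.ColumnDelimiter = ";"; // compiles, and the change sticks
        return Serializers.SharedTsv.ColumnDelimiter;
    }

    public static string MutateFresh()
    {
        Serializers.FreshTsv.ColumnDelimiter = ";";  // modifies a throwaway instance
        return Serializers.FreshTsv.ColumnDelimiter; // a new instance, still "\t"
    }
}
```

The cost of this approach is one allocation per access, which is usually negligible for a small configuration object.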


- The properties query can be optimized like this:

var properties = typeof(T).GetProperties()
    .Select((PropertyInfo p) => new
    {
        // caching the result, so you don't have to look it up repeatedly
        Attribute = p.GetCustomAttribute<DelimitedColumnAttribute>(),
        Info = p,
    })
    .Where(x => x.Attribute != null)
    // ?. is not needed here, but it makes testing easier with anonymous classes
    // ThenBy keeps Order as the primary key; a second OrderBy would re-sort and override it
    .OrderBy(x => x.Attribute?.Order)
    .ThenBy(x => x.Attribute?.Name)
    .ThenBy(x => x.Info.Name)
    // properties are used multiple times, so you want to avoid deferred execution here
    .ToList();


- properties is never materialized. LINQ uses deferred execution, meaning the query is not run ahead of time but each time it is iterated. Every time you loop through properties with foreach, the query above is executed again: once for the header and once for every row. So materialize it with ToList().
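
A minimal sketch of the effect, using a counter in place of the reflection lookup. DeferredExecutionDemo and CountProjections are hypothetical names, and the two loops stand in for writing the header and one row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how often the projection runs when the query is enumerated twice.
    public static int CountProjections(bool materialize)
    {
        var calls = 0;
        var names = new[] { "Id", "Name", "Price" };

        IEnumerable<string> query = names.Select(n =>
        {
            calls++; // stands in for the per-property reflection work
            return n.ToUpperInvariant();
        });

        if (materialize)
        {
            // Runs the projection once and stores the results in a list.
            query = query.ToList();
        }

        foreach (var name in query) { } // e.g. writing the header
        foreach (var name in query) { } // e.g. writing one row

        return calls;
    }
}
```

Without ToList() the projection runs six times for three properties and two passes; with it, three times regardless of how many rows follow.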

- What happens if a property value is null? NullReferenceException!

// NullReferenceException
var value = property.GetValue(item).ToString();
var value = property.Info.GetValue(item).ToString(); // (changed in previous bullet)

// if the property is null, value will be null as well
var value = property.Info.GetValue(item)?.ToString();

// this also needs to be fixed
if (value?.Contains(ColumnDelimiter) == true)
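
The two ways of reading a value can be compared side by side in a short sketch. Product and NullSafeValueDemo are hypothetical names used only for this illustration.

```csharp
using System;
using System.Reflection;

// Hypothetical type with properties that may hold null.
public class Product
{
    public string Name { get; set; }
    public int? Stock { get; set; }
}

public static class NullSafeValueDemo
{
    // Calls ToString() on whatever GetValue returns; throws when that is null.
    public static string ReadUnsafe(object item, string propertyName)
    {
        PropertyInfo property = item.GetType().GetProperty(propertyName);
        return property.GetValue(item).ToString();
    }

    // The null-conditional operator short-circuits, so a null value stays null.
    public static string ReadSafe(object item, string propertyName)
    {
        PropertyInfo property = item.GetType().GetProperty(propertyName);
        return property.GetValue(item)?.ToString();
    }
}
```

For a new Product with nothing set, ReadSafe returns null for both Name and Stock, while ReadUnsafe throws a NullReferenceException.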


- The argument guards seem a little repetitive; we can put them into a function:

Action<string, string> checkForInvalidCharacters = (name, value) =>
{
    if (value == null) return;

    if (value.Contains(ColumnDelimiter))
    {
        throw new ArgumentException($"The {name} string '{value}' contains an invalid character: '{ColumnDelimiter}'.");
    }
    if (value.Contains(RowDelimiter))
    {
        throw new ArgumentException($"The {name} string '{value}' contains an invalid character: '{RowDelimiter}'.");
    }
};


We can then use it like this:

foreach (var property in properties)
{
    var name = property.Attribute?.Name ?? property.Info.Name;
    checkForInvalidCharacters("column name", name);

    // ...
}

foreach (var item in items)
{
    var row = new ExtendedStringBuilder();

    foreach (var property in properties)
    {
        var value = property.Info.GetValue(item)?.ToString();
        checkForInvalidCharacters("property value", value);

        // ...
    }

    //...
}


- Using row.Length > 0 to decide whether to add a column delimiter is wrong. If the first few properties are null, no delimiters are written for them, so the remaining columns shift left and the row cannot be deserialized correctly later. Take this example:

// Yeah... I modified the function a bit to make testing easier...
/*  //.Where(x => x.Attribute != null)
    .OrderBy(x => x.Attribute?.Order)
    .OrderBy(x => x.Attribute?.Name) */
DelimitedSerializer.CsvSerializer
    .Serialize(new[]
    {
        new { A = "QQ", B = "qwe", C = 1 },
        new { A = (string)null, B = (string)null, C = 2 },
        new { A = "asd", B = "cc", C = 3 }
    })


Expected output:

A,B,C
QQ,qwe,1
,,2
asd,cc,3


Actual output:

A,B,C
QQ,qwe,1
2
asd,cc,3


You can use a small trick here, knowing that (string)null + (string)null = string.Empty:

string row = null;

foreach (var property in properties)
{
    var value = property.Info.GetValue(item)?.ToString();
    checkForInvalidCharacters("property value", value);

    if (row != null)
        row += ColumnDelimiter;

    row += value;
}
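
To see why the null sentinel works where the length check fails, the two loops can be put next to each other. This is a sketch only: RowBuilderDemo and its methods are hypothetical names, and a plain StringBuilder stands in for the ExtendedStringBuilder from the question, whose definition is not shown here.

```csharp
using System.Collections.Generic;
using System.Text;

public static class RowBuilderDemo
{
    // Length-based check: a delimiter is only written once the row already
    // contains text, so leading null values leave no trace in the output.
    public static string BuildWithLengthCheck(IEnumerable<string> values, string delimiter)
    {
        var row = new StringBuilder();
        foreach (var value in values)
        {
            if (row.Length > 0)
            {
                row.Append(delimiter);
            }

            row.Append(value); // appending a null string adds nothing
        }

        return row.ToString();
    }

    // Null sentinel: row is null only before the first column is written.
    public static string BuildWithNullSentinel(IEnumerable<string> values, string delimiter)
    {
        string row = null;
        foreach (var value in values)
        {
            if (row != null)
            {
                row += delimiter;
            }

            row += value; // null + null evaluates to "", so row is non-null from here on
        }

        return row; // stays null if there were no columns at all
    }
}
```

For the values null, null, "2" with "," as the delimiter, the length check produces 2 while the sentinel version produces ,,2.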


Or, you can use string.Join:

result += string.Join(ColumnDelimiter, properties
    .Select(x =>
    {
        var name = x.Attribute?.Name ?? x.Info.Name;
        checkForInvalidCharacters("column name", name);

        return name;
    }));
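
The same string.Join approach also covers the data rows, because string.Join writes an empty string for null elements. The following is a rough end-to-end sketch under stated assumptions, not the answer's full code: JoinSketch and Person are hypothetical names, the delimiters are passed as parameters instead of being read from properties, and the invalid-character checks are left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class DelimitedColumnAttribute : Attribute
{
    public string Name { get; set; }
    public int Order { get; set; }
}

// Hypothetical type used to exercise the sketch.
public class Person
{
    [DelimitedColumn(Order = 2)]
    public string City { get; set; }

    [DelimitedColumn(Name = "FullName", Order = 1)]
    public string Name { get; set; }

    public int Ignored { get; set; } // no attribute, so it is not serialized
}

public static class JoinSketch
{
    public static string Serialize<T>(IEnumerable<T> items, string columnDelimiter, string rowDelimiter)
    {
        // Look up each attribute once, sort, and materialize the result.
        var properties = typeof(T).GetProperties()
            .Select(p => new { Attribute = p.GetCustomAttribute<DelimitedColumnAttribute>(), Info = p })
            .Where(x => x.Attribute != null)
            .OrderBy(x => x.Attribute.Order)
            .ThenBy(x => x.Attribute.Name)
            .ThenBy(x => x.Info.Name)
            .ToList();

        // Header line: the attribute name, or the property name as a fallback.
        var lines = new List<string>
        {
            string.Join(columnDelimiter, properties.Select(x => x.Attribute.Name ?? x.Info.Name))
        };

        foreach (var item in items)
        {
            // Null values become empty fields, so columns never shift.
            lines.Add(string.Join(columnDelimiter,
                properties.Select(x => x.Info.GetValue(item)?.ToString())));
        }

        return string.Join(rowDelimiter, lines);
    }
}
```

Serializing one Person with only Name = "Ann" and another with only City = "Oslo", using "," and "\n", gives the lines FullName,City then Ann, then ,Oslo.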


Full code:

```
/// <summary>
/// Represents a serializer that will serialize arbitrary objects to files with specific row and column separators.
/// </summary>
public class DelimitedSerializer
{
    /// <summary>
    /// The string to be used to separate columns.
    /// </summary>
    public string ColumnDelimiter { get; set; }

    /// <summary>
    /// The string to be used to separate rows.
    /// </summary>
    public string RowDelimiter { get; set; }

    /// <summary>
    /// Serializes an object to a delimited file. Throws an exception if any of the property names, column names, or values contain either the <see cref="ColumnDelimiter"/> or the <see cref="RowDelimiter"/>.
    /// </summary>
    /// <typeparam name="T">The type of the object to serialize.</typeparam>
    /// <param name="items">A list of the items to serialize.</param>
    /// <returns>The serialized string.</returns>
    public string Serialize<T>(List<T> items)
    {
        if (string.IsNullOrEmpty(ColumnDelimiter))
        {
            throw new ArgumentException($"The property '{nameof(ColumnDelim
```

Code Snippets

public static DelimitedSerializer TsvSerializer => new DelimitedSerializer { ColumnDelimiter = "\t", RowDelimiter = Environment.NewLine };

var properties = typeof(T).GetProperties()
    .Select((PropertyInfo p) => new
    {
        // caching the result, so you don't have to look it up repeatedly
        Attribute = p.GetCustomAttribute<DelimitedColumnAttribute>(),
        Info = p,
    })
    .Where(x => x.Attribute != null)
    // ?. is not needed here, but it makes testing easier with anonymous classes
    // ThenBy keeps Order as the primary key; a second OrderBy would re-sort and override it
    .OrderBy(x => x.Attribute?.Order)
    .ThenBy(x => x.Attribute?.Name)
    .ThenBy(x => x.Info.Name)
    // properties are used multiple times, so you want to avoid deferred execution here
    .ToList();

// NullReferenceException
var value = property.GetValue(item).ToString();
var value = property.Info.GetValue(item).ToString(); // (changed in previous bullet)

// if the property is null, value will be null as well
var value = property.Info.GetValue(item)?.ToString();

// this also needs to be fixed
if (value?.Contains(ColumnDelimiter) == true)

Action<string, string> checkForInvalidCharacters = (name, value) =>
{
    if (value == null) return;

    if (value.Contains(ColumnDelimiter))
    {
        throw new ArgumentException($"The {name} string '{value}' contains an invalid character: '{ColumnDelimiter}'.");
    }
    if (value.Contains(RowDelimiter))
    {
        throw new ArgumentException($"The {name} string '{value}' contains an invalid character: '{RowDelimiter}'.");
    }
};

foreach (var property in properties)
{
    var name = property.Attribute?.Name ?? property.Info.Name;
    checkForInvalidCharacters("column name", name);

    // ...
}

foreach (var item in items)
{
    var row = new ExtendedStringBuilder();

    foreach (var property in properties)
    {
        var value = property.Info.GetValue(item)?.ToString();
        checkForInvalidCharacters("property value", value);

        // ...
    }

    //...
}

Context

StackExchange Code Review Q#128539, answer score: 10
