
Delimited File Reader

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·


Problem

UPDATE: I have refactored the code into a Gist using @Dmitry's answer as a guide. The update is much simpler to grok, implements IDisposable, and is roughly thirty lines shorter.

I wrote this over the weekend for fun and am looking for critique. Style and readability comments are welcome, but what I truly need to know is:

1. Does it function as advertised?
2. Are there any lingering bugs that I've missed?
3. Can you come up with a way to make it faster?

When I ask these of myself, I get 1 = yes, 2 = no, and 3 = maaaaaybe. I'd like to add other features like skipping the header row, inferring data types, and validating field counts, but I'll be tackling that kind of thing via derivation or extension, since such logic will be simpler to implement on top of an existing IEnumerable<IEnumerable<string>> like this one.
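Because the reader already exposes an IEnumerable<IEnumerable<string>>, features such as header skipping and field-count validation could be layered on as LINQ-style extension methods without touching the class. A minimal sketch, assuming the first row is always a header; DelimitedRowExtensions, SkipHeader, and WithFieldCount are hypothetical names, not part of the posted code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helpers that work on any IEnumerable<IEnumerable<string>>,
// e.g. the DelimitedReader from the question. Names are illustrative only.
public static class DelimitedRowExtensions
{
    // Skips the first row, assuming it is a header.
    public static IEnumerable<IEnumerable<string>> SkipHeader(
        this IEnumerable<IEnumerable<string>> rows)
    {
        return rows.Skip(1);
    }

    // Materializes each row as an array and throws (lazily, during enumeration)
    // when a row's field count differs from expectedCount.
    // Row numbers are 1-based and relative to the sequence passed in.
    public static IEnumerable<string[]> WithFieldCount(
        this IEnumerable<IEnumerable<string>> rows, int expectedCount)
    {
        int rowNumber = 0;
        foreach (var row in rows)
        {
            rowNumber++;
            string[] fields = row.ToArray();
            if (fields.Length != expectedCount)
            {
                throw new InvalidDataException(
                    $"Row {rowNumber} has {fields.Length} fields; expected {expectedCount}.");
            }
            yield return fields;
        }
    }
}
```

Usage would then read as foreach (var row in DelimitedReader.Create(fileName).SkipHeader().WithFieldCount(3)) { ... }, with validation happening row by row as the sequence is enumerated, so the streaming behavior is preserved.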

FLAME ON;

Usage:

```
foreach (var row in DelimitedReader.Create(fileName)) {
    foreach (var field in row) {
        // do stuff
    }
}
```

Features:

- Accurate: RFC 4180 compliant
- Efficient: memory usage is (roughly) equal to the size of the largest row
- Fast: average throughput of ~25 megabytes per second
- Flexible: the default encoding and separator/escape characters can be user-defined
- Lightweight: a single 160-line class with no external dependencies
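To make the "RFC 4180 compliant" item concrete, the cases worth testing are separators and line breaks inside quoted fields, and doubled quotes standing for one literal quote. The sketch below is not the author's chunked implementation; it is a hypothetical character-level state machine (Rfc4180Sketch.ReadRecords) that buffers only the current record and takes the separator and escape characters as parameters. It assumes CR, LF, and CRLF all end a record, and it is lenient about a quote appearing mid-field, which RFC 4180 does not allow.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sketch, not the question's DelimitedReader: a streaming
// RFC 4180 record reader written as a character-level state machine.
public static class Rfc4180Sketch
{
    public static IEnumerable<string[]> ReadRecords(
        TextReader reader, char separator = ',', char escape = '"')
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool recordStarted = false; // has the current record consumed any input?

        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            recordStarted = true;

            if (inQuotes)
            {
                if (ch != escape)
                {
                    field.Append(ch); // separators and line breaks are literal here
                }
                else if (reader.Peek() == escape)
                {
                    reader.Read();        // a doubled escape char...
                    field.Append(escape); // ...is one literal escape char
                }
                else
                {
                    inQuotes = false; // closing escape char
                }
            }
            else if (ch == escape)
            {
                inQuotes = true; // lenient: also accepted in the middle of a field
            }
            else if (ch == separator)
            {
                fields.Add(field.ToString());
                field.Clear();
            }
            else if (ch == '\r' || ch == '\n')
            {
                if (ch == '\r' && reader.Peek() == '\n')
                    reader.Read(); // treat CRLF as a single terminator
                fields.Add(field.ToString());
                field.Clear();
                yield return fields.ToArray(); // copy, so clearing below is safe
                fields.Clear();
                recordStarted = false;
            }
            else
            {
                field.Append(ch);
            }
        }

        if (inQuotes)
            throw new InvalidDataException("Unterminated quoted field at end of input.");

        // Emit the final record when the input does not end with a line break.
        if (recordStarted)
        {
            fields.Add(field.ToString());
            yield return fields.ToArray();
        }
    }
}
```

Because it reads from a TextReader, the same method works on a StreamReader over a file or a StringReader in unit tests.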

Code:

```
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;

namespace ByteTerrace
{
    public class DelimitedReader : IEnumerable<IEnumerable<string>>
    {
        private const int DEFAULT_CHUNK_SIZE = 128;
        private const char DEFAULT_ESCAPE_CHAR = '"';
        private const char DEFAULT_SEPARATOR_CHAR = ',';

        private readonly char[] m_buffer;
        private readonly Encoding m_encoding;
        private readonly char m_escapeChar;
        private readonly string m_fileName;
        private readonly char m_separatorChar;

        public char[] Buffer {
            get {
                return m_buffer;
            }
        }
        public Enc
```

Solution

I'd prefer to rely on built-in functionality as much as possible. I want to believe that using the built-in APIs makes my code more readable and probably faster.

So my proposal is:

```
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;

public class DelimitedReader : IEnumerable<string[]>, IDisposable
{
    private readonly StreamReader reader;

    // When no encoding is given, fall back to UTF-8 and let the StreamReader
    // detect the actual encoding from the byte order mark.
    public DelimitedReader(string fileName, Encoding encoding = null)
        : this(new StreamReader(new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite),
            encoding ?? Encoding.UTF8, encoding == null))
    {
    }

    public DelimitedReader(StreamReader reader)
    {
        this.reader = reader;
    }

    public void Dispose()
    {
        reader.Dispose();
    }

    public char EscapeChar { get; set; } = '"';

    public char SeparatorChar { get; set; } = ',';

    private string[] ParseLine(string line)
    {
        List<string> fields = new List<string>();

        char[] charsToSeek = { EscapeChar, SeparatorChar };
        bool isEscaped = false;
        int prevPos = 0;

        while (prevPos < line.Length)
        {
            // If in the escaped mode, seek the escape char only.
            // Otherwise, seek both chars.
            int nextPos = isEscaped
                ? line.IndexOf(EscapeChar, prevPos)
                : line.IndexOfAny(charsToSeek, prevPos);

            if (nextPos == -1)
            {
                // We reached the end of the line
                if (!isEscaped)
                {
                    // Add the rest of the line
                    fields.Add(line.Substring(prevPos, line.Length - prevPos).Trim());
                    break;
                }
                // If there is no closing escape char
                throw new InvalidDataException("The following line has invalid format: " + line);
            }

            char nextChar = line[nextPos];
            if (nextChar == EscapeChar)
            {
                // The next char is the escape char
                if (isEscaped)
                {
                    // If already in the escaped mode
                    fields.Add(line.Substring(prevPos, nextPos - prevPos)); // No Trim
                }
                isEscaped = !isEscaped; // Toggle mode
            }
            else
            {
                // The next char is the delimiter
                fields.Add(line.Substring(prevPos, nextPos - prevPos).Trim());  // Trim
            }

            prevPos = nextPos + 1;
        }

        return fields.ToArray();
    }

    public IEnumerator<string[]> GetEnumerator()
    {
        while (!reader.EndOfStream)
        {
            yield return ParseLine(reader.ReadLine());
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

In the class above, I use the StreamReader.ReadLine method to read the file line by line, and the String.IndexOf/String.IndexOfAny methods to move within each line.

According to my test runs, this approach is a bit faster.
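Regarding the "lingering bugs" question, this ParseLine is worth checking against a few edge cases. Tracing it by hand: `"a",b` yields three fields (a, an empty string, b) because the separator right after the closing quote is treated as the end of an empty field; `a,` yields only `a`, dropping the trailing empty field; an empty line yields an empty array; and `"x""y"` yields two fields (`x` and `y`) instead of one field `x"y`. A possible fix, sketched below as a hypothetical static helper (LineParser.ParseLine) still built on String.IndexOf, handles those cases. It assumes fields should not be trimmed (RFC 4180 treats spaces as part of a field), it is stricter than the original in rejecting any character other than a separator after a closing quote, and, like the original, it cannot handle line breaks inside quoted fields, because StreamReader.ReadLine has already split them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical line-level parser in the spirit of ParseLine above. It handles
// a quoted field followed by a separator, trailing separators, empty lines,
// and doubled escape chars. It does not trim fields.
public static class LineParser
{
    public static string[] ParseLine(string line, char separator = ',', char escape = '"')
    {
        var fields = new List<string>();
        int pos = 0;

        while (true)
        {
            if (pos < line.Length && line[pos] == escape)
            {
                // Quoted field: copy the text between escape chars; a doubled
                // escape char stands for one literal escape char.
                var field = new StringBuilder();
                int start = pos + 1;
                while (true)
                {
                    int quote = line.IndexOf(escape, start);
                    if (quote == -1)
                        throw new InvalidDataException("Unterminated quoted field: " + line);

                    field.Append(line, start, quote - start);
                    if (quote + 1 < line.Length && line[quote + 1] == escape)
                    {
                        field.Append(escape);
                        start = quote + 2;
                    }
                    else
                    {
                        pos = quote + 1; // just past the closing escape char
                        break;
                    }
                }
                fields.Add(field.ToString());

                if (pos == line.Length)
                    break; // the quoted field was the last one
                if (line[pos] != separator)
                    throw new InvalidDataException("Expected a separator after a quoted field: " + line);
                pos++; // consume the separator; another (possibly empty) field follows
            }
            else
            {
                // Unquoted field: everything up to the next separator.
                int next = line.IndexOf(separator, pos);
                if (next == -1)
                {
                    fields.Add(line.Substring(pos)); // last field, possibly empty
                    break;
                }
                fields.Add(line.Substring(pos, next - pos));
                pos = next + 1;
            }
        }

        return fields.ToArray();
    }
}
```

Swapping it in would mean having GetEnumerator call LineParser.ParseLine(reader.ReadLine(), SeparatorChar, EscapeChar); whether to keep the original's trimming of unquoted fields is a separate design choice.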

Context

StackExchange Code Review Q#145860, answer score: 2
