
Improve my copying of a CSV file

Submitted by: @import:stackexchange-codereview

Problem

I need to merge some large CSV files, but I feel that the way I am doing it is not optimized. The method takes a number of CSV files and combines them into one:

public static void MergeFiles(IEnumerable<string> csvFileNames, string outputPath)
{
    var sb = new StringBuilder();
    bool hasHeader = false;
    foreach (string csvFileName in csvFileNames)
    {
        TextReader tr = new StreamReader(csvFileName);
        string headers = tr.ReadLine();
        if (!hasHeader)
        {
            sb.AppendLine(headers);
            hasHeader = true;
        }
        sb.AppendLine(tr.ReadToEnd());  
    }
    File.WriteAllText(outputPath, sb.ToString());
}


Any suggestions or code snippets?

Solution

- StreamReader is IDisposable, and therefore its use should be wrapped in a using block, like this:

using (TextReader tr = new StreamReader(csvFileName))
{
    ...
}


- You are reading all the files into memory and then writing everything out. This probably performs fairly well, since it first does a series of (probably) sequential reads of the input files and then one big block write, but you could consider writing the output as you read each file:

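public static void MergeFiles(IEnumerable<string> csvFileNames, string outputPath)
{
    // Open the output first, so a bad target path fails before any input is read
    using (var writer = new StreamWriter(outputPath))
    {
        bool hasHeader = false;
        foreach (var csvFileName in csvFileNames)
        {
            using (var reader = new StreamReader(csvFileName))
            {
                string headers = reader.ReadLine();

                // Copy the header line from the first file only
                if (!hasHeader)
                {
                    writer.WriteLine(headers);
                    hasHeader = true;
                }

                // Copy the rest of the file; if it lacks a final newline, add one
                // so the next file's first row does not run onto its last row
                string body = reader.ReadToEnd();
                writer.Write(body);
                if (body.Length > 0 && body[body.Length - 1] != '\n')
                {
                    writer.WriteLine();
                }
            }
        }
    }
}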


If the files are very big, you might even consider reading the input line by line and writing it to the output line by line. This may be somewhat slower, but it uses much less memory, because only one line is held at a time instead of the whole remainder of a file (which is what ReadToEnd returns). In the end it's a trade-off between speed and memory consumption (as it often is in software development).

Another advantage of the write-as-you-go approach is that the output file is opened first, so if there is a problem writing to the target you find out before doing any work at all.
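For comparison, here is a minimal sketch of the line-by-line variant. The class and method names (`CsvMerger`, `MergeFilesLineByLine`) are hypothetical, and it assumes every input file starts with a single header line that matches the first file's header. Because it uses `ReadLine`/`WriteLine`, line endings are normalized to `Environment.NewLine`, and a file missing its final newline cannot run into the next file's first row.

```csharp
using System.Collections.Generic;
using System.IO;

public static class CsvMerger
{
    // Hypothetical helper: streams each input line by line, so only one line
    // is held in memory at a time. Assumes each file has exactly one header
    // line; the header is copied from the first non-empty file only.
    public static void MergeFilesLineByLine(IEnumerable<string> csvFileNames, string outputPath)
    {
        // Output is opened before any input is read, so a bad target fails early
        using (var writer = new StreamWriter(outputPath))
        {
            bool hasHeader = false;
            foreach (var csvFileName in csvFileNames)
            {
                using (var reader = new StreamReader(csvFileName))
                {
                    string headers = reader.ReadLine();
                    if (headers == null)
                    {
                        continue; // empty file: nothing to copy
                    }

                    if (!hasHeader)
                    {
                        writer.WriteLine(headers);
                        hasHeader = true;
                    }

                    // Copy the data rows one at a time
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        writer.WriteLine(line);
                    }
                }
            }
        }
    }
}
```

StreamReader and StreamWriter both buffer internally, so the speed cost of going line by line may turn out smaller than expected; it is worth measuring on your own files before choosing between this and ReadToEnd.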

Context

StackExchange Code Review Q#36273, answer score: 5
